synapse.ml.train package
Submodules
synapse.ml.train.ComputeModelStatistics module
- class synapse.ml.train.ComputeModelStatistics.ComputeModelStatistics(java_obj=None, evaluationMetric='all', labelCol=None, scoredLabelsCol=None, scoresCol=None)[source]
Bases: pyspark.ml.util.MLReadable[pyspark.ml.util.RL]
- Parameters
  - evaluationMetric (str) – Metric to evaluate models with
  - labelCol (str) – The name of the label column
  - scoredLabelsCol (str) – Scored labels column name, only required if using SparkML estimators
  - scoresCol (str) – Scores or raw prediction column name, only required if using SparkML estimators
- evaluationMetric = Param(parent='undefined', name='evaluationMetric', doc='Metric to evaluate models with')
- getScoredLabelsCol()[source]
- Returns
Scored labels column name, only required if using SparkML estimators
- Return type
scoredLabelsCol
- getScoresCol()[source]
- Returns
Scores or raw prediction column name, only required if using SparkML estimators
- Return type
scoresCol
- labelCol = Param(parent='undefined', name='labelCol', doc='The name of the label column')
- scoredLabelsCol = Param(parent='undefined', name='scoredLabelsCol', doc='Scored labels column name, only required if using SparkML estimators')
- scoresCol = Param(parent='undefined', name='scoresCol', doc='Scores or raw prediction column name, only required if using SparkML estimators')
- setParams(evaluationMetric='all', labelCol=None, scoredLabelsCol=None, scoresCol=None)[source]
Set the (keyword only) parameters
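A minimal usage sketch, assuming a DataFrame predictions already scored by a SparkML estimator; the column names below are illustrative:

    from synapse.ml.train import ComputeModelStatistics

    # Aggregate evaluation metrics over a scored DataFrame; the scored-label
    # and score columns are only required when using SparkML estimators.
    metrics = ComputeModelStatistics(
        labelCol="label",
        scoredLabelsCol="prediction",
        scoresCol="rawPrediction",
    ).transform(predictions)
    metrics.show()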
synapse.ml.train.ComputePerInstanceStatistics module
- class synapse.ml.train.ComputePerInstanceStatistics.ComputePerInstanceStatistics(java_obj=None, evaluationMetric='all', labelCol=None, scoredLabelsCol=None, scoredProbabilitiesCol=None, scoresCol=None)[source]
Bases: pyspark.ml.util.MLReadable[pyspark.ml.util.RL]
- Parameters
  - evaluationMetric (str) – Metric to evaluate models with
  - labelCol (str) – The name of the label column
  - scoredLabelsCol (str) – Scored labels column name, only required if using SparkML estimators
  - scoredProbabilitiesCol (str) – Scored probabilities, usually calibrated from raw scores, only required if using SparkML estimators
  - scoresCol (str) – Scores or raw prediction column name, only required if using SparkML estimators
- evaluationMetric = Param(parent='undefined', name='evaluationMetric', doc='Metric to evaluate models with')
- getScoredLabelsCol()[source]
- Returns
Scored labels column name, only required if using SparkML estimators
- Return type
scoredLabelsCol
- getScoredProbabilitiesCol()[source]
- Returns
Scored probabilities, usually calibrated from raw scores, only required if using SparkML estimators
- Return type
scoredProbabilitiesCol
- getScoresCol()[source]
- Returns
Scores or raw prediction column name, only required if using SparkML estimators
- Return type
scoresCol
- labelCol = Param(parent='undefined', name='labelCol', doc='The name of the label column')
- scoredLabelsCol = Param(parent='undefined', name='scoredLabelsCol', doc='Scored labels column name, only required if using SparkML estimators')
- scoredProbabilitiesCol = Param(parent='undefined', name='scoredProbabilitiesCol', doc='Scored probabilities, usually calibrated from raw scores, only required if using SparkML estimators')
- scoresCol = Param(parent='undefined', name='scoresCol', doc='Scores or raw prediction column name, only required if using SparkML estimators')
- setParams(evaluationMetric='all', labelCol=None, scoredLabelsCol=None, scoredProbabilitiesCol=None, scoresCol=None)[source]
Set the (keyword only) parameters
- setScoredLabelsCol(value)[source]
- Parameters
scoredLabelsCol – Scored labels column name, only required if using SparkML estimators
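A minimal usage sketch, assuming the same illustrative predictions DataFrame as above; unlike ComputeModelStatistics, this transformer produces per-row statistics rather than aggregate metrics:

    from synapse.ml.train import ComputePerInstanceStatistics

    per_instance = ComputePerInstanceStatistics(
        labelCol="label",
        scoredLabelsCol="prediction",
        scoredProbabilitiesCol="probability",
        scoresCol="rawPrediction",
    ).transform(predictions)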
synapse.ml.train.TrainClassifier module
- class synapse.ml.train.TrainClassifier.TrainClassifier(java_obj=None, featuresCol='TrainClassifier_419c9d27becf_features', inputCols=None, labelCol=None, labels=None, model=None, numFeatures=0, reindexLabel=True)[source]
Bases: pyspark.ml.util.MLReadable[pyspark.ml.util.RL]
- Parameters
  - featuresCol (str) – The name of the features column
  - inputCols (list) – The names of the input columns
  - labelCol (str) – The name of the label column
  - labels (list) – Sorted label values on the labels column
  - model (object) – Classifier to run
  - numFeatures (int) – Number of features to hash to
  - reindexLabel (bool) – Re-index the label column
- featuresCol = Param(parent='undefined', name='featuresCol', doc='The name of the features column')
- inputCols = Param(parent='undefined', name='inputCols', doc='The names of the input columns')
- labelCol = Param(parent='undefined', name='labelCol', doc='The name of the label column')
- labels = Param(parent='undefined', name='labels', doc='Sorted label values on the labels column')
- model = Param(parent='undefined', name='model', doc='Classifier to run')
- numFeatures = Param(parent='undefined', name='numFeatures', doc='Number of features to hash to')
- reindexLabel = Param(parent='undefined', name='reindexLabel', doc='Re-index the label column')
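A minimal usage sketch, assuming training and test DataFrames train_df and test_df with a label column named "label" (all names are illustrative); TrainClassifier featurizes the input columns and fits the wrapped SparkML classifier:

    from pyspark.ml.classification import LogisticRegression
    from synapse.ml.train import TrainClassifier

    # Wrap a SparkML classifier; numFeatures controls feature hashing.
    model = TrainClassifier(
        model=LogisticRegression(),
        labelCol="label",
        numFeatures=256,
    ).fit(train_df)
    scored = model.transform(test_df)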
synapse.ml.train.TrainRegressor module
- class synapse.ml.train.TrainRegressor.TrainRegressor(java_obj=None, featuresCol='TrainRegressor_b2d566bd148c_features', inputCols=None, labelCol=None, model=None, numFeatures=0)[source]
Bases: pyspark.ml.util.MLReadable[pyspark.ml.util.RL]
- Parameters
  - featuresCol (str) – The name of the features column
  - inputCols (list) – The names of the input columns
  - labelCol (str) – The name of the label column
  - model (object) – Regressor to run
  - numFeatures (int) – Number of features to hash to
- featuresCol = Param(parent='undefined', name='featuresCol', doc='The name of the features column')
- inputCols = Param(parent='undefined', name='inputCols', doc='The names of the input columns')
- labelCol = Param(parent='undefined', name='labelCol', doc='The name of the label column')
- model = Param(parent='undefined', name='model', doc='Regressor to run')
- numFeatures = Param(parent='undefined', name='numFeatures', doc='Number of features to hash to')
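The regression counterpart follows the same pattern; a sketch assuming the same illustrative train_df:

    from pyspark.ml.regression import LinearRegression
    from synapse.ml.train import TrainRegressor

    model = TrainRegressor(
        model=LinearRegression(),
        labelCol="label",
    ).fit(train_df)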
synapse.ml.train.TrainedClassifierModel module
- class synapse.ml.train.TrainedClassifierModel.TrainedClassifierModel(java_obj=None, featuresCol=None, labelCol=None, levels=None, model=None)[source]
Bases: pyspark.ml.util.MLReadable[pyspark.ml.util.RL]
- Parameters
  - featuresCol (str) – The name of the features column
  - labelCol (str) – The name of the label column
  - levels (object) – the levels
  - model (object) – model produced by training
- featuresCol = Param(parent='undefined', name='featuresCol', doc='The name of the features column')
- labelCol = Param(parent='undefined', name='labelCol', doc='The name of the label column')
- levels = Param(parent='undefined', name='levels', doc='the levels')
- model = Param(parent='undefined', name='model', doc='model produced by training')
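Since the class is MLReadable, a fitted model can be reloaded by path; a sketch assuming standard SparkML persistence and an illustrative path:

    from synapse.ml.train import TrainedClassifierModel

    # Persist a model fitted by TrainClassifier, then reload and score with it.
    model.write().overwrite().save("/tmp/trained_classifier")
    loaded = TrainedClassifierModel.load("/tmp/trained_classifier")
    scored = loaded.transform(test_df)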
synapse.ml.train.TrainedRegressorModel module
- class synapse.ml.train.TrainedRegressorModel.TrainedRegressorModel(java_obj=None, featuresCol=None, labelCol=None, model=None)[source]
Bases: pyspark.ml.util.MLReadable[pyspark.ml.util.RL]
- Parameters
  - featuresCol (str) – The name of the features column
  - labelCol (str) – The name of the label column
  - model (object) – model produced by training
- featuresCol = Param(parent='undefined', name='featuresCol', doc='The name of the features column')
- labelCol = Param(parent='undefined', name='labelCol', doc='The name of the label column')
- model = Param(parent='undefined', name='model', doc='model produced by training')
Module contents
SynapseML is an ecosystem of tools aimed at expanding the distributed computing framework Apache Spark in several new directions. SynapseML adds many deep learning and data science tools to the Spark ecosystem, including seamless integration of Spark Machine Learning pipelines with the Microsoft Cognitive Toolkit (CNTK), LightGBM, and OpenCV. These tools enable powerful, highly scalable predictive and analytical models for a variety of data sources.
SynapseML also brings new networking capabilities to the Spark ecosystem. With the HTTP on Spark project, users can embed any web service into their SparkML models. In this vein, SynapseML provides easy-to-use SparkML transformers for a wide variety of Microsoft Cognitive Services. For production-grade deployment, the Spark Serving project enables high-throughput, sub-millisecond-latency web services, backed by your Spark cluster.
SynapseML requires Scala 2.12, Spark 3.0+, and Python 3.6+.
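A minimal session-setup sketch; the version in the Maven coordinate below is an assumption, so substitute the SynapseML release you target:

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.appName("synapseml-example")
        # Pull the SynapseML jars (Scala 2.12 build); version is illustrative.
        .config("spark.jars.packages", "com.microsoft.azure:synapseml_2.12:0.9.5")
        .getOrCreate()
    )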