synapse.ml.train package

Submodules

synapse.ml.train.ComputeModelStatistics module

class synapse.ml.train.ComputeModelStatistics.ComputeModelStatistics(java_obj=None, evaluationMetric='all', labelCol=None, scoredLabelsCol=None, scoresCol=None)[source]

Bases: pyspark.ml.util.MLReadable[pyspark.ml.util.RL]

Parameters
  • evaluationMetric (str) – Metric to evaluate models with

  • labelCol (str) – The name of the label column

  • scoredLabelsCol (str) – Scored labels column name, only required if using SparkML estimators

  • scoresCol (str) – Scores or raw prediction column name, only required if using SparkML estimators

evaluationMetric = Param(parent='undefined', name='evaluationMetric', doc='Metric to evaluate models with')
getEvaluationMetric()[source]
Returns

Metric to evaluate models with

Return type

str

static getJavaPackage()[source]

Returns package name String.

getLabelCol()[source]
Returns

The name of the label column

Return type

str

getScoredLabelsCol()[source]
Returns

Scored labels column name, only required if using SparkML estimators

Return type

str

getScoresCol()[source]
Returns

Scores or raw prediction column name, only required if using SparkML estimators

Return type

str

labelCol = Param(parent='undefined', name='labelCol', doc='The name of the label column')
classmethod read()[source]

Returns an MLReader instance for this class.

scoredLabelsCol = Param(parent='undefined', name='scoredLabelsCol', doc='Scored labels column name, only required if using SparkML estimators')
scoresCol = Param(parent='undefined', name='scoresCol', doc='Scores or raw prediction column name, only required if using SparkML estimators')
setEvaluationMetric(value)[source]
Parameters

value (str) – Metric to evaluate models with

setLabelCol(value)[source]
Parameters

value (str) – The name of the label column

setParams(evaluationMetric='all', labelCol=None, scoredLabelsCol=None, scoresCol=None)[source]

Sets the parameters; every argument must be passed by keyword.

setScoredLabelsCol(value)[source]
Parameters

value (str) – Scored labels column name, only required if using SparkML estimators

setScoresCol(value)[source]
Parameters

value (str) – Scores or raw prediction column name, only required if using SparkML estimators
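
As a hedged illustration of how these parameters fit together, the sketch below configures ComputeModelStatistics for a DataFrame that was scored by a plain SparkML regressor, which is the case where the label and scored-label columns have to be named explicitly. The helper name `regression_statistics`, the column names `label` and `prediction`, and the metric value `"regression"` are assumptions made for this example. `transform` is the usual Spark ML Transformer call and is not listed in this reference.

```python
def regression_statistics(scored_df, label_col="label", scored_col="prediction"):
    """Return a DataFrame of aggregate regression metrics for scored_df.

    scored_df is assumed to be a Spark DataFrame that already contains the
    true label column and the model's predicted value column.
    """
    # Imported lazily so this snippet can be loaded before Spark and SynapseML
    # are installed; calling the function needs an active SparkSession.
    from synapse.ml.train import ComputeModelStatistics

    evaluator = (
        ComputeModelStatistics()
        .setEvaluationMetric("regression")  # assumed metric name; the default is 'all'
        .setLabelCol(label_col)
        .setScoredLabelsCol(scored_col)
    )
    # transform() is assumed to return a new DataFrame holding the metrics.
    return evaluator.transform(scored_df)
```

A call such as `regression_statistics(scored_df).show()` would then display the metrics, assuming `scored_df` has both columns.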

synapse.ml.train.ComputePerInstanceStatistics module

class synapse.ml.train.ComputePerInstanceStatistics.ComputePerInstanceStatistics(java_obj=None, evaluationMetric='all', labelCol=None, scoredLabelsCol=None, scoredProbabilitiesCol=None, scoresCol=None)[source]

Bases: pyspark.ml.util.MLReadable[pyspark.ml.util.RL]

Parameters
  • evaluationMetric (str) – Metric to evaluate models with

  • labelCol (str) – The name of the label column

  • scoredLabelsCol (str) – Scored labels column name, only required if using SparkML estimators

  • scoredProbabilitiesCol (str) – Scored probabilities, usually calibrated from raw scores, only required if using SparkML estimators

  • scoresCol (str) – Scores or raw prediction column name, only required if using SparkML estimators

evaluationMetric = Param(parent='undefined', name='evaluationMetric', doc='Metric to evaluate models with')
getEvaluationMetric()[source]
Returns

Metric to evaluate models with

Return type

str

static getJavaPackage()[source]

Returns package name String.

getLabelCol()[source]
Returns

The name of the label column

Return type

str

getScoredLabelsCol()[source]
Returns

Scored labels column name, only required if using SparkML estimators

Return type

str

getScoredProbabilitiesCol()[source]
Returns

Scored probabilities, usually calibrated from raw scores, only required if using SparkML estimators

Return type

str

getScoresCol()[source]
Returns

Scores or raw prediction column name, only required if using SparkML estimators

Return type

str

labelCol = Param(parent='undefined', name='labelCol', doc='The name of the label column')
classmethod read()[source]

Returns an MLReader instance for this class.

scoredLabelsCol = Param(parent='undefined', name='scoredLabelsCol', doc='Scored labels column name, only required if using SparkML estimators')
scoredProbabilitiesCol = Param(parent='undefined', name='scoredProbabilitiesCol', doc='Scored probabilities, usually calibrated from raw scores, only required if using SparkML estimators')
scoresCol = Param(parent='undefined', name='scoresCol', doc='Scores or raw prediction column name, only required if using SparkML estimators')
setEvaluationMetric(value)[source]
Parameters

value (str) – Metric to evaluate models with

setLabelCol(value)[source]
Parameters

value (str) – The name of the label column

setParams(evaluationMetric='all', labelCol=None, scoredLabelsCol=None, scoredProbabilitiesCol=None, scoresCol=None)[source]

Sets the parameters; every argument must be passed by keyword.

setScoredLabelsCol(value)[source]
Parameters

value (str) – Scored labels column name, only required if using SparkML estimators

setScoredProbabilitiesCol(value)[source]
Parameters

value (str) – Scored probabilities, usually calibrated from raw scores, only required if using SparkML estimators

setScoresCol(value)[source]
Parameters

value (str) – Scores or raw prediction column name, only required if using SparkML estimators
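
The following sketch shows one way the constructor keywords might be supplied when the input was scored by a SparkML classifier. The column names `prediction`, `probability`, and `rawPrediction` are the SparkML classifier defaults. The `label` column, the metric value `"classification"`, and the helper and constant names are assumptions for illustration only.

```python
# Maps each constructor keyword to the column a default SparkML classifier
# would produce; "label" is an assumed name for the ground-truth column.
SPARKML_CLASSIFIER_COLUMNS = {
    "labelCol": "label",
    "scoredLabelsCol": "prediction",
    "scoredProbabilitiesCol": "probability",
    "scoresCol": "rawPrediction",
}


def per_instance_statistics(scored_df, columns=SPARKML_CLASSIFIER_COLUMNS):
    """Return scored_df with per-row evaluation columns appended."""
    # Imported lazily; calling the function needs an active SparkSession
    # with SynapseML on the classpath.
    from synapse.ml.train import ComputePerInstanceStatistics

    evaluator = ComputePerInstanceStatistics(
        evaluationMetric="classification",  # assumed metric name
        **columns,
    )
    return evaluator.transform(scored_df)
```

Unlike ComputeModelStatistics, which aggregates over the whole dataset, this transformer is expected to keep one output row per input row.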

synapse.ml.train.TrainClassifier module

class synapse.ml.train.TrainClassifier.TrainClassifier(java_obj=None, featuresCol='TrainClassifier_a3a26ec4ddb3_features', inputCols=None, labelCol=None, labels=None, model=None, numFeatures=0, reindexLabel=True)[source]

Bases: pyspark.ml.util.MLReadable[pyspark.ml.util.RL]

Parameters
  • featuresCol (str) – The name of the features column

  • inputCols (list) – The names of the input columns

  • labelCol (str) – The name of the label column

  • labels (list) – Sorted label values on the labels column

  • model (object) – Classifier to run

  • numFeatures (int) – Number of features to hash to

  • reindexLabel (bool) – Re-index the label column

featuresCol = Param(parent='undefined', name='featuresCol', doc='The name of the features column')
getFeaturesCol()[source]
Returns

The name of the features column

Return type

str

getInputCols()[source]
Returns

The names of the input columns

Return type

list

static getJavaPackage()[source]

Returns package name String.

getLabelCol()[source]
Returns

The name of the label column

Return type

str

getLabels()[source]
Returns

Sorted label values on the labels column

Return type

list

getModel()[source]
Returns

Classifier to run

Return type

object

getNumFeatures()[source]
Returns

Number of features to hash to

Return type

int

getReindexLabel()[source]
Returns

Re-index the label column

Return type

bool

inputCols = Param(parent='undefined', name='inputCols', doc='The names of the input columns')
labelCol = Param(parent='undefined', name='labelCol', doc='The name of the label column')
labels = Param(parent='undefined', name='labels', doc='Sorted label values on the labels column')
model = Param(parent='undefined', name='model', doc='Classifier to run')
numFeatures = Param(parent='undefined', name='numFeatures', doc='Number of features to hash to')
classmethod read()[source]

Returns an MLReader instance for this class.

reindexLabel = Param(parent='undefined', name='reindexLabel', doc='Re-index the label column')
setFeaturesCol(value)[source]
Parameters

value (str) – The name of the features column

setInputCols(value)[source]
Parameters

value (list) – The names of the input columns

setLabelCol(value)[source]
Parameters

value (str) – The name of the label column

setLabels(value)[source]
Parameters

value (list) – Sorted label values on the labels column

setModel(value)[source]
Parameters

value (object) – Classifier to run

setNumFeatures(value)[source]
Parameters

value (int) – Number of features to hash to

setParams(featuresCol='TrainClassifier_a3a26ec4ddb3_features', inputCols=None, labelCol=None, labels=None, model=None, numFeatures=0, reindexLabel=True)[source]

Sets the parameters; every argument must be passed by keyword.

setReindexLabel(value)[source]
Parameters

value (bool) – Re-index the label column
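
A minimal sketch of how TrainClassifier might be used, assuming it behaves as a standard Spark ML Estimator (`fit` and `transform` are not listed in this reference). The helper name `train_and_score`, the label column `label`, and the value 256 for `numFeatures` are illustrative assumptions, and `LogisticRegression` is just one SparkML classifier that could be passed as `model`.

```python
def train_and_score(train_df, test_df, label_col="label", num_features=256):
    """Fit a classifier on train_df and return (fitted_model, scored_test_df)."""
    # Imported lazily; calling the function needs an active SparkSession
    # with SynapseML on the classpath.
    from pyspark.ml.classification import LogisticRegression
    from synapse.ml.train import TrainClassifier

    trainer = TrainClassifier(
        model=LogisticRegression(),  # any SparkML classifier is assumed to work here
        labelCol=label_col,
        numFeatures=num_features,    # illustrative value, not a recommendation
    )
    fitted = trainer.fit(train_df)       # assumed to yield a TrainedClassifierModel
    scored = fitted.transform(test_df)   # appends the scoring columns
    return fitted, scored
```

Because the input columns are featurized by the trainer, the caller in this sketch passes the raw DataFrame rather than a pre-assembled feature vector.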

synapse.ml.train.TrainRegressor module

class synapse.ml.train.TrainRegressor.TrainRegressor(java_obj=None, featuresCol='TrainRegressor_44640d636827_features', inputCols=None, labelCol=None, model=None, numFeatures=0)[source]

Bases: pyspark.ml.util.MLReadable[pyspark.ml.util.RL]

Parameters
  • featuresCol (str) – The name of the features column

  • inputCols (list) – The names of the input columns

  • labelCol (str) – The name of the label column

  • model (object) – Regressor to run

  • numFeatures (int) – Number of features to hash to

featuresCol = Param(parent='undefined', name='featuresCol', doc='The name of the features column')
getFeaturesCol()[source]
Returns

The name of the features column

Return type

str

getInputCols()[source]
Returns

The names of the input columns

Return type

list

static getJavaPackage()[source]

Returns package name String.

getLabelCol()[source]
Returns

The name of the label column

Return type

str

getModel()[source]
Returns

Regressor to run

Return type

object

getNumFeatures()[source]
Returns

Number of features to hash to

Return type

int

inputCols = Param(parent='undefined', name='inputCols', doc='The names of the input columns')
labelCol = Param(parent='undefined', name='labelCol', doc='The name of the label column')
model = Param(parent='undefined', name='model', doc='Regressor to run')
numFeatures = Param(parent='undefined', name='numFeatures', doc='Number of features to hash to')
classmethod read()[source]

Returns an MLReader instance for this class.

setFeaturesCol(value)[source]
Parameters

value (str) – The name of the features column

setInputCols(value)[source]
Parameters

value (list) – The names of the input columns

setLabelCol(value)[source]
Parameters

value (str) – The name of the label column

setModel(value)[source]
Parameters

value (object) – Regressor to run

setNumFeatures(value)[source]
Parameters

value (int) – Number of features to hash to

setParams(featuresCol='TrainRegressor_44640d636827_features', inputCols=None, labelCol=None, model=None, numFeatures=0)[source]

Sets the parameters; every argument must be passed by keyword.
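
The setters listed above can also be chained instead of passing constructor keywords. The sketch below assumes each setter returns the estimator itself and that `fit` is available as on any Spark ML Estimator. The helper name `train_regressor`, the label column `label`, and the value 256 are hypothetical choices for this example.

```python
def train_regressor(train_df, label_col="label", num_features=256):
    """Fit a regressor on train_df and return the fitted model."""
    # Imported lazily; calling the function needs an active SparkSession
    # with SynapseML on the classpath.
    from pyspark.ml.regression import LinearRegression
    from synapse.ml.train import TrainRegressor

    trainer = (
        TrainRegressor()
        .setModel(LinearRegression())   # any SparkML regressor is assumed to work
        .setLabelCol(label_col)
        .setNumFeatures(num_features)   # illustrative value
    )
    # The result is assumed to be a TrainedRegressorModel.
    return trainer.fit(train_df)
```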

synapse.ml.train.TrainedClassifierModel module

class synapse.ml.train.TrainedClassifierModel.TrainedClassifierModel(java_obj=None, featuresCol=None, labelCol=None, levels=None, model=None)[source]

Bases: pyspark.ml.util.MLReadable[pyspark.ml.util.RL]

Parameters
  • featuresCol (str) – The name of the features column

  • labelCol (str) – The name of the label column

  • levels (object) – the levels

  • model (object) – model produced by training

featuresCol = Param(parent='undefined', name='featuresCol', doc='The name of the features column')
getFeaturesCol()[source]
Returns

The name of the features column

Return type

str

static getJavaPackage()[source]

Returns package name String.

getLabelCol()[source]
Returns

The name of the label column

Return type

str

getLevels()[source]
Returns

the levels

Return type

object

getModel()[source]
Returns

model produced by training

Return type

object

labelCol = Param(parent='undefined', name='labelCol', doc='The name of the label column')
levels = Param(parent='undefined', name='levels', doc='the levels')
model = Param(parent='undefined', name='model', doc='model produced by training')
classmethod read()[source]

Returns an MLReader instance for this class.

setFeaturesCol(value)[source]
Parameters

value (str) – The name of the features column

setLabelCol(value)[source]
Parameters

value (str) – The name of the label column

setLevels(value)[source]
Parameters

value (object) – the levels

setModel(value)[source]
Parameters

value (object) – model produced by training

setParams(featuresCol=None, labelCol=None, levels=None, model=None)[source]

Sets the parameters; every argument must be passed by keyword.
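
As a hedged sketch of the `read()` classmethod in use, the helper below reloads a previously persisted model and scores a DataFrame with it. The helper name `load_and_score` and its arguments are hypothetical. The path is assumed to contain a TrainedClassifierModel saved earlier through Spark ML persistence, and `transform` is the usual Spark ML Transformer call rather than something listed in this reference.

```python
def load_and_score(path, df):
    """Reload a persisted TrainedClassifierModel from path and score df with it."""
    # Imported lazily; calling the function needs an active SparkSession
    # with SynapseML on the classpath.
    from synapse.ml.train import TrainedClassifierModel

    reader = TrainedClassifierModel.read()  # MLReader instance for this class
    model = reader.load(path)               # path: assumed location of a saved model
    return model.transform(df)
```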

synapse.ml.train.TrainedRegressorModel module

class synapse.ml.train.TrainedRegressorModel.TrainedRegressorModel(java_obj=None, featuresCol=None, labelCol=None, model=None)[source]

Bases: pyspark.ml.util.MLReadable[pyspark.ml.util.RL]

Parameters
  • featuresCol (str) – The name of the features column

  • labelCol (str) – The name of the label column

  • model (object) – model produced by training

featuresCol = Param(parent='undefined', name='featuresCol', doc='The name of the features column')
getFeaturesCol()[source]
Returns

The name of the features column

Return type

str

static getJavaPackage()[source]

Returns package name String.

getLabelCol()[source]
Returns

The name of the label column

Return type

str

getModel()[source]
Returns

model produced by training

Return type

object

labelCol = Param(parent='undefined', name='labelCol', doc='The name of the label column')
model = Param(parent='undefined', name='model', doc='model produced by training')
classmethod read()[source]

Returns an MLReader instance for this class.

setFeaturesCol(value)[source]
Parameters

value (str) – The name of the features column

setLabelCol(value)[source]
Parameters

value (str) – The name of the label column

setModel(value)[source]
Parameters

value (object) – model produced by training

setParams(featuresCol=None, labelCol=None, model=None)[source]

Sets the parameters; every argument must be passed by keyword.

Module contents

SynapseML is an ecosystem of tools that extends the distributed computing framework Apache Spark in several new directions. It adds deep learning and data science tools to the Spark ecosystem, including integration of Spark Machine Learning pipelines with Microsoft Cognitive Toolkit (CNTK), LightGBM, and OpenCV. These tools enable powerful, highly scalable predictive and analytical models for a variety of data sources.

SynapseML also brings new networking capabilities to the Spark ecosystem. With the HTTP on Spark project, users can embed any web service into their SparkML models. In this vein, SynapseML provides easy-to-use SparkML transformers for a wide variety of Microsoft Cognitive Services. For production-grade deployment, the Spark Serving project enables high-throughput, sub-millisecond-latency web services backed by your Spark cluster.

SynapseML requires Scala 2.12, Spark 3.0+, and Python 3.6+.
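
The Python and Spark requirements can be checked from a Python session before importing SynapseML. The sketch below is one possible way to do that; the helper names are hypothetical, version strings are assumed to start with `major.minor`, and the Scala version is not checked because it is not directly visible from Python.

```python
import sys

MIN_PYTHON = (3, 6)  # from the stated requirement "Python 3.6+"
MIN_SPARK = (3, 0)   # from the stated requirement "Spark 3.0+"


def version_tuple(version):
    """Convert a version string such as '3.1.2' into the tuple (3, 1)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def meets_minimum(version, minimum):
    """Return True if the version string is at least the (major, minor) minimum."""
    return version_tuple(version) >= minimum


def check_environment():
    """Return a list of human-readable problems; an empty list means none found."""
    # Imported lazily so the helpers above work even without PySpark installed.
    import pyspark

    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append("Python %d.%d is older than 3.6" % sys.version_info[:2])
    if not meets_minimum(pyspark.__version__, MIN_SPARK):
        problems.append("Spark %s is older than 3.0" % pyspark.__version__)
    return problems
```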