synapse.ml.train package

Submodules

synapse.ml.train.ComputeModelStatistics module

class synapse.ml.train.ComputeModelStatistics.ComputeModelStatistics(java_obj=None, evaluationMetric='all', labelCol=None, scoredLabelsCol=None, scoresCol=None)[source]

Bases: ComplexParamsMixin, JavaMLReadable, JavaMLWritable, JavaTransformer

Parameters:
  • evaluationMetric (str) – Metric to evaluate models with

  • labelCol (str) – The name of the label column

  • scoredLabelsCol (str) – Scored labels column name, only required if using SparkML estimators

  • scoresCol (str) – Scores or raw prediction column name, only required if using SparkML estimators

evaluationMetric = Param(parent='undefined', name='evaluationMetric', doc='Metric to evaluate models with')
getEvaluationMetric()[source]
Returns:

Metric to evaluate models with

Return type:

evaluationMetric

static getJavaPackage()[source]

Returns package name String.

getLabelCol()[source]
Returns:

The name of the label column

Return type:

labelCol

getScoredLabelsCol()[source]
Returns:

Scored labels column name, only required if using SparkML estimators

Return type:

scoredLabelsCol

getScoresCol()[source]
Returns:

Scores or raw prediction column name, only required if using SparkML estimators

Return type:

scoresCol

labelCol = Param(parent='undefined', name='labelCol', doc='The name of the label column')
classmethod read()[source]

Returns an MLReader instance for this class.

scoredLabelsCol = Param(parent='undefined', name='scoredLabelsCol', doc='Scored labels column name, only required if using SparkML estimators')
scoresCol = Param(parent='undefined', name='scoresCol', doc='Scores or raw prediction column name, only required if using SparkML estimators')
setEvaluationMetric(value)[source]
Parameters:

evaluationMetric – Metric to evaluate models with

setLabelCol(value)[source]
Parameters:

labelCol – The name of the label column

setParams(evaluationMetric='all', labelCol=None, scoredLabelsCol=None, scoresCol=None)[source]

Set the (keyword only) parameters

setScoredLabelsCol(value)[source]
Parameters:

scoredLabelsCol – Scored labels column name, only required if using SparkML estimators

setScoresCol(value)[source]
Parameters:

scoresCol – Scores or raw prediction column name, only required if using SparkML estimators
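
As a minimal sketch of how the parameters above fit together, the following assumes a DataFrame that has already been scored by a model. The helper names (model_statistics_kwargs, compute_model_statistics) and the column names used below are hypothetical and not part of SynapseML; only the ComputeModelStatistics constructor keywords and transform come from the API documented here.

```python
def model_statistics_kwargs(label_col, scored_labels_col=None, scores_col=None,
                            evaluation_metric="all"):
    """Build constructor keywords for ComputeModelStatistics (hypothetical helper).

    Optional columns left as None are omitted so the documented defaults apply.
    """
    kwargs = {"evaluationMetric": evaluation_metric, "labelCol": label_col}
    if scored_labels_col is not None:
        kwargs["scoredLabelsCol"] = scored_labels_col
    if scores_col is not None:
        kwargs["scoresCol"] = scores_col
    return kwargs


def compute_model_statistics(scored_df, **kwargs):
    """Apply ComputeModelStatistics to an already-scored DataFrame.

    The import is deferred so the helper above can be used without Spark installed.
    """
    from synapse.ml.train import ComputeModelStatistics

    return ComputeModelStatistics(**kwargs).transform(scored_df)
```

Given a scored DataFrame scored_df with a label column named "label" and predictions in "prediction" (both names assumed), compute_model_statistics(scored_df, **model_statistics_kwargs("label", "prediction")) would return a DataFrame of metrics.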

synapse.ml.train.ComputePerInstanceStatistics module

class synapse.ml.train.ComputePerInstanceStatistics.ComputePerInstanceStatistics(java_obj=None, evaluationMetric='all', labelCol=None, scoredLabelsCol=None, scoredProbabilitiesCol=None, scoresCol=None)[source]

Bases: ComplexParamsMixin, JavaMLReadable, JavaMLWritable, JavaTransformer

Parameters:
  • evaluationMetric (str) – Metric to evaluate models with

  • labelCol (str) – The name of the label column

  • scoredLabelsCol (str) – Scored labels column name, only required if using SparkML estimators

  • scoredProbabilitiesCol (str) – Scored probabilities, usually calibrated from raw scores, only required if using SparkML estimators

  • scoresCol (str) – Scores or raw prediction column name, only required if using SparkML estimators

evaluationMetric = Param(parent='undefined', name='evaluationMetric', doc='Metric to evaluate models with')
getEvaluationMetric()[source]
Returns:

Metric to evaluate models with

Return type:

evaluationMetric

static getJavaPackage()[source]

Returns package name String.

getLabelCol()[source]
Returns:

The name of the label column

Return type:

labelCol

getScoredLabelsCol()[source]
Returns:

Scored labels column name, only required if using SparkML estimators

Return type:

scoredLabelsCol

getScoredProbabilitiesCol()[source]
Returns:

Scored probabilities, usually calibrated from raw scores, only required if using SparkML estimators

Return type:

scoredProbabilitiesCol

getScoresCol()[source]
Returns:

Scores or raw prediction column name, only required if using SparkML estimators

Return type:

scoresCol

labelCol = Param(parent='undefined', name='labelCol', doc='The name of the label column')
classmethod read()[source]

Returns an MLReader instance for this class.

scoredLabelsCol = Param(parent='undefined', name='scoredLabelsCol', doc='Scored labels column name, only required if using SparkML estimators')
scoredProbabilitiesCol = Param(parent='undefined', name='scoredProbabilitiesCol', doc='Scored probabilities, usually calibrated from raw scores, only required if using SparkML estimators')
scoresCol = Param(parent='undefined', name='scoresCol', doc='Scores or raw prediction column name, only required if using SparkML estimators')
setEvaluationMetric(value)[source]
Parameters:

evaluationMetric – Metric to evaluate models with

setLabelCol(value)[source]
Parameters:

labelCol – The name of the label column

setParams(evaluationMetric='all', labelCol=None, scoredLabelsCol=None, scoredProbabilitiesCol=None, scoresCol=None)[source]

Set the (keyword only) parameters

setScoredLabelsCol(value)[source]
Parameters:

scoredLabelsCol – Scored labels column name, only required if using SparkML estimators

setScoredProbabilitiesCol(value)[source]
Parameters:

scoredProbabilitiesCol – Scored probabilities, usually calibrated from raw scores, only required if using SparkML estimators

setScoresCol(value)[source]
Parameters:

scoresCol – Scores or raw prediction column name, only required if using SparkML estimators
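
The setters listed above can also be used instead of constructor keywords. The sketch below configures a stage through its setters and then applies it; configure_per_instance_statistics and per_instance_statistics are hypothetical helper names, and the column names passed in are whatever the scored DataFrame actually uses.

```python
def configure_per_instance_statistics(stage, label_col, scored_labels_col,
                                      scored_probabilities_col=None):
    """Configure a stage that exposes the setters documented above.

    `stage` is expected to be a ComputePerInstanceStatistics instance; any
    object with the same setter names works, which keeps this easy to test.
    """
    stage.setLabelCol(label_col)
    stage.setScoredLabelsCol(scored_labels_col)
    if scored_probabilities_col is not None:
        stage.setScoredProbabilitiesCol(scored_probabilities_col)
    return stage


def per_instance_statistics(scored_df, label_col, scored_labels_col,
                            scored_probabilities_col=None):
    """Return the scored DataFrame transformed by ComputePerInstanceStatistics."""
    # Deferred import: only needed when a Spark session is actually available.
    from synapse.ml.train import ComputePerInstanceStatistics

    stage = configure_per_instance_statistics(
        ComputePerInstanceStatistics(), label_col, scored_labels_col,
        scored_probabilities_col)
    return stage.transform(scored_df)
```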

synapse.ml.train.TrainClassifier module

class synapse.ml.train.TrainClassifier.TrainClassifier(java_obj=None, featuresCol='TrainClassifier_d35110d4538f_features', inputCols=None, labelCol=None, labels=None, model=None, numFeatures=0, reindexLabel=True)[source]

Bases: ComplexParamsMixin, JavaMLReadable, JavaMLWritable, JavaEstimator

Parameters:
  • featuresCol (str) – The name of the features column

  • inputCols (list) – The names of the input columns

  • labelCol (str) – The name of the label column

  • labels (list) – Sorted label values on the labels column

  • model (object) – Classifier to run

  • numFeatures (int) – Number of features to hash to

  • reindexLabel (bool) – Re-index the label column

featuresCol = Param(parent='undefined', name='featuresCol', doc='The name of the features column')
getFeaturesCol()[source]
Returns:

The name of the features column

Return type:

featuresCol

getInputCols()[source]
Returns:

The names of the input columns

Return type:

inputCols

static getJavaPackage()[source]

Returns package name String.

getLabelCol()[source]
Returns:

The name of the label column

Return type:

labelCol

getLabels()[source]
Returns:

Sorted label values on the labels column

Return type:

labels

getModel()[source]
Returns:

Classifier to run

Return type:

model

getNumFeatures()[source]
Returns:

Number of features to hash to

Return type:

numFeatures

getReindexLabel()[source]
Returns:

Re-index the label column

Return type:

reindexLabel

inputCols = Param(parent='undefined', name='inputCols', doc='The names of the input columns')
labelCol = Param(parent='undefined', name='labelCol', doc='The name of the label column')
labels = Param(parent='undefined', name='labels', doc='Sorted label values on the labels column')
model = Param(parent='undefined', name='model', doc='Classifier to run')
numFeatures = Param(parent='undefined', name='numFeatures', doc='Number of features to hash to')
classmethod read()[source]

Returns an MLReader instance for this class.

reindexLabel = Param(parent='undefined', name='reindexLabel', doc='Re-index the label column')
setFeaturesCol(value)[source]
Parameters:

featuresCol – The name of the features column

setInputCols(value)[source]
Parameters:

inputCols – The names of the input columns

setLabelCol(value)[source]
Parameters:

labelCol – The name of the label column

setLabels(value)[source]
Parameters:

labels – Sorted label values on the labels column

setModel(value)[source]
Parameters:

model – Classifier to run

setNumFeatures(value)[source]
Parameters:

numFeatures – Number of features to hash to

setParams(featuresCol='TrainClassifier_d35110d4538f_features', inputCols=None, labelCol=None, labels=None, model=None, numFeatures=0, reindexLabel=True)[source]

Set the (keyword only) parameters

setReindexLabel(value)[source]
Parameters:

reindexLabel – Re-index the label column
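
To illustrate how the estimator is typically driven, here is a hedged sketch that wraps a Spark ML classifier in TrainClassifier and fits it. The function name train_classifier and its trainer_cls hook are hypothetical conveniences, not part of SynapseML; the constructor keywords (model, labelCol, numFeatures, reindexLabel) are the ones documented above.

```python
def train_classifier(train_df, model, label_col, trainer_cls=None, **extra_params):
    """Fit `model` on `train_df` through TrainClassifier.

    trainer_cls is a hypothetical override hook; when omitted, the real
    synapse.ml.train.TrainClassifier is imported and used.
    extra_params are forwarded to the constructor, for example
    numFeatures=256 or reindexLabel=False.
    """
    if trainer_cls is None:
        from synapse.ml.train import TrainClassifier as trainer_cls

    trainer = trainer_cls(model=model, labelCol=label_col, **extra_params)
    return trainer.fit(train_df)
```

For example, with LogisticRegression imported from pyspark.ml.classification and a label column assumed to be named "income", train_classifier(train_df, LogisticRegression(), "income", numFeatures=256) would return the fitted model.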

synapse.ml.train.TrainRegressor module

class synapse.ml.train.TrainRegressor.TrainRegressor(java_obj=None, featuresCol='TrainRegressor_78ae4eb29dbc_features', inputCols=None, labelCol=None, model=None, numFeatures=0)[source]

Bases: ComplexParamsMixin, JavaMLReadable, JavaMLWritable, JavaEstimator

Parameters:
  • featuresCol (str) – The name of the features column

  • inputCols (list) – The names of the input columns

  • labelCol (str) – The name of the label column

  • model (object) – Regressor to run

  • numFeatures (int) – Number of features to hash to

featuresCol = Param(parent='undefined', name='featuresCol', doc='The name of the features column')
getFeaturesCol()[source]
Returns:

The name of the features column

Return type:

featuresCol

getInputCols()[source]
Returns:

The names of the input columns

Return type:

inputCols

static getJavaPackage()[source]

Returns package name String.

getLabelCol()[source]
Returns:

The name of the label column

Return type:

labelCol

getModel()[source]
Returns:

Regressor to run

Return type:

model

getNumFeatures()[source]
Returns:

Number of features to hash to

Return type:

numFeatures

inputCols = Param(parent='undefined', name='inputCols', doc='The names of the input columns')
labelCol = Param(parent='undefined', name='labelCol', doc='The name of the label column')
model = Param(parent='undefined', name='model', doc='Regressor to run')
numFeatures = Param(parent='undefined', name='numFeatures', doc='Number of features to hash to')
classmethod read()[source]

Returns an MLReader instance for this class.

setFeaturesCol(value)[source]
Parameters:

featuresCol – The name of the features column

setInputCols(value)[source]
Parameters:

inputCols – The names of the input columns

setLabelCol(value)[source]
Parameters:

labelCol – The name of the label column

setModel(value)[source]
Parameters:

model – Regressor to run

setNumFeatures(value)[source]
Parameters:

numFeatures – Number of features to hash to

setParams(featuresCol='TrainRegressor_78ae4eb29dbc_features', inputCols=None, labelCol=None, model=None, numFeatures=0)[source]

Set the (keyword only) parameters
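
The following sketch chains training, scoring, and evaluation for a regressor. It relies on the note above that the scored-column parameters of ComputeModelStatistics are only required when using SparkML estimators directly, so the evaluator is constructed without arguments here. The function name and the trainer_cls and evaluator_cls hooks are hypothetical.

```python
def train_and_evaluate_regressor(train_df, test_df, model, label_col,
                                 trainer_cls=None, evaluator_cls=None):
    """Fit a regressor with TrainRegressor, score test_df, and compute metrics.

    trainer_cls and evaluator_cls are hypothetical override hooks; by default
    the real TrainRegressor and ComputeModelStatistics classes are used.
    """
    if trainer_cls is None:
        from synapse.ml.train import TrainRegressor as trainer_cls
    if evaluator_cls is None:
        from synapse.ml.train import ComputeModelStatistics as evaluator_cls

    fitted = trainer_cls(model=model, labelCol=label_col).fit(train_df)
    scored = fitted.transform(test_df)           # adds the scored columns
    metrics = evaluator_cls().transform(scored)  # column names left at defaults
    return fitted, metrics
```

A call such as train_and_evaluate_regressor(train_df, test_df, LinearRegression(), "price") assumes LinearRegression from pyspark.ml.regression and a label column named "price".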

synapse.ml.train.TrainedClassifierModel module

class synapse.ml.train.TrainedClassifierModel.TrainedClassifierModel(java_obj=None, featuresCol=None, labelCol=None, levels=None, model=None)[source]

Bases: ComplexParamsMixin, JavaMLReadable, JavaMLWritable, JavaModel

Parameters:
  • featuresCol (str) – The name of the features column

  • labelCol (str) – The name of the label column

  • levels (object) – the levels

  • model (object) – model produced by training

featuresCol = Param(parent='undefined', name='featuresCol', doc='The name of the features column')
getFeaturesCol()[source]
Returns:

The name of the features column

Return type:

featuresCol

static getJavaPackage()[source]

Returns package name String.

getLabelCol()[source]
Returns:

The name of the label column

Return type:

labelCol

getLevels()[source]
Returns:

the levels

Return type:

levels

getModel()[source]
Returns:

model produced by training

Return type:

model

labelCol = Param(parent='undefined', name='labelCol', doc='The name of the label column')
levels = Param(parent='undefined', name='levels', doc='the levels')
model = Param(parent='undefined', name='model', doc='model produced by training')
classmethod read()[source]

Returns an MLReader instance for this class.

setFeaturesCol(value)[source]
Parameters:

featuresCol – The name of the features column

setLabelCol(value)[source]
Parameters:

labelCol – The name of the label column

setLevels(value)[source]
Parameters:

levels – the levels

setModel(value)[source]
Parameters:

model – model produced by training

setParams(featuresCol=None, labelCol=None, levels=None, model=None)[source]

Set the (keyword only) parameters

synapse.ml.train.TrainedRegressorModel module

class synapse.ml.train.TrainedRegressorModel.TrainedRegressorModel(java_obj=None, featuresCol=None, labelCol=None, model=None)[source]

Bases: ComplexParamsMixin, JavaMLReadable, JavaMLWritable, JavaModel

Parameters:
  • featuresCol (str) – The name of the features column

  • labelCol (str) – The name of the label column

  • model (object) – model produced by training

featuresCol = Param(parent='undefined', name='featuresCol', doc='The name of the features column')
getFeaturesCol()[source]
Returns:

The name of the features column

Return type:

featuresCol

static getJavaPackage()[source]

Returns package name String.

getLabelCol()[source]
Returns:

The name of the label column

Return type:

labelCol

getModel()[source]
Returns:

model produced by training

Return type:

model

labelCol = Param(parent='undefined', name='labelCol', doc='The name of the label column')
model = Param(parent='undefined', name='model', doc='model produced by training')
classmethod read()[source]

Returns an MLReader instance for this class.

setFeaturesCol(value)[source]
Parameters:

featuresCol – The name of the features column

setLabelCol(value)[source]
Parameters:

labelCol – The name of the label column

setModel(value)[source]
Parameters:

model – model produced by training

setParams(featuresCol=None, labelCol=None, model=None)[source]

Set the (keyword only) parameters
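
Because both trained model classes derive from JavaMLWritable and JavaMLReadable, a fitted model can be persisted and restored with the standard Spark ML writer and reader. The sketch below assumes a writable path; save_and_reload is a hypothetical helper name.

```python
def save_and_reload(model, path):
    """Persist a fitted model to `path` and load it back.

    Uses write() from JavaMLWritable and the read() classmethod documented
    above; overwrite() replaces anything already stored at `path`.
    """
    model.write().overwrite().save(path)
    return type(model).read().load(path)
```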

Module contents

SynapseML is an ecosystem of tools that extends the distributed computing framework Apache Spark in several new directions. It adds deep learning and data science tools to the Spark ecosystem, including integration of Spark Machine Learning pipelines with the Microsoft Cognitive Toolkit (CNTK), LightGBM, and OpenCV. These tools enable scalable predictive and analytical models for a variety of data sources.

SynapseML also brings new networking capabilities to the Spark ecosystem. With the HTTP on Spark project, users can embed any web service into their SparkML models. In this vein, SynapseML provides easy-to-use SparkML transformers for a wide variety of Microsoft Cognitive Services. For production-grade deployment, the Spark Serving project enables high-throughput, sub-millisecond-latency web services backed by your Spark cluster.

SynapseML requires Scala 2.12, Spark 3.0+, and Python 3.6+.