mmlspark.train package¶
Submodules¶
mmlspark.train.ComputeModelStatistics module¶
- class mmlspark.train.ComputeModelStatistics.ComputeModelStatistics(java_obj=None, evaluationMetric='all', labelCol=None, scoredLabelsCol=None, scoresCol=None)[source]¶
Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer
- Parameters
evaluationMetric (object) – Metric to evaluate models with
labelCol (object) – The name of the label column
scoredLabelsCol (object) – Scored labels column name, only required if using SparkML estimators
scoresCol (object) – Scores or raw prediction column name, only required if using SparkML estimators
- evaluationMetric = Param(parent='undefined', name='evaluationMetric', doc='Metric to evaluate models with')¶
- getScoredLabelsCol()[source]¶
- Returns
Scored labels column name, only required if using SparkML estimators
- Return type
scoredLabelsCol
- getScoresCol()[source]¶
- Returns
Scores or raw prediction column name, only required if using SparkML estimators
- Return type
scoresCol
- labelCol = Param(parent='undefined', name='labelCol', doc='The name of the label column')¶
- scoredLabelsCol = Param(parent='undefined', name='scoredLabelsCol', doc='Scored labels column name, only required if using SparkML estimators')¶
- scoresCol = Param(parent='undefined', name='scoresCol', doc='Scores or raw prediction column name, only required if using SparkML estimators')¶
- setParams(evaluationMetric='all', labelCol=None, scoredLabelsCol=None, scoresCol=None)[source]¶
Set the (keyword only) parameters
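A minimal usage sketch (assuming a running SparkSession with the MMLSpark package installed, and that `scored_df` is the output of a trained MMLSpark model's `transform`, so it already carries the label and scored-label columns the transformer looks for by default; all names here are illustrative):

```python
from mmlspark.train import ComputeModelStatistics

# With the default evaluationMetric='all', the transformer computes every
# metric applicable to the detected task type and returns them as a
# small summary DataFrame.
metrics = ComputeModelStatistics().transform(scored_df)
metrics.show()
```

When scoring with plain SparkML estimators instead of MMLSpark's Train* wrappers, set labelCol, scoredLabelsCol, and scoresCol explicitly so the transformer can locate the prediction columns.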
mmlspark.train.ComputePerInstanceStatistics module¶
- class mmlspark.train.ComputePerInstanceStatistics.ComputePerInstanceStatistics(java_obj=None, evaluationMetric='all', labelCol=None, scoredLabelsCol=None, scoredProbabilitiesCol=None, scoresCol=None)[source]¶
Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer
- Parameters
evaluationMetric (object) – Metric to evaluate models with
labelCol (object) – The name of the label column
scoredLabelsCol (object) – Scored labels column name, only required if using SparkML estimators
scoredProbabilitiesCol (object) – Scored probabilities, usually calibrated from raw scores, only required if using SparkML estimators
scoresCol (object) – Scores or raw prediction column name, only required if using SparkML estimators
- evaluationMetric = Param(parent='undefined', name='evaluationMetric', doc='Metric to evaluate models with')¶
- getScoredLabelsCol()[source]¶
- Returns
Scored labels column name, only required if using SparkML estimators
- Return type
scoredLabelsCol
- getScoredProbabilitiesCol()[source]¶
- Returns
Scored probabilities, usually calibrated from raw scores, only required if using SparkML estimators
- Return type
scoredProbabilitiesCol
- getScoresCol()[source]¶
- Returns
Scores or raw prediction column name, only required if using SparkML estimators
- Return type
scoresCol
- labelCol = Param(parent='undefined', name='labelCol', doc='The name of the label column')¶
- scoredLabelsCol = Param(parent='undefined', name='scoredLabelsCol', doc='Scored labels column name, only required if using SparkML estimators')¶
- scoredProbabilitiesCol = Param(parent='undefined', name='scoredProbabilitiesCol', doc='Scored probabilities, usually calibrated from raw scores, only required if using SparkML estimators')¶
- scoresCol = Param(parent='undefined', name='scoresCol', doc='Scores or raw prediction column name, only required if using SparkML estimators')¶
- setParams(evaluationMetric='all', labelCol=None, scoredLabelsCol=None, scoredProbabilitiesCol=None, scoresCol=None)[source]¶
Set the (keyword only) parameters
- setScoredLabelsCol(value)[source]¶
- Parameters
scoredLabelsCol – Scored labels column name, only required if using SparkML estimators
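Unlike ComputeModelStatistics, which aggregates metrics over the whole dataset, this transformer appends metrics row by row. A hedged sketch, again assuming `scored_df` is the output of a trained MMLSpark model's `transform` (the name is illustrative):

```python
from mmlspark.train import ComputePerInstanceStatistics

# Appends per-instance metric columns alongside the original columns,
# so individual predictions can be inspected and sorted by error.
per_instance_df = ComputePerInstanceStatistics().transform(scored_df)
per_instance_df.show()
```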
mmlspark.train.TrainClassifier module¶
- class mmlspark.train.TrainClassifier.TrainClassifier(java_obj=None, featuresCol='TrainClassifier_2580c48dec35_features', labelCol=None, labels=None, model=None, numFeatures=0, reindexLabel=True)[source]¶
Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaEstimator
- Parameters
featuresCol (object) – The name of the features column
labelCol (object) – The name of the label column
labels (object) – Sorted label values on the labels column
model (object) – Classifier to run
numFeatures (object) – Number of features to hash to
reindexLabel (object) – Re-index the label column
- featuresCol = Param(parent='undefined', name='featuresCol', doc='The name of the features column')¶
- labelCol = Param(parent='undefined', name='labelCol', doc='The name of the label column')¶
- labels = Param(parent='undefined', name='labels', doc='Sorted label values on the labels column')¶
- model = Param(parent='undefined', name='model', doc='Classifier to run')¶
- numFeatures = Param(parent='undefined', name='numFeatures', doc='Number of features to hash to')¶
- reindexLabel = Param(parent='undefined', name='reindexLabel', doc='Re-index the label column')¶
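A typical usage sketch (assuming a running SparkSession with the MMLSpark package installed; `train_df`, `test_df`, and the label column name are illustrative):

```python
from pyspark.ml.classification import LogisticRegression
from mmlspark.train import TrainClassifier

# TrainClassifier featurizes the non-label columns automatically
# (hashing to numFeatures buckets) and re-indexes the label before
# fitting the wrapped SparkML classifier.
model = TrainClassifier(
    model=LogisticRegression(),
    labelCol="income",   # illustrative label column
    numFeatures=256,
).fit(train_df)

scored_df = model.transform(test_df)
```

fit returns a TrainedClassifierModel, whose transform output can be fed directly to ComputeModelStatistics.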
mmlspark.train.TrainRegressor module¶
- class mmlspark.train.TrainRegressor.TrainRegressor(java_obj=None, featuresCol='TrainRegressor_d3ce47e4fe56_features', labelCol=None, model=None, numFeatures=0)[source]¶
Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaEstimator
- Parameters
featuresCol (object) – The name of the features column
labelCol (object) – The name of the label column
model (object) – Regressor to run
numFeatures (object) – Number of features to hash to
- featuresCol = Param(parent='undefined', name='featuresCol', doc='The name of the features column')¶
- labelCol = Param(parent='undefined', name='labelCol', doc='The name of the label column')¶
- model = Param(parent='undefined', name='model', doc='Regressor to run')¶
- numFeatures = Param(parent='undefined', name='numFeatures', doc='Number of features to hash to')¶
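Usage mirrors TrainClassifier, with a SparkML regressor as the wrapped model. A sketch (assuming a running SparkSession with MMLSpark installed; `train_df` and the label column name are illustrative):

```python
from pyspark.ml.regression import LinearRegression
from mmlspark.train import TrainRegressor

# Featurizes the non-label columns automatically and fits the wrapped
# SparkML regressor against the given label column.
model = TrainRegressor(
    model=LinearRegression(),
    labelCol="price",   # illustrative label column
    numFeatures=256,
).fit(train_df)
```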
mmlspark.train.TrainedClassifierModel module¶
- class mmlspark.train.TrainedClassifierModel.TrainedClassifierModel(java_obj=None, featuresCol=None, labelCol=None, levels=None, model=None)[source]¶
Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaModel
- Parameters
featuresCol (object) – The name of the features column
labelCol (object) – The name of the label column
levels (object) – The levels
model (object) – Model produced by training
- featuresCol = Param(parent='undefined', name='featuresCol', doc='The name of the features column')¶
- labelCol = Param(parent='undefined', name='labelCol', doc='The name of the label column')¶
- levels = Param(parent='undefined', name='levels', doc='the levels')¶
- model = Param(parent='undefined', name='model', doc='model produced by training')¶
mmlspark.train.TrainedRegressorModel module¶
- class mmlspark.train.TrainedRegressorModel.TrainedRegressorModel(java_obj=None, featuresCol=None, labelCol=None, model=None)[source]¶
Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaModel
- Parameters
featuresCol (object) – The name of the features column
labelCol (object) – The name of the label column
model (object) – Model produced by training
- featuresCol = Param(parent='undefined', name='featuresCol', doc='The name of the features column')¶
- labelCol = Param(parent='undefined', name='labelCol', doc='The name of the label column')¶
- model = Param(parent='undefined', name='model', doc='model produced by training')¶
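Because both trained-model classes mix in JavaMLReadable and JavaMLWritable, they can be persisted and reloaded like any SparkML model. A sketch (the path is illustrative, and `model` is assumed to be the result of a TrainClassifier fit):

```python
from mmlspark.train import TrainedClassifierModel

# Save the fitted model, then reload it in a later session.
model.write().overwrite().save("/tmp/income-model")
loaded_model = TrainedClassifierModel.load("/tmp/income-model")
scored_df = loaded_model.transform(test_df)
```

TrainedRegressorModel supports the same write/load pattern.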
Module contents¶
MMLSpark is an ecosystem of tools aimed at expanding the distributed computing framework Apache Spark in several new directions. MMLSpark adds many deep learning and data science tools to the Spark ecosystem, including seamless integration of Spark Machine Learning pipelines with the Microsoft Cognitive Toolkit (CNTK), LightGBM, and OpenCV. These tools enable powerful, highly scalable predictive and analytical models for a variety of data sources.
MMLSpark also brings new networking capabilities to the Spark ecosystem. With the HTTP on Spark project, users can embed any web service into their SparkML models. In this vein, MMLSpark provides easy-to-use SparkML transformers for a wide variety of Microsoft Cognitive Services. For production-grade deployment, the Spark Serving project enables high-throughput, sub-millisecond-latency web services backed by your Spark cluster.
MMLSpark requires Scala 2.11, Spark 2.4+, and Python 3.5+.