mmlspark.train package¶
Submodules¶
mmlspark.train.ComputeModelStatistics module¶
- class mmlspark.train.ComputeModelStatistics.ComputeModelStatistics(java_obj=None, evaluationMetric='all', labelCol=None, scoredLabelsCol=None, scoresCol=None)[source]¶
Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer
- Parameters
evaluationMetric (object) – Metric to evaluate models with
labelCol (object) – The name of the label column
scoredLabelsCol (object) – Scored labels column name, only required if using SparkML estimators
scoresCol (object) – Scores or raw prediction column name, only required if using SparkML estimators
- evaluationMetric = Param(parent='undefined', name='evaluationMetric', doc='Metric to evaluate models with')¶
- getScoredLabelsCol()[source]¶
- Returns
Scored labels column name, only required if using SparkML estimators
- Return type
scoredLabelsCol
- getScoresCol()[source]¶
- Returns
Scores or raw prediction column name, only required if using SparkML estimators
- Return type
scoresCol
- labelCol = Param(parent='undefined', name='labelCol', doc='The name of the label column')¶
- scoredLabelsCol = Param(parent='undefined', name='scoredLabelsCol', doc='Scored labels column name, only required if using SparkML estimators')¶
- scoresCol = Param(parent='undefined', name='scoresCol', doc='Scores or raw prediction column name, only required if using SparkML estimators')¶
- setParams(evaluationMetric='all', labelCol=None, scoredLabelsCol=None, scoresCol=None)[source]¶
Set the (keyword only) parameters
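A minimal usage sketch (assuming a running SparkSession with the MMLSpark package installed, and that `scored_df` is the output of a trained MMLSpark model's `transform`, so it already carries the label and scored-label columns the transformer looks for by default; all names here are illustrative):

```python
from mmlspark.train import ComputeModelStatistics

# With the default evaluationMetric='all', the transformer computes every
# metric applicable to the detected task type and returns them as a
# small summary DataFrame.
metrics = ComputeModelStatistics().transform(scored_df)
metrics.show()
```

When scoring with plain SparkML estimators instead of MMLSpark's Train* wrappers, set labelCol, scoredLabelsCol, and scoresCol explicitly so the transformer can locate the prediction columns.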
mmlspark.train.ComputePerInstanceStatistics module¶
- class mmlspark.train.ComputePerInstanceStatistics.ComputePerInstanceStatistics(java_obj=None, evaluationMetric='all', labelCol=None, scoredLabelsCol=None, scoredProbabilitiesCol=None, scoresCol=None)[source]¶
Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer
- Parameters
evaluationMetric (object) – Metric to evaluate models with
labelCol (object) – The name of the label column
scoredLabelsCol (object) – Scored labels column name, only required if using SparkML estimators
scoredProbabilitiesCol (object) – Scored probabilities, usually calibrated from raw scores, only required if using SparkML estimators
scoresCol (object) – Scores or raw prediction column name, only required if using SparkML estimators
- evaluationMetric = Param(parent='undefined', name='evaluationMetric', doc='Metric to evaluate models with')¶
- getScoredLabelsCol()[source]¶
- Returns
Scored labels column name, only required if using SparkML estimators
- Return type
scoredLabelsCol
- getScoredProbabilitiesCol()[source]¶
- Returns
Scored probabilities, usually calibrated from raw scores, only required if using SparkML estimators
- Return type
scoredProbabilitiesCol
- getScoresCol()[source]¶
- Returns
Scores or raw prediction column name, only required if using SparkML estimators
- Return type
scoresCol
- labelCol = Param(parent='undefined', name='labelCol', doc='The name of the label column')¶
- scoredLabelsCol = Param(parent='undefined', name='scoredLabelsCol', doc='Scored labels column name, only required if using SparkML estimators')¶
- scoredProbabilitiesCol = Param(parent='undefined', name='scoredProbabilitiesCol', doc='Scored probabilities, usually calibrated from raw scores, only required if using SparkML estimators')¶
- scoresCol = Param(parent='undefined', name='scoresCol', doc='Scores or raw prediction column name, only required if using SparkML estimators')¶
- setParams(evaluationMetric='all', labelCol=None, scoredLabelsCol=None, scoredProbabilitiesCol=None, scoresCol=None)[source]¶
Set the (keyword only) parameters
- setScoredLabelsCol(value)[source]¶
- Parameters
scoredLabelsCol – Scored labels column name, only required if using SparkML estimators
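Unlike ComputeModelStatistics, which aggregates metrics over the whole dataset, this transformer appends metrics row by row. A hedged sketch, again assuming `scored_df` is the output of a trained MMLSpark model's `transform` (the name is illustrative):

```python
from mmlspark.train import ComputePerInstanceStatistics

# Appends per-instance metric columns alongside the original columns,
# so individual predictions can be inspected and sorted by error.
per_instance_df = ComputePerInstanceStatistics().transform(scored_df)
per_instance_df.show()
```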
mmlspark.train.TrainClassifier module¶
- class mmlspark.train.TrainClassifier.TrainClassifier(java_obj=None, featuresCol='TrainClassifier_2580c48dec35_features', labelCol=None, labels=None, model=None, numFeatures=0, reindexLabel=True)[source]¶
Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaEstimator
- Parameters
featuresCol (object) – The name of the features column
labelCol (object) – The name of the label column
labels (object) – Sorted label values on the labels column
model (object) – Classifier to run
numFeatures (object) – Number of features to hash to
reindexLabel (object) – Re-index the label column
- featuresCol = Param(parent='undefined', name='featuresCol', doc='The name of the features column')¶
- labelCol = Param(parent='undefined', name='labelCol', doc='The name of the label column')¶
- labels = Param(parent='undefined', name='labels', doc='Sorted label values on the labels column')¶
- model = Param(parent='undefined', name='model', doc='Classifier to run')¶
- numFeatures = Param(parent='undefined', name='numFeatures', doc='Number of features to hash to')¶
- reindexLabel = Param(parent='undefined', name='reindexLabel', doc='Re-index the label column')¶
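A typical usage sketch (assuming a running SparkSession with the MMLSpark package installed; `train_df`, `test_df`, and the label column name are illustrative):

```python
from pyspark.ml.classification import LogisticRegression
from mmlspark.train import TrainClassifier

# TrainClassifier featurizes the non-label columns automatically
# (hashing to numFeatures buckets) and re-indexes the label before
# fitting the wrapped SparkML classifier.
model = TrainClassifier(
    model=LogisticRegression(),
    labelCol="income",   # illustrative label column
    numFeatures=256,
).fit(train_df)

scored_df = model.transform(test_df)
```

fit returns a TrainedClassifierModel, whose transform output can be fed directly to ComputeModelStatistics.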
mmlspark.train.TrainRegressor module¶
- class mmlspark.train.TrainRegressor.TrainRegressor(java_obj=None, featuresCol='TrainRegressor_d3ce47e4fe56_features', labelCol=None, model=None, numFeatures=0)[source]¶
Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaEstimator
- Parameters
featuresCol (object) – The name of the features column
labelCol (object) – The name of the label column
model (object) – Regressor to run
numFeatures (object) – Number of features to hash to
- featuresCol = Param(parent='undefined', name='featuresCol', doc='The name of the features column')¶
- labelCol = Param(parent='undefined', name='labelCol', doc='The name of the label column')¶
- model = Param(parent='undefined', name='model', doc='Regressor to run')¶
- numFeatures = Param(parent='undefined', name='numFeatures', doc='Number of features to hash to')¶
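Usage mirrors TrainClassifier, with a SparkML regressor as the wrapped model. A sketch (assuming a running SparkSession with MMLSpark installed; `train_df` and the label column name are illustrative):

```python
from pyspark.ml.regression import LinearRegression
from mmlspark.train import TrainRegressor

# Featurizes the non-label columns automatically and fits the wrapped
# SparkML regressor against the given label column.
model = TrainRegressor(
    model=LinearRegression(),
    labelCol="price",   # illustrative label column
    numFeatures=256,
).fit(train_df)
```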
mmlspark.train.TrainedClassifierModel module¶
- class mmlspark.train.TrainedClassifierModel.TrainedClassifierModel(java_obj=None, featuresCol=None, labelCol=None, levels=None, model=None)[source]¶
Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaModel
- Parameters
featuresCol (object) – The name of the features column
labelCol (object) – The name of the label column
levels (object) – The levels
model (object) – Model produced by training
- featuresCol = Param(parent='undefined', name='featuresCol', doc='The name of the features column')¶
- labelCol = Param(parent='undefined', name='labelCol', doc='The name of the label column')¶
- levels = Param(parent='undefined', name='levels', doc='the levels')¶
- model = Param(parent='undefined', name='model', doc='model produced by training')¶
mmlspark.train.TrainedRegressorModel module¶
- class mmlspark.train.TrainedRegressorModel.TrainedRegressorModel(java_obj=None, featuresCol=None, labelCol=None, model=None)[source]¶
Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaModel
- Parameters
featuresCol (object) – The name of the features column
labelCol (object) – The name of the label column
model (object) – Model produced by training
- featuresCol = Param(parent='undefined', name='featuresCol', doc='The name of the features column')¶
- labelCol = Param(parent='undefined', name='labelCol', doc='The name of the label column')¶
- model = Param(parent='undefined', name='model', doc='model produced by training')¶
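Because both trained-model classes mix in JavaMLReadable and JavaMLWritable, they can be persisted and reloaded like any SparkML model. A sketch (the path is illustrative, and `model` is assumed to be the result of a TrainClassifier fit):

```python
from mmlspark.train import TrainedClassifierModel

# Save the fitted model, then reload it in a later session.
model.write().overwrite().save("/tmp/income-model")
loaded_model = TrainedClassifierModel.load("/tmp/income-model")
scored_df = loaded_model.transform(test_df)
```

TrainedRegressorModel supports the same write/load pattern.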
Module contents¶
MMLSpark is an ecosystem of tools aimed at expanding the distributed computing framework Apache Spark in several new directions. MMLSpark adds many deep learning and data science tools to the Spark ecosystem, including seamless integration of Spark Machine Learning pipelines with the Microsoft Cognitive Toolkit (CNTK), LightGBM, and OpenCV. These tools enable powerful, highly scalable predictive and analytical models for a variety of data sources.
MMLSpark also brings new networking capabilities to the Spark ecosystem. With the HTTP on Spark project, users can embed any web service into their SparkML models. In this vein, MMLSpark provides easy-to-use SparkML transformers for a wide variety of Microsoft Cognitive Services. For production-grade deployment, the Spark Serving project enables high-throughput, sub-millisecond-latency web services backed by your Spark cluster.
MMLSpark requires Scala 2.11, Spark 2.4+, and Python 3.5+.