mmlspark.train package

Submodules

mmlspark.train.ComputeModelStatistics module

class mmlspark.train.ComputeModelStatistics.ComputeModelStatistics(evaluationMetric='all', labelCol=None, scoredLabelsCol=None, scoresCol=None)[source]

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • evaluationMetric (str) – Metric to evaluate models with (default: all)

  • labelCol (str) – The name of the label column

  • scoredLabelsCol (str) – Scored labels column name, only required if using SparkML estimators

  • scoresCol (str) – Scores or raw prediction column name, only required if using SparkML estimators

getEvaluationMetric()[source]
Returns

Metric to evaluate models with (default: all)

Return type

str

static getJavaPackage()[source]

Returns the package name as a string.

getLabelCol()[source]
Returns

The name of the label column

Return type

str

getScoredLabelsCol()[source]
Returns

Scored labels column name, only required if using SparkML estimators

Return type

str

getScoresCol()[source]
Returns

Scores or raw prediction column name, only required if using SparkML estimators

Return type

str

classmethod read()[source]

Returns an MLReader instance for this class.

setEvaluationMetric(value)[source]
Parameters

value (str) – Metric to evaluate models with (default: all)

setLabelCol(value)[source]
Parameters

value (str) – The name of the label column

setParams(evaluationMetric='all', labelCol=None, scoredLabelsCol=None, scoresCol=None)[source]

Sets the parameters (keyword arguments only).

Parameters
  • evaluationMetric (str) – Metric to evaluate models with (default: all)

  • labelCol (str) – The name of the label column

  • scoredLabelsCol (str) – Scored labels column name, only required if using SparkML estimators

  • scoresCol (str) – Scores or raw prediction column name, only required if using SparkML estimators

setScoredLabelsCol(value)[source]
Parameters

value (str) – Scored labels column name, only required if using SparkML estimators

setScoresCol(value)[source]
Parameters

value (str) – Scores or raw prediction column name, only required if using SparkML estimators
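A minimal sketch of how the parameters above might be combined, assuming an already scored Spark DataFrame is available. Both helper functions and the column names in the usage note are hypothetical; only the class path, its keyword parameters, and `transform` (inherited from `JavaTransformer`) come from the reference above.

```python
def model_statistics_params(label_col, scored_labels_col=None,
                            scores_col=None, metric="all"):
    """Collect keyword arguments for ComputeModelStatistics.

    Optional column names are left out when unset, so the
    documented defaults (None) stay in effect.
    """
    params = {"evaluationMetric": metric, "labelCol": label_col}
    if scored_labels_col is not None:
        params["scoredLabelsCol"] = scored_labels_col
    if scores_col is not None:
        params["scoresCol"] = scores_col
    return params


def evaluate_scored(scored_df, **params):
    """Apply ComputeModelStatistics to an already scored DataFrame."""
    # Imported lazily so that model_statistics_params can be used
    # (and tested) without a Spark or mmlspark installation.
    from mmlspark.train.ComputeModelStatistics import ComputeModelStatistics

    evaluator = ComputeModelStatistics(**params)
    return evaluator.transform(scored_df)
```

With a DataFrame scored by a SparkML estimator, a call could look like `evaluate_scored(scored_df, **model_statistics_params("label", "prediction", "rawPrediction"))`, where the three column names are placeholders for whatever the scoring stage produced.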

mmlspark.train.ComputePerInstanceStatistics module

class mmlspark.train.ComputePerInstanceStatistics.ComputePerInstanceStatistics(evaluationMetric='all', labelCol=None, scoredLabelsCol=None, scoredProbabilitiesCol=None, scoresCol=None)[source]

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • evaluationMetric (str) – Metric to evaluate models with (default: all)

  • labelCol (str) – The name of the label column

  • scoredLabelsCol (str) – Scored labels column name, only required if using SparkML estimators

  • scoredProbabilitiesCol (str) – Scored probabilities, usually calibrated from raw scores, only required if using SparkML estimators

  • scoresCol (str) – Scores or raw prediction column name, only required if using SparkML estimators

getEvaluationMetric()[source]
Returns

Metric to evaluate models with (default: all)

Return type

str

static getJavaPackage()[source]

Returns the package name as a string.

getLabelCol()[source]
Returns

The name of the label column

Return type

str

getScoredLabelsCol()[source]
Returns

Scored labels column name, only required if using SparkML estimators

Return type

str

getScoredProbabilitiesCol()[source]
Returns

Scored probabilities, usually calibrated from raw scores, only required if using SparkML estimators

Return type

str

getScoresCol()[source]
Returns

Scores or raw prediction column name, only required if using SparkML estimators

Return type

str

classmethod read()[source]

Returns an MLReader instance for this class.

setEvaluationMetric(value)[source]
Parameters

value (str) – Metric to evaluate models with (default: all)

setLabelCol(value)[source]
Parameters

value (str) – The name of the label column

setParams(evaluationMetric='all', labelCol=None, scoredLabelsCol=None, scoredProbabilitiesCol=None, scoresCol=None)[source]

Sets the parameters (keyword arguments only).

Parameters
  • evaluationMetric (str) – Metric to evaluate models with (default: all)

  • labelCol (str) – The name of the label column

  • scoredLabelsCol (str) – Scored labels column name, only required if using SparkML estimators

  • scoredProbabilitiesCol (str) – Scored probabilities, usually calibrated from raw scores, only required if using SparkML estimators

  • scoresCol (str) – Scores or raw prediction column name, only required if using SparkML estimators

setScoredLabelsCol(value)[source]
Parameters

value (str) – Scored labels column name, only required if using SparkML estimators

setScoredProbabilitiesCol(value)[source]
Parameters

value (str) – Scored probabilities, usually calibrated from raw scores, only required if using SparkML estimators

setScoresCol(value)[source]
Parameters

value (str) – Scores or raw prediction column name, only required if using SparkML estimators
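Because `setParams` accepts keyword arguments only, a misspelled parameter name is easy to introduce. The sketch below, which is an illustration rather than part of the library, checks keyword names against the list documented above before configuring the transformer. `DOCUMENTED_PARAMS`, `validate_params`, and `per_instance_statistics` are hypothetical names.

```python
# Keyword names documented for ComputePerInstanceStatistics above.
DOCUMENTED_PARAMS = frozenset({
    "evaluationMetric",
    "labelCol",
    "scoredLabelsCol",
    "scoredProbabilitiesCol",
    "scoresCol",
})


def validate_params(**kwargs):
    """Reject keyword names that the reference above does not list."""
    unknown = sorted(set(kwargs) - DOCUMENTED_PARAMS)
    if unknown:
        raise ValueError("undocumented parameter(s): " + ", ".join(unknown))
    return kwargs


def per_instance_statistics(scored_df, **kwargs):
    """Configure ComputePerInstanceStatistics and apply it to scored_df."""
    # Imported lazily so that validate_params works without mmlspark.
    from mmlspark.train.ComputePerInstanceStatistics import (
        ComputePerInstanceStatistics,
    )

    stage = ComputePerInstanceStatistics()
    stage.setParams(**validate_params(**kwargs))
    return stage.transform(scored_df)
```

A call such as `per_instance_statistics(scored_df, labelCol="label", scoresCol="rawPrediction")` would pass validation, while a misspelled name like `labelColumn` raises `ValueError` before any Spark work starts. The column names are placeholders.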

mmlspark.train.TrainClassifier module

class mmlspark.train.TrainClassifier.TrainClassifier(featuresCol=None, labelCol=None, labels=None, model=None, numFeatures=0, reindexLabel=True)[source]

Bases: mmlspark.train._TrainClassifier._TrainClassifier

class mmlspark.train.TrainClassifier.TrainedClassifierModel(java_model=None)[source]

Bases: mmlspark.train._TrainClassifier._TrainedClassifierModel

getModel()[source]

Get the underlying model.
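A hedged sketch of one way these two classes might be used together. It assumes that `TrainClassifier` behaves as a SparkML estimator, so that `fit` returns a `TrainedClassifierModel`; the reference above does not state this directly. The helper names, and the idea of passing a SparkML learner such as `LogisticRegression` as `model`, are illustrative assumptions.

```python
def make_classifier_trainer(learner, label_col):
    """Wrap a SparkML classifier in TrainClassifier.

    Requires mmlspark; the import is deferred so the rest of this
    sketch can be loaded without it.
    """
    from mmlspark.train.TrainClassifier import TrainClassifier

    return TrainClassifier(model=learner, labelCol=label_col)


def fit_and_unwrap(trainer, train_df):
    """Fit the trainer and return (trained wrapper, underlying model).

    Assumes trainer.fit returns an object exposing getModel(),
    as TrainedClassifierModel does in the reference above.
    """
    trained = trainer.fit(train_df)
    return trained, trained.getModel()
```

For example, `fit_and_unwrap(make_classifier_trainer(LogisticRegression(), "label"), train_df)` would return both the trained wrapper and the model it contains, given a training DataFrame with a `label` column (a placeholder name).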

mmlspark.train.TrainRegressor module

class mmlspark.train.TrainRegressor.TrainRegressor(featuresCol=None, labelCol=None, model=None, numFeatures=0)[source]

Bases: mmlspark.train._TrainRegressor._TrainRegressor

class mmlspark.train.TrainRegressor.TrainedRegressorModel(java_model=None)[source]

Bases: mmlspark.train._TrainRegressor._TrainedRegressorModel

getModel()[source]

Get the underlying model.
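The regressor classes follow the same shape. The sketch below assumes that `TrainRegressor` is a SparkML estimator and that the fitted `TrainedRegressorModel` can score new data through `transform`; neither point is spelled out in the reference above. The function names are hypothetical.

```python
def make_regressor_trainer(learner, label_col):
    """Wrap a SparkML regressor in TrainRegressor.

    Requires mmlspark; the import is deferred so the rest of this
    sketch can be loaded without it.
    """
    from mmlspark.train.TrainRegressor import TrainRegressor

    return TrainRegressor(model=learner, labelCol=label_col)


def train_and_score(trainer, train_df, test_df):
    """Fit on train_df, then score test_df with the fitted model.

    Assumes the object returned by trainer.fit exposes transform().
    """
    trained = trainer.fit(train_df)
    return trained.transform(test_df)
```

The scored DataFrame returned by `train_and_score` is the kind of input that `ComputeModelStatistics` and `ComputePerInstanceStatistics` are documented to evaluate.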

Module contents

MicrosoftML is a library of Python classes that interface with the Microsoft Scala APIs and use Apache Spark to create distributed machine learning models.

MicrosoftML simplifies training and scoring classifiers and regressors, and facilitates the creation of models using the CNTK library, images, and text.