mmlspark.stages package¶
Submodules¶
mmlspark.stages.Cacher module¶
class mmlspark.stages.Cacher.Cacher(disable=False)[source]¶
Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer
- Parameters
disable (bool) – Whether to disable caching (so that you can turn it off during evaluation) (default: false)
getDisable()[source]¶
- Returns
Whether to disable caching (so that you can turn it off during evaluation) (default: false)
- Return type
bool
mmlspark.stages.ClassBalancer module¶
class mmlspark.stages.ClassBalancer.ClassBalancer(broadcastJoin=True, inputCol=None, outputCol='weight')[source]¶
Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaEstimator
- Parameters
broadcastJoin (bool) – Whether to broadcast the class-to-weight mapping to the workers (default: true)
inputCol (str) – The name of the input column
outputCol (str) – The name of the output column (default: weight)
getBroadcastJoin()[source]¶
- Returns
Whether to broadcast the class-to-weight mapping to the workers (default: true)
- Return type
bool
setBroadcastJoin(value)[source]¶
- Parameters
broadcastJoin (bool) – Whether to broadcast the class-to-weight mapping to the workers (default: true)
setOutputCol(value)[source]¶
- Parameters
outputCol (str) – The name of the output column (default: weight)
class mmlspark.stages.ClassBalancer.ClassBalancerModel(java_model=None)[source]¶
Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.wrapper.JavaModel, pyspark.ml.util.JavaMLWritable, pyspark.ml.util.JavaMLReadable
Model fitted by ClassBalancer.
This class is left empty on purpose. All necessary methods are exposed through inheritance.
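ClassBalancer produces a weight column that up-weights under-represented classes. As a rough illustration only (not the library's actual Scala implementation), an inverse-frequency weighting scheme can be sketched in plain Python; `inverse_frequency_weights` is a hypothetical helper:

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Weight each class by max_count / class_count, so the most
    frequent class gets 1.0 and rarer classes get proportionally more."""
    counts = Counter(labels)
    max_count = max(counts.values())
    return {label: max_count / n for label, n in counts.items()}

# "a" appears three times, "b" once, so "b" is weighted 3x heavier
weights = inverse_frequency_weights(["a", "a", "a", "b"])
```

In the estimator, fit() computes such a class-to-weight mapping over inputCol, and the fitted model joins it back onto the data (as a broadcast join when broadcastJoin=True), writing the result to outputCol.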
mmlspark.stages.DropColumns module¶
class mmlspark.stages.DropColumns.DropColumns(cols=None)[source]¶
Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer
- Parameters
cols (list) – Comma-separated list of column names to drop
mmlspark.stages.DynamicMiniBatchTransformer module¶
class mmlspark.stages.DynamicMiniBatchTransformer.DynamicMiniBatchTransformer(maxBatchSize=2147483647)[source]¶
Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer
- Parameters
maxBatchSize (int) – The max size of the buffer (default: 2147483647)
mmlspark.stages.EnsembleByKey module¶
class mmlspark.stages.EnsembleByKey.EnsembleByKey(colNames=None, collapseGroup=True, cols=None, keys=None, strategy='mean', vectorDims=None)[source]¶
Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer
- Parameters
colNames (list) – Names of the result of each col
collapseGroup (bool) – Whether to collapse all items in group to one entry (default: true)
cols (list) – Cols to ensemble
keys (list) – Keys to group by
strategy (str) – How to ensemble the scores, e.g. mean (default: mean)
vectorDims (dict) – The dimensions of any vector columns, used to avoid materialization
getCollapseGroup()[source]¶
- Returns
Whether to collapse all items in group to one entry (default: true)
- Return type
bool
getVectorDims()[source]¶
- Returns
The dimensions of any vector columns, used to avoid materialization
- Return type
dict
setCollapseGroup(value)[source]¶
- Parameters
collapseGroup (bool) – Whether to collapse all items in group to one entry (default: true)
setParams(colNames=None, collapseGroup=True, cols=None, keys=None, strategy='mean', vectorDims=None)[source]¶
Set the (keyword-only) parameters
- Parameters
colNames (list) – Names of the result of each col
collapseGroup (bool) – Whether to collapse all items in group to one entry (default: true)
cols (list) – Cols to ensemble
keys (list) – Keys to group by
strategy (str) – How to ensemble the scores, e.g. mean (default: mean)
vectorDims (dict) – The dimensions of any vector columns, used to avoid materialization
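With the default strategy='mean' and collapseGroup=True, each key group is reduced to a single averaged score. A minimal pure-Python sketch of that grouping logic (the transformer itself operates on Spark DataFrames; `ensemble_by_key` is a hypothetical helper):

```python
from collections import defaultdict

def ensemble_by_key(rows, key_field, score_field):
    """Average the score of every row sharing a key, collapsing
    each group to a single entry (strategy='mean')."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_field]].append(row[score_field])
    return {key: sum(scores) / len(scores) for key, scores in groups.items()}

rows = [{"k": 1, "score": 0.2}, {"k": 1, "score": 0.4}, {"k": 2, "score": 1.0}]
ensembled = ensemble_by_key(rows, "k", "score")
```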
mmlspark.stages.Explode module¶
class mmlspark.stages.Explode.Explode(inputCol=None, outputCol=None)[source]¶
Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer
- Parameters
inputCol (str) – The name of the input column
outputCol (str) – The name of the output column (default: [self.uid]_output)
getOutputCol()[source]¶
- Returns
The name of the output column (default: [self.uid]_output)
- Return type
str
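Explode behaves like Spark SQL's explode(): one output row per element of an array-valued input column. A plain-Python sketch of the row expansion (`explode_rows` is a hypothetical helper):

```python
def explode_rows(rows, input_col, output_col):
    """Emit one output row per element of the array column,
    copying the remaining fields unchanged."""
    return [
        {**row, output_col: item}
        for row in rows
        for item in row[input_col]
    ]

rows = [{"id": 1, "words": ["a", "b"]}, {"id": 2, "words": ["c"]}]
exploded = explode_rows(rows, "words", "word")
```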
mmlspark.stages.FixedMiniBatchTransformer module¶
class mmlspark.stages.FixedMiniBatchTransformer.FixedMiniBatchTransformer(batchSize=None, buffered=False, maxBufferSize=2147483647)[source]¶
Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer
- Parameters
batchSize (int) – The size of each minibatch
buffered (bool) – Whether or not to buffer batches in memory (default: false)
maxBufferSize (int) – The max size of the buffer (default: 2147483647)
getBuffered()[source]¶
- Returns
Whether or not to buffer batches in memory (default: false)
- Return type
bool
setBuffered(value)[source]¶
- Parameters
buffered (bool) – Whether or not to buffer batches in memory (default: false)
setMaxBufferSize(value)[source]¶
- Parameters
maxBufferSize (int) – The max size of the buffer (default: 2147483647)
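The fixed-size batching these mini-batch transformers perform can be illustrated in plain Python; `fixed_mini_batches` is a hypothetical stand-in for the per-partition logic, which in Spark additionally packs each column of a batch into an array column:

```python
def fixed_mini_batches(rows, batch_size):
    """Group an iterator of rows into lists of at most batch_size;
    the final batch may be smaller."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch

batches = list(fixed_mini_batches(range(5), 2))
```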
mmlspark.stages.FlattenBatch module¶
class mmlspark.stages.FlattenBatch.FlattenBatch[source]¶
Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer
mmlspark.stages.Lambda module¶
class mmlspark.stages.Lambda.Lambda(transformFunc=None, transformSchemaFunc=None)[source]¶
Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer
- Parameters
mmlspark.stages.MultiColumnAdapter module¶
class mmlspark.stages.MultiColumnAdapter.MultiColumnAdapter(baseStage=None, inputCols=None, outputCols=None)[source]¶
Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaEstimator
- Parameters
baseStage (object) – Base pipeline stage to apply to every column
inputCols (list) – The names of the input columns
outputCols (list) – The names of the output columns
setBaseStage(value)[source]¶
- Parameters
baseStage (object) – Base pipeline stage to apply to every column
-
class
mmlspark.stages.MultiColumnAdapter.
PipelineModel
(java_model=None)[source]¶ Bases:
mmlspark.core.schema.Utils.ComplexParamsMixin
,pyspark.ml.wrapper.JavaModel
,pyspark.ml.util.JavaMLWritable
,pyspark.ml.util.JavaMLReadable
Model fitted by
MultiColumnAdapter
.This class is left empty on purpose. All necessary methods are exposed through inheritance.
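Conceptually, MultiColumnAdapter fits one copy of the base stage per input/output column pair. The per-row effect can be sketched in plain Python (`adapt_columns` is a hypothetical helper; the real estimator builds a PipelineModel of fitted stages):

```python
def adapt_columns(base_fn, rows, input_cols, output_cols):
    """Apply one transformation to several columns, writing each
    result under the corresponding output column name."""
    adapted = []
    for row in rows:
        new_row = dict(row)
        for src, dst in zip(input_cols, output_cols):
            new_row[dst] = base_fn(row[src])
        adapted.append(new_row)
    return adapted

rows = [{"first": "ada", "last": "lovelace"}]
result = adapt_columns(str.upper, rows, ["first", "last"], ["FIRST", "LAST"])
```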
mmlspark.stages.RenameColumn module¶
class mmlspark.stages.RenameColumn.RenameColumn(inputCol=None, outputCol=None)[source]¶
Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer
- Parameters
inputCol (str) – The name of the input column
outputCol (str) – The name of the output column
mmlspark.stages.Repartition module¶
class mmlspark.stages.Repartition.Repartition(disable=False, n=None)[source]¶
Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer
- Parameters
disable (bool) – Whether to disable repartitioning (so that one can turn it off for evaluation) (default: false)
n (int) – Number of partitions
getDisable()[source]¶
- Returns
Whether to disable repartitioning (so that one can turn it off for evaluation) (default: false)
- Return type
bool
mmlspark.stages.SelectColumns module¶
class mmlspark.stages.SelectColumns.SelectColumns(cols=None)[source]¶
Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer
- Parameters
cols (list) – Comma-separated list of selected column names
mmlspark.stages.StratifiedRepartition module¶
class mmlspark.stages.StratifiedRepartition.StratifiedRepartition(labelCol=None, mode='mixed', seed=539887434)[source]¶
Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer
- Parameters
labelCol (str) – The name of the label column
mode (str) – Specify 'equal' to repartition with replacement across all labels, 'original' to keep the ratios in the original dataset, or 'mixed' to use a heuristic (default: mixed)
seed (long) – Random seed (default: 539887434)
getMode()[source]¶
- Returns
Specify 'equal' to repartition with replacement across all labels, 'original' to keep the ratios in the original dataset, or 'mixed' to use a heuristic (default: mixed)
- Return type
str
setMode(value)[source]¶
- Parameters
mode (str) – Specify 'equal' to repartition with replacement across all labels, 'original' to keep the ratios in the original dataset, or 'mixed' to use a heuristic (default: mixed)
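The goal is to give every partition a representative label mix. A pure-Python sketch of the 'original'-ratio idea, spreading each label's rows round-robin over partitions (`stratified_partitions` is a hypothetical helper; the real transformer runs distributed and supports the modes described above):

```python
from collections import defaultdict

def stratified_partitions(rows, label_field, n_partitions):
    """Send each label's rows round-robin to partitions so every
    partition keeps roughly the dataset's original label ratios."""
    partitions = [[] for _ in range(n_partitions)]
    seen = defaultdict(int)
    for row in rows:
        label = row[label_field]
        partitions[seen[label] % n_partitions].append(row)
        seen[label] += 1
    return partitions

rows = [{"y": i % 2} for i in range(8)]  # alternating labels 0 and 1
parts = stratified_partitions(rows, "y", 2)
```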
mmlspark.stages.SummarizeData module¶
class mmlspark.stages.SummarizeData.SummarizeData(basic=True, counts=True, errorThreshold=0.0, percentiles=True, sample=True)[source]¶
Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer
- Parameters
basic (bool) – Compute basic statistics (default: true)
counts (bool) – Compute count statistics (default: true)
errorThreshold (double) – Threshold for quantiles - 0 is exact (default: 0.0)
percentiles (bool) – Compute percentiles (default: true)
sample (bool) – Compute sample statistics (default: true)
getErrorThreshold()[source]¶
- Returns
Threshold for quantiles - 0 is exact (default: 0.0)
- Return type
double
setErrorThreshold(value)[source]¶
- Parameters
errorThreshold (double) – Threshold for quantiles - 0 is exact (default: 0.0)
setParams(basic=True, counts=True, errorThreshold=0.0, percentiles=True, sample=True)[source]¶
Set the (keyword-only) parameters
- Parameters
basic (bool) – Compute basic statistics (default: true)
counts (bool) – Compute count statistics (default: true)
errorThreshold (double) – Threshold for quantiles - 0 is exact (default: 0.0)
percentiles (bool) – Compute percentiles (default: true)
sample (bool) – Compute sample statistics (default: true)
mmlspark.stages.TextPreprocessor module¶
class mmlspark.stages.TextPreprocessor.TextPreprocessor(inputCol=None, map=None, normFunc=None, outputCol=None)[source]¶
Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer
- Parameters
mmlspark.stages.TimeIntervalMiniBatchTransformer module¶
class mmlspark.stages.TimeIntervalMiniBatchTransformer.TimeIntervalMiniBatchTransformer(maxBatchSize=2147483647, millisToWait=None)[source]¶
Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer
- Parameters
maxBatchSize (int) – The max size of the buffer (default: 2147483647)
millisToWait (int) – The number of milliseconds to wait before flushing a batch
setMaxBatchSize(value)[source]¶
- Parameters
maxBatchSize (int) – The max size of the buffer (default: 2147483647)
mmlspark.stages.Timer module¶
class mmlspark.stages.Timer.Timer(disableMaterialization=True, logToScala=True, stage=None)[source]¶
Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaEstimator
- Parameters
disableMaterialization (bool) – Whether to disable timing (so that one can turn it off for evaluation) (default: true)
logToScala (bool) – Whether to output the time to the Scala console (default: true)
stage (object) – The stage to time
getDisableMaterialization()[source]¶
- Returns
Whether to disable timing (so that one can turn it off for evaluation) (default: true)
- Return type
bool
getLogToScala()[source]¶
- Returns
Whether to output the time to the Scala console (default: true)
- Return type
bool
setDisableMaterialization(value)[source]¶
- Parameters
disableMaterialization (bool) – Whether to disable timing (so that one can turn it off for evaluation) (default: true)
setLogToScala(value)[source]¶
- Parameters
logToScala (bool) – Whether to output the time to the Scala console (default: true)
class mmlspark.stages.Timer.TimerModel(java_model=None)[source]¶
Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.wrapper.JavaModel, pyspark.ml.util.JavaMLWritable, pyspark.ml.util.JavaMLReadable
Model fitted by Timer.
This class is left empty on purpose. All necessary methods are exposed through inheritance.
mmlspark.stages.UDFTransformer module¶
class mmlspark.stages.UDFTransformer.UDFTransformer(inputCol=None, inputCols=None, outputCol=None, udf=None)[source]¶
Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer
- Parameters
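What a UDF transformer does per row can be sketched without Spark (`apply_udf` is a hypothetical helper; in practice the udf parameter takes a pyspark.sql.functions.udf and the work runs distributed):

```python
def apply_udf(rows, input_col, output_col, fn):
    """Add a new column computed by applying fn to each value
    of the input column."""
    return [{**row, output_col: fn(row[input_col])} for row in rows]

rows = [{"text": "hi"}, {"text": "spark"}]
with_lengths = apply_udf(rows, "text", "length", len)
```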
mmlspark.stages.UnicodeNormalize module¶
class mmlspark.stages.UnicodeNormalize.UnicodeNormalize(form=None, inputCol=None, lower=None, outputCol=None)[source]¶
Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer
- Parameters
form (str) – Unicode normalization form: NFC, NFD, NFKC, or NFKD
inputCol (str) – The name of the input column
lower (bool) – Whether to lowercase the text
outputCol (str) – The name of the output column
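The same normalization is available in Python's standard library, which is a convenient way to check what a given form does to sample text (`normalize_text` is a hypothetical helper mirroring the form and lower parameters):

```python
import unicodedata

def normalize_text(text, form="NFC", lower=True):
    """Apply a Unicode normal form (NFC/NFD/NFKC/NFKD) and
    optionally lowercase the result."""
    normalized = unicodedata.normalize(form, text)
    return normalized.lower() if lower else normalized

# 'e' followed by a combining acute accent composes to a single 'é'
composed = normalize_text("Cafe\u0301", form="NFC")
```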
Module contents¶
MicrosoftML is a library of Python classes that interface with the Microsoft Scala APIs, using Apache Spark to create distributed machine learning models.
MicrosoftML simplifies training and scoring classifiers and regressors, and facilitates creating models using the CNTK library, images, and text.