mmlspark.stages package

Submodules

mmlspark.stages.Cacher module

class mmlspark.stages.Cacher.Cacher(disable=False)[source]

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters

disable (bool) – Whether to disable caching (so that you can turn it off during evaluation) (default: false)

getDisable()[source]
Returns

Whether to disable caching (so that you can turn it off during evaluation) (default: false)

Return type

bool

static getJavaPackage()[source]

Returns package name String.

classmethod read()[source]

Returns an MLReader instance for this class.

setDisable(value)[source]
Parameters

disable (bool) – Whether to disable caching (so that you can turn it off during evaluation) (default: false)

setParams(disable=False)[source]

Set the (keyword only) parameters

Parameters

disable (bool) – Whether to disable caching (so that you can turn it off during evaluation) (default: false)

mmlspark.stages.ClassBalancer module

class mmlspark.stages.ClassBalancer.ClassBalancer(broadcastJoin=True, inputCol=None, outputCol='weight')[source]

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaEstimator

Parameters
  • broadcastJoin (bool) – Whether to broadcast the class to weight mapping to the worker (default: true)

  • inputCol (str) – The name of the input column

  • outputCol (str) – The name of the output column (default: weight)

getBroadcastJoin()[source]
Returns

Whether to broadcast the class to weight mapping to the worker (default: true)

Return type

bool

getInputCol()[source]
Returns

The name of the input column

Return type

str

static getJavaPackage()[source]

Returns package name String.

getOutputCol()[source]
Returns

The name of the output column (default: weight)

Return type

str

classmethod read()[source]

Returns an MLReader instance for this class.

setBroadcastJoin(value)[source]
Parameters

broadcastJoin (bool) – Whether to broadcast the class to weight mapping to the worker (default: true)

setInputCol(value)[source]
Parameters

inputCol (str) – The name of the input column

setOutputCol(value)[source]
Parameters

outputCol (str) – The name of the output column (default: weight)

setParams(broadcastJoin=True, inputCol=None, outputCol='weight')[source]

Set the (keyword only) parameters

Parameters
  • broadcastJoin (bool) – Whether to broadcast the class to weight mapping to the worker (default: true)

  • inputCol (str) – The name of the input column

  • outputCol (str) – The name of the output column (default: weight)

class mmlspark.stages.ClassBalancer.ClassBalancerModel(java_model=None)[source]

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.wrapper.JavaModel, pyspark.ml.util.JavaMLWritable, pyspark.ml.util.JavaMLReadable

Model fitted by ClassBalancer.

This class is left empty on purpose. All necessary methods are exposed through inheritance.

static getJavaPackage()[source]

Returns package name String.

classmethod read()[source]

Returns an MLReader instance for this class.
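ClassBalancer computes a per-row weight that counteracts class imbalance in the labelled input column. The exact formula lives in the Scala implementation; a minimal pure-Python sketch of one common inverse-frequency scheme (assumed here, not taken from the source) is:

```python
from collections import Counter

def class_weights(labels):
    """Map each class to max_count / class_count, so the most frequent
    class gets weight 1.0 and rarer classes are weighted up."""
    counts = Counter(labels)
    max_count = max(counts.values())
    return {label: max_count / n for label, n in counts.items()}

labels = ["a", "a", "a", "b"]
weights = class_weights(labels)  # "b" is 3x rarer than "a"
```

In the real transformer the resulting weight lands in outputCol (default: weight) and is typically passed to an estimator's weight column.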

mmlspark.stages.DropColumns module

class mmlspark.stages.DropColumns.DropColumns(cols=None)[source]

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters

cols (list) – Comma separated list of column names

getCols()[source]
Returns

Comma separated list of column names

Return type

list

static getJavaPackage()[source]

Returns package name String.

classmethod read()[source]

Returns an MLReader instance for this class.

setCols(value)[source]
Parameters

cols (list) – Comma separated list of column names

setParams(cols=None)[source]

Set the (keyword only) parameters

Parameters

cols (list) – Comma separated list of column names
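DropColumns removes the listed columns from the input DataFrame, much like PySpark's df.drop(*cols). A pure-Python sketch of the per-row behavior (the row dict and column names are hypothetical):

```python
def drop_columns(row, cols):
    """Return a copy of a row (dict) without the given columns."""
    dropped = set(cols)
    return {k: v for k, v in row.items() if k not in dropped}

row = {"id": 1, "name": "x", "tmp": 0}
cleaned = drop_columns(row, ["tmp"])
```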

mmlspark.stages.DynamicMiniBatchTransformer module

class mmlspark.stages.DynamicMiniBatchTransformer.DynamicMiniBatchTransformer(maxBatchSize=2147483647)[source]

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters

maxBatchSize (int) – The max size of the buffer (default: 2147483647)

static getJavaPackage()[source]

Returns package name String.

getMaxBatchSize()[source]
Returns

The max size of the buffer (default: 2147483647)

Return type

int

classmethod read()[source]

Returns an MLReader instance for this class.

setMaxBatchSize(value)[source]
Parameters

maxBatchSize (int) – The max size of the buffer (default: 2147483647)

setParams(maxBatchSize=2147483647)[source]

Set the (keyword only) parameters

Parameters

maxBatchSize (int) – The max size of the buffer (default: 2147483647)

mmlspark.stages.EnsembleByKey module

class mmlspark.stages.EnsembleByKey.EnsembleByKey(colNames=None, collapseGroup=True, cols=None, keys=None, strategy='mean', vectorDims=None)[source]

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • colNames (list) – Names of the result of each col

  • collapseGroup (bool) – Whether to collapse all items in group to one entry (default: true)

  • cols (list) – Cols to ensemble

  • keys (list) – Keys to group by

  • strategy (str) – How to ensemble the scores, ex: mean (default: mean)

  • vectorDims (dict) – the dimensions of any vector columns, used to avoid materialization

getColNames()[source]
Returns

Names of the result of each col

Return type

list

getCollapseGroup()[source]
Returns

Whether to collapse all items in group to one entry (default: true)

Return type

bool

getCols()[source]
Returns

Cols to ensemble

Return type

list

static getJavaPackage()[source]

Returns package name String.

getKeys()[source]
Returns

Keys to group by

Return type

list

getStrategy()[source]
Returns

How to ensemble the scores, ex: mean (default: mean)

Return type

str

getVectorDims()[source]
Returns

the dimensions of any vector columns, used to avoid materialization

Return type

dict

classmethod read()[source]

Returns an MLReader instance for this class.

setColNames(value)[source]
Parameters

colNames (list) – Names of the result of each col

setCollapseGroup(value)[source]
Parameters

collapseGroup (bool) – Whether to collapse all items in group to one entry (default: true)

setCols(value)[source]
Parameters

cols (list) – Cols to ensemble

setKeys(value)[source]
Parameters

keys (list) – Keys to group by

setParams(colNames=None, collapseGroup=True, cols=None, keys=None, strategy='mean', vectorDims=None)[source]

Set the (keyword only) parameters

Parameters
  • colNames (list) – Names of the result of each col

  • collapseGroup (bool) – Whether to collapse all items in group to one entry (default: true)

  • cols (list) – Cols to ensemble

  • keys (list) – Keys to group by

  • strategy (str) – How to ensemble the scores, ex: mean (default: mean)

  • vectorDims (dict) – the dimensions of any vector columns, used to avoid materialization

setStrategy(value)[source]
Parameters

strategy (str) – How to ensemble the scores, ex: mean (default: mean)

setVectorDims(value)[source]
Parameters

vectorDims (dict) – the dimensions of any vector columns, used to avoid materialization
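EnsembleByKey groups rows by the key columns and combines each score column with the chosen strategy (mean is the only strategy documented above). A pure-Python sketch with a single hypothetical key and score column, showing the collapseGroup=True behavior of one output row per key:

```python
from collections import defaultdict

def ensemble_by_key(rows, key, col):
    """Group rows by row[key] and average row[col] within each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[col])
    # collapseGroup=True: each group collapses to a single entry
    return {k: sum(v) / len(v) for k, v in groups.items()}

rows = [
    {"k": "a", "score": 1.0},
    {"k": "a", "score": 3.0},
    {"k": "b", "score": 2.0},
]
```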

mmlspark.stages.Explode module

class mmlspark.stages.Explode.Explode(inputCol=None, outputCol=None)[source]

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • inputCol (str) – The name of the input column

  • outputCol (str) – The name of the output column (default: [self.uid]_output)

getInputCol()[source]
Returns

The name of the input column

Return type

str

static getJavaPackage()[source]

Returns package name String.

getOutputCol()[source]
Returns

The name of the output column (default: [self.uid]_output)

Return type

str

classmethod read()[source]

Returns an MLReader instance for this class.

setInputCol(value)[source]
Parameters

inputCol (str) – The name of the input column

setOutputCol(value)[source]
Parameters

outputCol (str) – The name of the output column (default: [self.uid]_output)

setParams(inputCol=None, outputCol=None)[source]

Set the (keyword only) parameters

Parameters
  • inputCol (str) – The name of the input column

  • outputCol (str) – The name of the output column (default: [self.uid]_output)
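Explode emits one output row per element of the array-valued input column, analogous to pyspark.sql.functions.explode. A pure-Python sketch:

```python
def explode(rows, input_col, output_col):
    """Yield one row per element of the array in input_col."""
    for row in rows:
        for item in row[input_col]:
            out = dict(row)
            del out[input_col]
            out[output_col] = item
            yield out
```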

mmlspark.stages.FixedMiniBatchTransformer module

class mmlspark.stages.FixedMiniBatchTransformer.FixedMiniBatchTransformer(batchSize=None, buffered=False, maxBufferSize=2147483647)[source]

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • batchSize (int) – The max size of the buffer

  • buffered (bool) – Whether or not to buffer batches in memory (default: false)

  • maxBufferSize (int) – The max size of the buffer (default: 2147483647)

getBatchSize()[source]
Returns

The max size of the buffer

Return type

int

getBuffered()[source]
Returns

Whether or not to buffer batches in memory (default: false)

Return type

bool

static getJavaPackage()[source]

Returns package name String.

getMaxBufferSize()[source]
Returns

The max size of the buffer (default: 2147483647)

Return type

int

classmethod read()[source]

Returns an MLReader instance for this class.

setBatchSize(value)[source]
Parameters

batchSize (int) – The max size of the buffer

setBuffered(value)[source]
Parameters

buffered (bool) – Whether or not to buffer batches in memory (default: false)

setMaxBufferSize(value)[source]
Parameters

maxBufferSize (int) – The max size of the buffer (default: 2147483647)

setParams(batchSize=None, buffered=False, maxBufferSize=2147483647)[source]

Set the (keyword only) parameters

Parameters
  • batchSize (int) – The max size of the buffer

  • buffered (bool) – Whether or not to buffer batches in memory (default: false)

  • maxBufferSize (int) – The max size of the buffer (default: 2147483647)
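FixedMiniBatchTransformer gathers consecutive rows into batches of at most batchSize (the same core logic applies to the dynamic and time-interval variants, which choose the batch boundary differently). A pure-Python sketch of the batching:

```python
def fixed_mini_batches(rows, batch_size):
    """Group an iterable into lists of at most batch_size items."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final partial batch
        yield batch
```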

mmlspark.stages.FlattenBatch module

class mmlspark.stages.FlattenBatch.FlattenBatch[source]

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

static getJavaPackage()[source]

Returns package name String.

classmethod read()[source]

Returns an MLReader instance for this class.

setParams()[source]

Set the (keyword only) parameters

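FlattenBatch is the inverse of the mini-batch transformers: it expands batched rows back into one row per element, preserving order. Sketched in pure Python:

```python
def flatten_batches(batches):
    """Undo batching: yield each row of each batch, in order."""
    for batch in batches:
        for row in batch:
            yield row
```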

mmlspark.stages.Lambda module

class mmlspark.stages.Lambda.Lambda(transformFunc=None, transformSchemaFunc=None)[source]

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • transformFunc (object) – holder for dataframe function

  • transformSchemaFunc (object) – the output schema after the transformation

static getJavaPackage()[source]

Returns package name String.

getTransformFunc()[source]
Returns

holder for dataframe function

Return type

object

getTransformSchemaFunc()[source]
Returns

the output schema after the transformation

Return type

object

classmethod read()[source]

Returns an MLReader instance for this class.

setParams(transformFunc=None, transformSchemaFunc=None)[source]

Set the (keyword only) parameters

Parameters
  • transformFunc (object) – holder for dataframe function

  • transformSchemaFunc (object) – the output schema after the transformation

setTransformFunc(value)[source]
Parameters

transformFunc (object) – holder for dataframe function

setTransformSchemaFunc(value)[source]
Parameters

transformSchemaFunc (object) – the output schema after the transformation

mmlspark.stages.MultiColumnAdapter module

class mmlspark.stages.MultiColumnAdapter.MultiColumnAdapter(baseStage=None, inputCols=None, outputCols=None)[source]

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaEstimator

Parameters
  • baseStage (object) – base pipeline stage to apply to every column

  • inputCols (list) – list of column names encoded as a string

  • outputCols (list) – list of column names encoded as a string

getBaseStage()[source]
Returns

base pipeline stage to apply to every column

Return type

object

getInputCols()[source]
Returns

list of column names encoded as a string

Return type

list

static getJavaPackage()[source]

Returns package name String.

getOutputCols()[source]
Returns

list of column names encoded as a string

Return type

list

classmethod read()[source]

Returns an MLReader instance for this class.

setBaseStage(value)[source]
Parameters

baseStage (object) – base pipeline stage to apply to every column

setInputCols(value)[source]
Parameters

inputCols (list) – list of column names encoded as a string

setOutputCols(value)[source]
Parameters

outputCols (list) – list of column names encoded as a string

setParams(baseStage=None, inputCols=None, outputCols=None)[source]

Set the (keyword only) parameters

Parameters
  • baseStage (object) – base pipeline stage to apply to every column

  • inputCols (list) – list of column names encoded as a string

  • outputCols (list) – list of column names encoded as a string

class mmlspark.stages.MultiColumnAdapter.PipelineModel(java_model=None)[source]

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.wrapper.JavaModel, pyspark.ml.util.JavaMLWritable, pyspark.ml.util.JavaMLReadable

Model fitted by MultiColumnAdapter.

This class is left empty on purpose. All necessary methods are exposed through inheritance.

static getJavaPackage()[source]

Returns package name String.

classmethod read()[source]

Returns an MLReader instance for this class.

mmlspark.stages.RenameColumn module

class mmlspark.stages.RenameColumn.RenameColumn(inputCol=None, outputCol=None)[source]

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • inputCol (str) – The name of the input column

  • outputCol (str) – The name of the output column

getInputCol()[source]
Returns

The name of the input column

Return type

str

static getJavaPackage()[source]

Returns package name String.

getOutputCol()[source]
Returns

The name of the output column

Return type

str

classmethod read()[source]

Returns an MLReader instance for this class.

setInputCol(value)[source]
Parameters

inputCol (str) – The name of the input column

setOutputCol(value)[source]
Parameters

outputCol (str) – The name of the output column

setParams(inputCol=None, outputCol=None)[source]

Set the (keyword only) parameters

Parameters
  • inputCol (str) – The name of the input column

  • outputCol (str) – The name of the output column

mmlspark.stages.Repartition module

class mmlspark.stages.Repartition.Repartition(disable=False, n=None)[source]

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • disable (bool) – Whether to disable repartitioning (so that one can turn it off for evaluation) (default: false)

  • n (int) – Number of partitions

getDisable()[source]
Returns

Whether to disable repartitioning (so that one can turn it off for evaluation) (default: false)

Return type

bool

static getJavaPackage()[source]

Returns package name String.

getN()[source]
Returns

Number of partitions

Return type

int

classmethod read()[source]

Returns an MLReader instance for this class.

setDisable(value)[source]
Parameters

disable (bool) – Whether to disable repartitioning (so that one can turn it off for evaluation) (default: false)

setN(value)[source]
Parameters

n (int) – Number of partitions

setParams(disable=False, n=None)[source]

Set the (keyword only) parameters

Parameters
  • disable (bool) – Whether to disable repartitioning (so that one can turn it off for evaluation) (default: false)

  • n (int) – Number of partitions

mmlspark.stages.SelectColumns module

class mmlspark.stages.SelectColumns.SelectColumns(cols=None)[source]

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters

cols (list) – Comma separated list of selected column names

getCols()[source]
Returns

Comma separated list of selected column names

Return type

list

static getJavaPackage()[source]

Returns package name String.

classmethod read()[source]

Returns an MLReader instance for this class.

setCols(value)[source]
Parameters

cols (list) – Comma separated list of selected column names

setParams(cols=None)[source]

Set the (keyword only) parameters

Parameters

cols (list) – Comma separated list of selected column names

mmlspark.stages.StratifiedRepartition module

class mmlspark.stages.StratifiedRepartition.StratifiedRepartition(labelCol=None, mode='mixed', seed=539887434)[source]

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • labelCol (str) – The name of the label column

  • mode (str) – Specify equal to repartition with replacement across all labels, specify original to keep the ratios in the original dataset, or specify mixed to use a heuristic (default: mixed)

  • seed (long) – random seed (default: 539887434)

static getJavaPackage()[source]

Returns package name String.

getLabelCol()[source]
Returns

The name of the label column

Return type

str

getMode()[source]
Returns

Specify equal to repartition with replacement across all labels, specify original to keep the ratios in the original dataset, or specify mixed to use a heuristic (default: mixed)

Return type

str

getSeed()[source]
Returns

random seed (default: 539887434)

Return type

long

classmethod read()[source]

Returns an MLReader instance for this class.

setLabelCol(value)[source]
Parameters

labelCol (str) – The name of the label column

setMode(value)[source]
Parameters

mode (str) – Specify equal to repartition with replacement across all labels, specify original to keep the ratios in the original dataset, or specify mixed to use a heuristic (default: mixed)

setParams(labelCol=None, mode='mixed', seed=539887434)[source]

Set the (keyword only) parameters

Parameters
  • labelCol (str) – The name of the label column

  • mode (str) – Specify equal to repartition with replacement across all labels, specify original to keep the ratios in the original dataset, or specify mixed to use a heuristic (default: mixed)

  • seed (long) – random seed (default: 539887434)

setSeed(value)[source]
Parameters

seed (long) – random seed (default: 539887434)

mmlspark.stages.SummarizeData module

class mmlspark.stages.SummarizeData.SummarizeData(basic=True, counts=True, errorThreshold=0.0, percentiles=True, sample=True)[source]

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • basic (bool) – Compute basic statistics (default: true)

  • counts (bool) – Compute count statistics (default: true)

  • errorThreshold (double) – Threshold for quantiles - 0 is exact (default: 0.0)

  • percentiles (bool) – Compute percentiles (default: true)

  • sample (bool) – Compute sample statistics (default: true)

getBasic()[source]
Returns

Compute basic statistics (default: true)

Return type

bool

getCounts()[source]
Returns

Compute count statistics (default: true)

Return type

bool

getErrorThreshold()[source]
Returns

Threshold for quantiles - 0 is exact (default: 0.0)

Return type

double

static getJavaPackage()[source]

Returns package name String.

getPercentiles()[source]
Returns

Compute percentiles (default: true)

Return type

bool

getSample()[source]
Returns

Compute sample statistics (default: true)

Return type

bool

classmethod read()[source]

Returns an MLReader instance for this class.

setBasic(value)[source]
Parameters

basic (bool) – Compute basic statistics (default: true)

setCounts(value)[source]
Parameters

counts (bool) – Compute count statistics (default: true)

setErrorThreshold(value)[source]
Parameters

errorThreshold (double) – Threshold for quantiles - 0 is exact (default: 0.0)

setParams(basic=True, counts=True, errorThreshold=0.0, percentiles=True, sample=True)[source]

Set the (keyword only) parameters

Parameters
  • basic (bool) – Compute basic statistics (default: true)

  • counts (bool) – Compute count statistics (default: true)

  • errorThreshold (double) – Threshold for quantiles - 0 is exact (default: 0.0)

  • percentiles (bool) – Compute percentiles (default: true)

  • sample (bool) – Compute sample statistics (default: true)

setPercentiles(value)[source]
Parameters

percentiles (bool) – Compute percentiles (default: true)

setSample(value)[source]
Parameters

sample (bool) – Compute sample statistics (default: true)
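SummarizeData produces a DataFrame of per-column statistics grouped into the families toggled above (basic, counts, percentiles, sample). As a pure-Python sketch of the kind of figures the "basic" family covers for one numeric column (the actual output schema is richer and is computed by the Scala side):

```python
def basic_stats(values):
    """Count, mean, min and max for one numeric column."""
    n = len(values)
    return {
        "count": n,
        "mean": sum(values) / n,
        "min": min(values),
        "max": max(values),
    }
```

errorThreshold only affects the percentile family: 0.0 requests exact quantiles, larger values trade accuracy for speed.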

mmlspark.stages.TextPreprocessor module

class mmlspark.stages.TextPreprocessor.TextPreprocessor(inputCol=None, map=None, normFunc=None, outputCol=None)[source]

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • inputCol (str) – The name of the input column

  • map (dict) – Map of substring match to replacement

  • normFunc (str) – Name of normalization function to apply

  • outputCol (str) – The name of the output column

getInputCol()[source]
Returns

The name of the input column

Return type

str

static getJavaPackage()[source]

Returns package name String.

getMap()[source]
Returns

Map of substring match to replacement

Return type

dict

getNormFunc()[source]
Returns

Name of normalization function to apply

Return type

str

getOutputCol()[source]
Returns

The name of the output column

Return type

str

classmethod read()[source]

Returns an MLReader instance for this class.

setInputCol(value)[source]
Parameters

inputCol (str) – The name of the input column

setMap(value)[source]
Parameters

map (dict) – Map of substring match to replacement

setNormFunc(value)[source]
Parameters

normFunc (str) – Name of normalization function to apply

setOutputCol(value)[source]
Parameters

outputCol (str) – The name of the output column

setParams(inputCol=None, map=None, normFunc=None, outputCol=None)[source]

Set the (keyword only) parameters

Parameters
  • inputCol (str) – The name of the input column

  • map (dict) – Map of substring match to replacement

  • normFunc (str) – Name of normalization function to apply

  • outputCol (str) – The name of the output column
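TextPreprocessor applies the substring-to-replacement map (and optionally a named normalization function) to the input text column. A pure-Python sketch of the map step; applying the entries in dict iteration order is an assumption of this sketch, not a documented guarantee:

```python
def preprocess(text, mapping):
    """Replace every occurrence of each map key with its value."""
    for old, new in mapping.items():
        text = text.replace(old, new)
    return text
```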

mmlspark.stages.TimeIntervalMiniBatchTransformer module

class mmlspark.stages.TimeIntervalMiniBatchTransformer.TimeIntervalMiniBatchTransformer(maxBatchSize=2147483647, millisToWait=None)[source]

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • maxBatchSize (int) – The max size of the buffer (default: 2147483647)

  • millisToWait (int) – The time to wait before constructing a batch

static getJavaPackage()[source]

Returns package name String.

getMaxBatchSize()[source]
Returns

The max size of the buffer (default: 2147483647)

Return type

int

getMillisToWait()[source]
Returns

The time to wait before constructing a batch

Return type

int

classmethod read()[source]

Returns an MLReader instance for this class.

setMaxBatchSize(value)[source]
Parameters

maxBatchSize (int) – The max size of the buffer (default: 2147483647)

setMillisToWait(value)[source]
Parameters

millisToWait (int) – The time to wait before constructing a batch

setParams(maxBatchSize=2147483647, millisToWait=None)[source]

Set the (keyword only) parameters

Parameters
  • maxBatchSize (int) – The max size of the buffer (default: 2147483647)

  • millisToWait (int) – The time to wait before constructing a batch

mmlspark.stages.Timer module

class mmlspark.stages.Timer.Timer(disableMaterialization=True, logToScala=True, stage=None)[source]

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaEstimator

Parameters
  • disableMaterialization (bool) – Whether to disable timing (so that one can turn it off for evaluation) (default: true)

  • logToScala (bool) – Whether to output the time to the scala console (default: true)

  • stage (object) – The stage to time

getDisableMaterialization()[source]
Returns

Whether to disable timing (so that one can turn it off for evaluation) (default: true)

Return type

bool

static getJavaPackage()[source]

Returns package name String.

getLogToScala()[source]
Returns

Whether to output the time to the scala console (default: true)

Return type

bool

getStage()[source]
Returns

The stage to time

Return type

object

classmethod read()[source]

Returns an MLReader instance for this class.

setDisableMaterialization(value)[source]
Parameters

disableMaterialization (bool) – Whether to disable timing (so that one can turn it off for evaluation) (default: true)

setLogToScala(value)[source]
Parameters

logToScala (bool) – Whether to output the time to the scala console (default: true)

setParams(disableMaterialization=True, logToScala=True, stage=None)[source]

Set the (keyword only) parameters

Parameters
  • disableMaterialization (bool) – Whether to disable timing (so that one can turn it off for evaluation) (default: true)

  • logToScala (bool) – Whether to output the time to the scala console (default: true)

  • stage (object) – The stage to time

setStage(value)[source]
Parameters

stage (object) – The stage to time

class mmlspark.stages.Timer.TimerModel(java_model=None)[source]

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.wrapper.JavaModel, pyspark.ml.util.JavaMLWritable, pyspark.ml.util.JavaMLReadable

Model fitted by Timer.

This class is left empty on purpose. All necessary methods are exposed through inheritance.

static getJavaPackage()[source]

Returns package name String.

classmethod read()[source]

Returns an MLReader instance for this class.

mmlspark.stages.UDFTransformer module

class mmlspark.stages.UDFTransformer.UDFTransformer(inputCol=None, inputCols=None, outputCol=None, udf=None)[source]

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • inputCol (str) – The name of the input column (default: )

  • inputCols (list) – The names of the input columns (default: )

  • outputCol (str) – The name of the output column

  • udf (object) – User Defined Python Function to be applied to the DF input col

  • udfScala (object) – User Defined Function to be applied to the DF input col

getInputCol()[source]
Returns

The name of the input column (default: )

Return type

str

getInputCols()[source]
Returns

The names of the input columns (default: )

Return type

list

static getJavaPackage()[source]

Returns package name String.

getOutputCol()[source]
Returns

The name of the output column

Return type

str

getUDF()[source]
classmethod read()[source]

Returns an MLReader instance for this class.

setInputCol(value)[source]
Parameters

inputCol (str) – The name of the input column (default: )

setInputCols(value)[source]
Parameters

inputCols (list) – The names of the input columns (default: )

setOutputCol(value)[source]
Parameters

outputCol (str) – The name of the output column

setUDF(udf)[source]
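UDFTransformer applies a user-defined function to the input column(s) to produce the output column; in real use the function is wrapped with pyspark.sql.functions.udf and executed by Spark. A pure-Python sketch of the single-input-column, per-row behavior:

```python
def apply_udf(rows, input_col, output_col, fn):
    """Add output_col = fn(row[input_col]) to each row."""
    for row in rows:
        out = dict(row)
        out[output_col] = fn(row[input_col])
        yield out
```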

mmlspark.stages.UnicodeNormalize module

class mmlspark.stages.UnicodeNormalize.UnicodeNormalize(form=None, inputCol=None, lower=None, outputCol=None)[source]

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • form (str) – Unicode normalization form: NFC, NFD, NFKC, NFKD

  • inputCol (str) – The name of the input column

  • lower (bool) – Lowercase text

  • outputCol (str) – The name of the output column

getForm()[source]
Returns

Unicode normalization form: NFC, NFD, NFKC, NFKD

Return type

str

getInputCol()[source]
Returns

The name of the input column

Return type

str

static getJavaPackage()[source]

Returns package name String.

getLower()[source]
Returns

Lowercase text

Return type

bool

getOutputCol()[source]
Returns

The name of the output column

Return type

str

classmethod read()[source]

Returns an MLReader instance for this class.

setForm(value)[source]
Parameters

form (str) – Unicode normalization form: NFC, NFD, NFKC, NFKD

setInputCol(value)[source]
Parameters

inputCol (str) – The name of the input column

setLower(value)[source]
Parameters

lower (bool) – Lowercase text

setOutputCol(value)[source]
Parameters

outputCol (str) – The name of the output column

setParams(form=None, inputCol=None, lower=None, outputCol=None)[source]

Set the (keyword only) parameters

Parameters
  • form (str) – Unicode normalization form: NFC, NFD, NFKC, NFKD

  • inputCol (str) – The name of the input column

  • lower (bool) – Lowercase text

  • outputCol (str) – The name of the output column
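UnicodeNormalize applies one of the four standard Unicode normalization forms and optionally lowercases the result. The same operation is available in Python's standard library, which makes the behavior easy to sketch:

```python
import unicodedata

def unicode_normalize(text, form="NFC", lower=False):
    """Normalize text to the given form (NFC/NFD/NFKC/NFKD),
    optionally lowercasing afterwards."""
    out = unicodedata.normalize(form, text)
    return out.lower() if lower else out

# "e" followed by a combining acute accent composes to a single
# precomposed character under NFC
s = unicode_normalize("e\u0301", "NFC")
```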

Module contents

MicrosoftML is a library of Python classes that interface with the Microsoft Scala APIs, using Apache Spark to create distributed machine learning models.

MicrosoftML simplifies training and scoring classifiers and regressors, and facilitates creating models with the CNTK library, images, and text.