mmlspark.stages package¶
Submodules¶
mmlspark.stages.Cacher module¶
class mmlspark.stages.Cacher.Cacher(disable=False)[source]¶
Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer
- Parameters
disable (bool) – Whether to disable caching (so that you can turn it off during evaluation) (default: false)
getDisable()[source]¶
- Returns
Whether to disable caching (so that you can turn it off during evaluation) (default: false)
- Return type
bool
mmlspark.stages.ClassBalancer module¶
class mmlspark.stages.ClassBalancer.ClassBalancer(broadcastJoin=True, inputCol=None, outputCol='weight')[source]¶
Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaEstimator
- Parameters
broadcastJoin (bool) – Whether to broadcast the class-to-weight mapping to the workers (default: true)
inputCol (str) – The name of the input column
outputCol (str) – The name of the output column (default: weight)
getBroadcastJoin()[source]¶
- Returns
Whether to broadcast the class-to-weight mapping to the workers (default: true)
- Return type
bool
setBroadcastJoin(value)[source]¶
- Parameters
broadcastJoin (bool) – Whether to broadcast the class-to-weight mapping to the workers (default: true)
setOutputCol(value)[source]¶
- Parameters
outputCol (str) – The name of the output column (default: weight)
class mmlspark.stages.ClassBalancer.ClassBalancerModel(java_model=None)[source]¶
Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.wrapper.JavaModel, pyspark.ml.util.JavaMLWritable, pyspark.ml.util.JavaMLReadable
Model fitted by ClassBalancer.
This class is left empty on purpose. All necessary methods are exposed through inheritance.
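ClassBalancer produces a weight column that up-weights under-represented classes. As a rough illustration only (not the library's actual Scala implementation), an inverse-frequency weighting scheme can be sketched in plain Python; `inverse_frequency_weights` is a hypothetical helper:

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Weight each class by max_count / class_count, so the most
    frequent class gets 1.0 and rarer classes get proportionally more."""
    counts = Counter(labels)
    max_count = max(counts.values())
    return {label: max_count / n for label, n in counts.items()}

# "a" appears three times, "b" once, so "b" is weighted 3x heavier
weights = inverse_frequency_weights(["a", "a", "a", "b"])
```

In the estimator, fit() computes such a class-to-weight mapping over inputCol, and the fitted model joins it back onto the data (as a broadcast join when broadcastJoin=True), writing the result to outputCol.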
mmlspark.stages.DropColumns module¶
class mmlspark.stages.DropColumns.DropColumns(cols=None)[source]¶
Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer
- Parameters
cols (list) – Comma-separated list of column names to drop
mmlspark.stages.DynamicMiniBatchTransformer module¶
class mmlspark.stages.DynamicMiniBatchTransformer.DynamicMiniBatchTransformer(maxBatchSize=2147483647)[source]¶
Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer
- Parameters
maxBatchSize (int) – The max size of the buffer (default: 2147483647)
mmlspark.stages.EnsembleByKey module¶
class mmlspark.stages.EnsembleByKey.EnsembleByKey(colNames=None, collapseGroup=True, cols=None, keys=None, strategy='mean', vectorDims=None)[source]¶
Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer
- Parameters
colNames (list) – Names of the result of each col
collapseGroup (bool) – Whether to collapse all items in group to one entry (default: true)
cols (list) – Cols to ensemble
keys (list) – Keys to group by
strategy (str) – How to ensemble the scores, e.g. mean (default: mean)
vectorDims (dict) – The dimensions of any vector columns, used to avoid materialization
getCollapseGroup()[source]¶
- Returns
Whether to collapse all items in group to one entry (default: true)
- Return type
bool
getVectorDims()[source]¶
- Returns
The dimensions of any vector columns, used to avoid materialization
- Return type
dict
setCollapseGroup(value)[source]¶
- Parameters
collapseGroup (bool) – Whether to collapse all items in group to one entry (default: true)
setParams(colNames=None, collapseGroup=True, cols=None, keys=None, strategy='mean', vectorDims=None)[source]¶
Set the (keyword-only) parameters
- Parameters
colNames (list) – Names of the result of each col
collapseGroup (bool) – Whether to collapse all items in group to one entry (default: true)
cols (list) – Cols to ensemble
keys (list) – Keys to group by
strategy (str) – How to ensemble the scores, e.g. mean (default: mean)
vectorDims (dict) – The dimensions of any vector columns, used to avoid materialization
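With the default strategy='mean' and collapseGroup=True, each key group is reduced to a single averaged score. A minimal pure-Python sketch of that grouping logic (the transformer itself operates on Spark DataFrames; `ensemble_by_key` is a hypothetical helper):

```python
from collections import defaultdict

def ensemble_by_key(rows, key_field, score_field):
    """Average the score of every row sharing a key, collapsing
    each group to a single entry (strategy='mean')."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_field]].append(row[score_field])
    return {key: sum(scores) / len(scores) for key, scores in groups.items()}

rows = [{"k": 1, "score": 0.2}, {"k": 1, "score": 0.4}, {"k": 2, "score": 1.0}]
ensembled = ensemble_by_key(rows, "k", "score")
```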
mmlspark.stages.Explode module¶
class mmlspark.stages.Explode.Explode(inputCol=None, outputCol=None)[source]¶
Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer
- Parameters
inputCol (str) – The name of the input column
outputCol (str) – The name of the output column (default: [self.uid]_output)
getOutputCol()[source]¶
- Returns
The name of the output column (default: [self.uid]_output)
- Return type
str
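Explode behaves like Spark SQL's explode(): one output row per element of an array-valued input column. A plain-Python sketch of the row expansion (`explode_rows` is a hypothetical helper):

```python
def explode_rows(rows, input_col, output_col):
    """Emit one output row per element of the array column,
    copying the remaining fields unchanged."""
    return [
        {**row, output_col: item}
        for row in rows
        for item in row[input_col]
    ]

rows = [{"id": 1, "words": ["a", "b"]}, {"id": 2, "words": ["c"]}]
exploded = explode_rows(rows, "words", "word")
```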
mmlspark.stages.FixedMiniBatchTransformer module¶
class mmlspark.stages.FixedMiniBatchTransformer.FixedMiniBatchTransformer(batchSize=None, buffered=False, maxBufferSize=2147483647)[source]¶
Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer
- Parameters
batchSize (int) – The size of each minibatch
buffered (bool) – Whether or not to buffer batches in memory (default: false)
maxBufferSize (int) – The max size of the buffer (default: 2147483647)
getBuffered()[source]¶
- Returns
Whether or not to buffer batches in memory (default: false)
- Return type
bool
setBuffered(value)[source]¶
- Parameters
buffered (bool) – Whether or not to buffer batches in memory (default: false)
setMaxBufferSize(value)[source]¶
- Parameters
maxBufferSize (int) – The max size of the buffer (default: 2147483647)
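The fixed-size batching these mini-batch transformers perform can be illustrated in plain Python; `fixed_mini_batches` is a hypothetical stand-in for the per-partition logic, which in Spark additionally packs each column of a batch into an array column:

```python
def fixed_mini_batches(rows, batch_size):
    """Group an iterator of rows into lists of at most batch_size;
    the final batch may be smaller."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch

batches = list(fixed_mini_batches(range(5), 2))
```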
mmlspark.stages.FlattenBatch module¶
class mmlspark.stages.FlattenBatch.FlattenBatch[source]¶
Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer
mmlspark.stages.Lambda module¶
class mmlspark.stages.Lambda.Lambda(transformFunc=None, transformSchemaFunc=None)[source]¶
Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer
- Parameters
mmlspark.stages.MultiColumnAdapter module¶
class mmlspark.stages.MultiColumnAdapter.MultiColumnAdapter(baseStage=None, inputCols=None, outputCols=None)[source]¶
Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaEstimator
- Parameters
baseStage (object) – Base pipeline stage to apply to every column
inputCols (list) – The names of the input columns
outputCols (list) – The names of the output columns
setBaseStage(value)[source]¶
- Parameters
baseStage (object) – Base pipeline stage to apply to every column
-
class
mmlspark.stages.MultiColumnAdapter.
PipelineModel
(java_model=None)[source]¶ Bases:
mmlspark.core.schema.Utils.ComplexParamsMixin
,pyspark.ml.wrapper.JavaModel
,pyspark.ml.util.JavaMLWritable
,pyspark.ml.util.JavaMLReadable
Model fitted by
MultiColumnAdapter
.This class is left empty on purpose. All necessary methods are exposed through inheritance.
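Conceptually, MultiColumnAdapter fits one copy of the base stage per input/output column pair. The per-row effect can be sketched in plain Python (`adapt_columns` is a hypothetical helper; the real estimator builds a PipelineModel of fitted stages):

```python
def adapt_columns(base_fn, rows, input_cols, output_cols):
    """Apply one transformation to several columns, writing each
    result under the corresponding output column name."""
    adapted = []
    for row in rows:
        new_row = dict(row)
        for src, dst in zip(input_cols, output_cols):
            new_row[dst] = base_fn(row[src])
        adapted.append(new_row)
    return adapted

rows = [{"first": "ada", "last": "lovelace"}]
result = adapt_columns(str.upper, rows, ["first", "last"], ["FIRST", "LAST"])
```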
mmlspark.stages.RenameColumn module¶
class mmlspark.stages.RenameColumn.RenameColumn(inputCol=None, outputCol=None)[source]¶
Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer
- Parameters
inputCol (str) – The name of the input column
outputCol (str) – The name of the output column
mmlspark.stages.Repartition module¶
class mmlspark.stages.Repartition.Repartition(disable=False, n=None)[source]¶
Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer
- Parameters
disable (bool) – Whether to disable repartitioning (so that one can turn it off for evaluation) (default: false)
n (int) – Number of partitions
getDisable()[source]¶
- Returns
Whether to disable repartitioning (so that one can turn it off for evaluation) (default: false)
- Return type
bool
mmlspark.stages.SelectColumns module¶
class mmlspark.stages.SelectColumns.SelectColumns(cols=None)[source]¶
Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer
- Parameters
cols (list) – Comma-separated list of selected column names
mmlspark.stages.StratifiedRepartition module¶
class mmlspark.stages.StratifiedRepartition.StratifiedRepartition(labelCol=None, mode='mixed', seed=539887434)[source]¶
Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer
- Parameters
labelCol (str) – The name of the label column
mode (str) – Specify 'equal' to repartition with replacement across all labels, 'original' to keep the ratios in the original dataset, or 'mixed' to use a heuristic (default: mixed)
seed (long) – Random seed (default: 539887434)
getMode()[source]¶
- Returns
Specify 'equal' to repartition with replacement across all labels, 'original' to keep the ratios in the original dataset, or 'mixed' to use a heuristic (default: mixed)
- Return type
str
setMode(value)[source]¶
- Parameters
mode (str) – Specify 'equal' to repartition with replacement across all labels, 'original' to keep the ratios in the original dataset, or 'mixed' to use a heuristic (default: mixed)
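The goal is to give every partition a representative label mix. A pure-Python sketch of the 'original'-ratio idea, spreading each label's rows round-robin over partitions (`stratified_partitions` is a hypothetical helper; the real transformer runs distributed and supports the modes described above):

```python
from collections import defaultdict

def stratified_partitions(rows, label_field, n_partitions):
    """Send each label's rows round-robin to partitions so every
    partition keeps roughly the dataset's original label ratios."""
    partitions = [[] for _ in range(n_partitions)]
    seen = defaultdict(int)
    for row in rows:
        label = row[label_field]
        partitions[seen[label] % n_partitions].append(row)
        seen[label] += 1
    return partitions

rows = [{"y": i % 2} for i in range(8)]  # alternating labels 0 and 1
parts = stratified_partitions(rows, "y", 2)
```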
mmlspark.stages.SummarizeData module¶
class mmlspark.stages.SummarizeData.SummarizeData(basic=True, counts=True, errorThreshold=0.0, percentiles=True, sample=True)[source]¶
Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer
- Parameters
basic (bool) – Compute basic statistics (default: true)
counts (bool) – Compute count statistics (default: true)
errorThreshold (double) – Threshold for quantiles - 0 is exact (default: 0.0)
percentiles (bool) – Compute percentiles (default: true)
sample (bool) – Compute sample statistics (default: true)
getErrorThreshold()[source]¶
- Returns
Threshold for quantiles - 0 is exact (default: 0.0)
- Return type
double
setErrorThreshold(value)[source]¶
- Parameters
errorThreshold (double) – Threshold for quantiles - 0 is exact (default: 0.0)
setParams(basic=True, counts=True, errorThreshold=0.0, percentiles=True, sample=True)[source]¶
Set the (keyword-only) parameters
- Parameters
basic (bool) – Compute basic statistics (default: true)
counts (bool) – Compute count statistics (default: true)
errorThreshold (double) – Threshold for quantiles - 0 is exact (default: 0.0)
percentiles (bool) – Compute percentiles (default: true)
sample (bool) – Compute sample statistics (default: true)
mmlspark.stages.TextPreprocessor module¶
class mmlspark.stages.TextPreprocessor.TextPreprocessor(inputCol=None, map=None, normFunc=None, outputCol=None)[source]¶
Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer
- Parameters
mmlspark.stages.TimeIntervalMiniBatchTransformer module¶
class mmlspark.stages.TimeIntervalMiniBatchTransformer.TimeIntervalMiniBatchTransformer(maxBatchSize=2147483647, millisToWait=None)[source]¶
Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer
- Parameters
maxBatchSize (int) – The max size of the buffer (default: 2147483647)
millisToWait (int) – The number of milliseconds to wait before flushing a batch
setMaxBatchSize(value)[source]¶
- Parameters
maxBatchSize (int) – The max size of the buffer (default: 2147483647)
mmlspark.stages.Timer module¶
class mmlspark.stages.Timer.Timer(disableMaterialization=True, logToScala=True, stage=None)[source]¶
Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaEstimator
- Parameters
disableMaterialization (bool) – Whether to disable timing (so that one can turn it off for evaluation) (default: true)
logToScala (bool) – Whether to output the time to the Scala console (default: true)
stage (object) – The stage to time
getDisableMaterialization()[source]¶
- Returns
Whether to disable timing (so that one can turn it off for evaluation) (default: true)
- Return type
bool
getLogToScala()[source]¶
- Returns
Whether to output the time to the Scala console (default: true)
- Return type
bool
setDisableMaterialization(value)[source]¶
- Parameters
disableMaterialization (bool) – Whether to disable timing (so that one can turn it off for evaluation) (default: true)
setLogToScala(value)[source]¶
- Parameters
logToScala (bool) – Whether to output the time to the Scala console (default: true)
class mmlspark.stages.Timer.TimerModel(java_model=None)[source]¶
Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.wrapper.JavaModel, pyspark.ml.util.JavaMLWritable, pyspark.ml.util.JavaMLReadable
Model fitted by Timer.
This class is left empty on purpose. All necessary methods are exposed through inheritance.
mmlspark.stages.UDFTransformer module¶
class mmlspark.stages.UDFTransformer.UDFTransformer(inputCol=None, inputCols=None, outputCol=None, udf=None)[source]¶
Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer
- Parameters
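What a UDF transformer does per row can be sketched without Spark (`apply_udf` is a hypothetical helper; in practice the udf parameter takes a pyspark.sql.functions.udf and the work runs distributed):

```python
def apply_udf(rows, input_col, output_col, fn):
    """Add a new column computed by applying fn to each value
    of the input column."""
    return [{**row, output_col: fn(row[input_col])} for row in rows]

rows = [{"text": "hi"}, {"text": "spark"}]
with_lengths = apply_udf(rows, "text", "length", len)
```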
mmlspark.stages.UnicodeNormalize module¶
class mmlspark.stages.UnicodeNormalize.UnicodeNormalize(form=None, inputCol=None, lower=None, outputCol=None)[source]¶
Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer
- Parameters
form (str) – Unicode normalization form: NFC, NFD, NFKC, or NFKD
inputCol (str) – The name of the input column
lower (bool) – Whether to lowercase the text
outputCol (str) – The name of the output column
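The same normalization is available in Python's standard library, which is a convenient way to check what a given form does to sample text (`normalize_text` is a hypothetical helper mirroring the form and lower parameters):

```python
import unicodedata

def normalize_text(text, form="NFC", lower=True):
    """Apply a Unicode normal form (NFC/NFD/NFKC/NFKD) and
    optionally lowercase the result."""
    normalized = unicodedata.normalize(form, text)
    return normalized.lower() if lower else normalized

# 'e' followed by a combining acute accent composes to a single 'é'
composed = normalize_text("Cafe\u0301", form="NFC")
```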
Module contents¶
MicrosoftML is a library of Python classes that interface with the Microsoft Scala APIs, using Apache Spark to create distributed machine learning models.
MicrosoftML simplifies training and scoring classifiers and regressors, and facilitates creating models using the CNTK library, images, and text.