mmlspark.featurize package
Subpackages
Submodules
mmlspark.featurize.AssembleFeatures module
- class mmlspark.featurize.AssembleFeatures.AssembleFeatures(allowImages=False, columnsToFeaturize=None, featuresCol='features', numberOfFeatures=None, oneHotEncodeCategoricals=True)
Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaEstimator
- Parameters
allowImages (bool) – Allow featurization of images (default: false)
columnsToFeaturize (list) – Columns to featurize
featuresCol (str) – The name of the features column (default: features)
numberOfFeatures (int) – Number of features to hash string columns to
oneHotEncodeCategoricals (bool) – One-hot encode categoricals (default: true)
- getOneHotEncodeCategoricals()
- Returns
One-hot encode categoricals (default: true)
- Return type
bool
- setAllowImages(value)
- Parameters
allowImages (bool) – Allow featurization of images (default: false)
- setFeaturesCol(value)
- Parameters
featuresCol (str) – The name of the features column (default: features)
- setNumberOfFeatures(value)
- Parameters
numberOfFeatures (int) – Number of features to hash string columns to
- setOneHotEncodeCategoricals(value)
- Parameters
oneHotEncodeCategoricals (bool) – One-hot encode categoricals (default: true)
- setParams(allowImages=False, columnsToFeaturize=None, featuresCol='features', numberOfFeatures=None, oneHotEncodeCategoricals=True)
Set the (keyword only) parameters
- Parameters
allowImages (bool) – Allow featurization of images (default: false)
columnsToFeaturize (list) – Columns to featurize
featuresCol (str) – The name of the features column (default: features)
numberOfFeatures (int) – Number of features to hash string columns to
oneHotEncodeCategoricals (bool) – One-hot encode categoricals (default: true)
- class mmlspark.featurize.AssembleFeatures.AssembleFeaturesModel(java_model=None)
Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.wrapper.JavaModel, pyspark.ml.util.JavaMLWritable, pyspark.ml.util.JavaMLReadable
Model fitted by AssembleFeatures.
This class is left empty on purpose. All necessary methods are exposed through inheritance.
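The numberOfFeatures parameter above refers to hashing string columns into a fixed number of feature slots. The following is a minimal plain-Python sketch of that general idea (the hashing trick), not the library's implementation: the helper name hash_string_features, the choice of MD5, and the bucket count of 8 are all assumptions made for illustration.

```python
import hashlib


def hash_string_features(tokens, number_of_features):
    """Map string tokens to counts in a fixed-size vector (hashing trick).

    Hypothetical helper for illustration only. MD5 is an arbitrary choice
    that keeps bucket assignment deterministic across Python processes.
    """
    vector = [0] * number_of_features
    for token in tokens:
        digest = hashlib.md5(token.encode("utf-8")).hexdigest()
        # Reduce the hash to a slot index in [0, number_of_features).
        index = int(digest, 16) % number_of_features
        vector[index] += 1
    return vector


vec = hash_string_features(["red", "green", "red"], number_of_features=8)
print(len(vec), sum(vec))  # 8 3
```

The vector length is fixed by the bucket count regardless of how many distinct strings appear, which is why distinct strings may collide in one slot when the count is small.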
mmlspark.featurize.CleanMissingData module
- class mmlspark.featurize.CleanMissingData.CleanMissingData(cleaningMode='Mean', customValue=None, inputCols=None, outputCols=None)
Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaEstimator
- Parameters
- class mmlspark.featurize.CleanMissingData.CleanMissingDataModel(java_model=None)
Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.wrapper.JavaModel, pyspark.ml.util.JavaMLWritable, pyspark.ml.util.JavaMLReadable
Model fitted by CleanMissingData.
This class is left empty on purpose. All necessary methods are exposed through inheritance.
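CleanMissingData is an estimator whose fitted model is CleanMissingDataModel, and its default cleaningMode is 'Mean'. The sketch below mirrors that fit-then-apply split in plain Python for a single numeric column. It is an illustration under assumptions, not the library's code: the helper names fit_fill_value and apply_fill_value are hypothetical, and treating both None and NaN as missing is an assumption.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing (an assumption of this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def fit_fill_value(values):
    """'Fit' step: compute the mean of the values that are present."""
    present = [v for v in values if not is_missing(v)]
    if not present:
        raise ValueError("cannot compute a mean when every value is missing")
    return sum(present) / len(present)


def apply_fill_value(values, fill):
    """'Transform' step: replace each missing entry with the fitted value."""
    return [fill if is_missing(v) else v for v in values]


column = [1.0, None, 3.0, float("nan")]
fill = fit_fill_value(column)
cleaned = apply_fill_value(column, fill)
print(fill)     # 2.0
print(cleaned)  # [1.0, 2.0, 3.0, 2.0]
```

Keeping the fitted value separate from the data lets the same fill be reused on new rows, which is the usual reason for splitting an estimator from its model.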
mmlspark.featurize.DataConversion module
- class mmlspark.featurize.DataConversion.DataConversion(cols=None, convertTo='', dateTimeFormat='yyyy-MM-dd HH:mm:ss')
Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer
- Parameters
- getDateTimeFormat()
- Returns
Format for DateTime when making DateTime:String conversions (default: yyyy-MM-dd HH:mm:ss)
- Return type
str
- setCols(value)
- Parameters
cols (list) – Comma separated list of columns whose type will be converted
- setDateTimeFormat(value)
- Parameters
dateTimeFormat (str) – Format for DateTime when making DateTime:String conversions (default: yyyy-MM-dd HH:mm:ss)
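The default dateTimeFormat, yyyy-MM-dd HH:mm:ss, is written as a Java-style date pattern. As a rough illustration of what strings in that format look like, the plain-Python sketch below parses and re-formats one using the standard-library directive string that expresses the same layout. The constant name PY_FORMAT and the sample timestamp are assumptions for illustration; the sketch does not call DataConversion itself.

```python
from datetime import datetime

# Python strftime/strptime directives for the same layout as the
# Java-style pattern 'yyyy-MM-dd HH:mm:ss' (24-hour clock, zero padded).
PY_FORMAT = "%Y-%m-%d %H:%M:%S"

text = "2019-03-07 14:05:09"
parsed = datetime.strptime(text, PY_FORMAT)   # String -> DateTime
round_trip = parsed.strftime(PY_FORMAT)       # DateTime -> String

print(parsed.year, parsed.month, parsed.hour)  # 2019 3 14
print(round_trip == text)                      # True
```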
mmlspark.featurize.Featurize module
- class mmlspark.featurize.Featurize.Featurize(allowImages=False, featureColumns=None, numberOfFeatures=262144, oneHotEncodeCategoricals=True)
Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaEstimator
- Parameters
- getNumberOfFeatures()
- Returns
Number of features to hash string columns to (default: 262144)
- Return type
int
- getOneHotEncodeCategoricals()
- Returns
One-hot encode categoricals (default: true)
- Return type
bool
- setAllowImages(value)
- Parameters
allowImages (bool) – Allow featurization of images (default: false)
- setNumberOfFeatures(value)
- Parameters
numberOfFeatures (int) – Number of features to hash string columns to (default: 262144)
- setOneHotEncodeCategoricals(value)
- Parameters
oneHotEncodeCategoricals (bool) – One-hot encode categoricals (default: true)
- class mmlspark.featurize.Featurize.PipelineModel(java_model=None)
Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.wrapper.JavaModel, pyspark.ml.util.JavaMLWritable, pyspark.ml.util.JavaMLReadable
Model fitted by Featurize.
This class is left empty on purpose. All necessary methods are exposed through inheritance.
mmlspark.featurize.IndexToValue module
- class mmlspark.featurize.IndexToValue.IndexToValue(inputCol=None, outputCol=None)
Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer
- Parameters
mmlspark.featurize.ValueIndexer module
- class mmlspark.featurize.ValueIndexer.ValueIndexer(inputCol=None, outputCol=None)
Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaEstimator
- Parameters
- class mmlspark.featurize.ValueIndexer.ValueIndexerModel(java_model=None)
Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.wrapper.JavaModel, pyspark.ml.util.JavaMLWritable, pyspark.ml.util.JavaMLReadable
Model fitted by ValueIndexer.
This class is left empty on purpose. All necessary methods are exposed through inheritance.
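ValueIndexer is fitted on an input column and produces a model, while IndexToValue is a transformer with the same inputCol/outputCol shape. Going by the class names, the pair maps values to integer indices and back. The plain-Python sketch below illustrates that round trip for one column. It is an assumption-laden illustration, not the library's code: the helper names are hypothetical, and ordering the levels by sorting is an assumption this page does not state.

```python
def fit_levels(values):
    """'Fit' step: collect the distinct values (sorted order is assumed)."""
    return sorted(set(values))


def index_values(values, levels):
    """Replace each value with the position of its level."""
    lookup = {level: i for i, level in enumerate(levels)}
    return [lookup[v] for v in values]


def index_to_value(indices, levels):
    """Inverse step: replace each index with the level it stands for."""
    return [levels[i] for i in indices]


column = ["b", "a", "c", "a"]
levels = fit_levels(column)
indexed = index_values(column, levels)
restored = index_to_value(indexed, levels)

print(levels)              # ['a', 'b', 'c']
print(indexed)             # [1, 0, 2, 0]
print(restored == column)  # True
```

The inverse step needs the same list of levels that the forward step used, so the levels have to travel with the indexed column or with the fitted model.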
mmlspark.featurize.ValueIndexerModel module
- class mmlspark.featurize.ValueIndexerModel.ValueIndexerModel(dataType='string', inputCol='input', levels=None, outputCol=None)
Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer
- Parameters
- getDataType()
- Returns
The datatype of the levels as a JSON string (default: string)
- Return type
str
- getOutputCol()
- Returns
The name of the output column (default: [self.uid]_output)
- Return type
str
- setDataType(value)
- Parameters
dataType (str) – The datatype of the levels as a JSON string (default: string)
- setInputCol(value)
- Parameters
inputCol (str) – The name of the input column (default: input)
- setOutputCol(value)
- Parameters
outputCol (str) – The name of the output column (default: [self.uid]_output)
Module contents
MicrosoftML is a library of Python classes that interface with the Microsoft Scala APIs, which use Apache Spark to create distributed machine learning models.
MicrosoftML simplifies training and scoring classifiers and regressors, and facilitates creating models that use the CNTK library, images, and text.