synapse.ml.featurize package

Submodules

synapse.ml.featurize.CleanMissingData module

class synapse.ml.featurize.CleanMissingData.CleanMissingData(java_obj=None, cleaningMode='Mean', customValue=None, inputCols=None, outputCols=None)[source]

Bases: ComplexParamsMixin, JavaMLReadable, JavaMLWritable, JavaEstimator

Parameters:
  • cleaningMode (str) – Cleaning mode

  • customValue (str) – Custom value for replacement

  • inputCols (list) – The names of the input columns

  • outputCols (list) – The names of the output columns

cleaningMode = Param(parent='undefined', name='cleaningMode', doc='Cleaning mode')
customValue = Param(parent='undefined', name='customValue', doc='Custom value for replacement')
getCleaningMode()[source]
Returns:

Cleaning mode

Return type:

cleaningMode

getCustomValue()[source]
Returns:

Custom value for replacement

Return type:

customValue

getInputCols()[source]
Returns:

The names of the input columns

Return type:

inputCols

static getJavaPackage()[source]

Returns package name String.

getOutputCols()[source]
Returns:

The names of the output columns

Return type:

outputCols

inputCols = Param(parent='undefined', name='inputCols', doc='The names of the input columns')
outputCols = Param(parent='undefined', name='outputCols', doc='The names of the output columns')
classmethod read()[source]

Returns an MLReader instance for this class.

setCleaningMode(value)[source]
Parameters:

cleaningMode – Cleaning mode

setCustomValue(value)[source]
Parameters:

customValue – Custom value for replacement

setInputCols(value)[source]
Parameters:

inputCols – The names of the input columns

setOutputCols(value)[source]
Parameters:

outputCols – The names of the output columns

setParams(cleaningMode='Mean', customValue=None, inputCols=None, outputCols=None)[source]

Set the (keyword only) parameters
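
To make the role of cleaningMode and customValue concrete, the following is a minimal plain-Python sketch of how an estimator of this kind could derive one replacement value per input column. It does not use Spark and is not the library's implementation. The function name fit_fill_values is hypothetical, and the "Median" and "Custom" modes are assumptions, since this page only documents the default 'Mean'.

```python
from statistics import mean, median


def fit_fill_values(rows, input_cols, cleaning_mode="Mean", custom_value=None):
    """Compute one replacement value per column from the non-missing entries.

    Hypothetical helper: mirrors the idea of fitting CleanMissingData,
    not its actual Spark implementation.
    """
    fill = {}
    for col in input_cols:
        present = [row[col] for row in rows if row.get(col) is not None]
        if cleaning_mode == "Mean":
            fill[col] = mean(present)
        elif cleaning_mode == "Median":  # assumed mode, not listed on this page
            fill[col] = median(present)
        elif cleaning_mode == "Custom":  # assumed mode, not listed on this page
            # customValue is documented as a str, so convert it explicitly.
            fill[col] = float(custom_value)
        else:
            raise ValueError(f"Unknown cleaning mode: {cleaning_mode}")
    return fill


rows = [
    {"a": 1.0, "b": 10.0},
    {"a": None, "b": 20.0},
    {"a": 3.0, "b": None},
]
fill_values = fit_fill_values(rows, ["a", "b"])
```

With the default mode, each column's fill value is the mean of its non-missing entries; the fitted values are what a model would later apply to new data.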

synapse.ml.featurize.CleanMissingDataModel module

class synapse.ml.featurize.CleanMissingDataModel.CleanMissingDataModel(java_obj=None, colsToFill=None, fillValues=None, inputCols=None, outputCols=None)[source]

Bases: ComplexParamsMixin, JavaMLReadable, JavaMLWritable, JavaModel

Parameters:
  • colsToFill (list) – The columns to fill

  • fillValues (object) – What to replace in the columns

  • inputCols (list) – The names of the input columns

  • outputCols (list) – The names of the output columns

colsToFill = Param(parent='undefined', name='colsToFill', doc='The columns to fill with')
fillValues = Param(parent='undefined', name='fillValues', doc='what to replace in the columns')
getColsToFill()[source]
Returns:

The columns to fill with

Return type:

colsToFill

getFillValues()[source]
Returns:

what to replace in the columns

Return type:

fillValues

getInputCols()[source]
Returns:

The names of the input columns

Return type:

inputCols

static getJavaPackage()[source]

Returns package name String.

getOutputCols()[source]
Returns:

The names of the output columns

Return type:

outputCols

inputCols = Param(parent='undefined', name='inputCols', doc='The names of the input columns')
outputCols = Param(parent='undefined', name='outputCols', doc='The names of the output columns')
classmethod read()[source]

Returns an MLReader instance for this class.

setColsToFill(value)[source]
Parameters:

colsToFill – The columns to fill with

setFillValues(value)[source]
Parameters:

fillValues – what to replace in the columns

setInputCols(value)[source]
Parameters:

inputCols – The names of the input columns

setOutputCols(value)[source]
Parameters:

outputCols – The names of the output columns

setParams(colsToFill=None, fillValues=None, inputCols=None, outputCols=None)[source]

Set the (keyword only) parameters
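
The relationship between colsToFill, fillValues and outputCols can be sketched in plain Python as follows. This is an illustration only: it assumes that fillValues pairs positionally with colsToFill and that each filled column is written to the matching entry of outputCols. The function name apply_fill_values is hypothetical, and no Spark is involved.

```python
def apply_fill_values(rows, cols_to_fill, fill_values, output_cols):
    """Write each source column to its output column, replacing None.

    Hypothetical helper: assumes cols_to_fill, fill_values and output_cols
    are aligned by position.
    """
    filled = []
    for row in rows:
        new_row = dict(row)  # leave the input row untouched
        for src, value, dst in zip(cols_to_fill, fill_values, output_cols):
            current = row.get(src)
            new_row[dst] = value if current is None else current
        filled.append(new_row)
    return filled


rows = [{"a": 1.0, "b": None}, {"a": None, "b": 20.0}]
cleaned = apply_fill_values(rows, ["a", "b"], [2.0, 15.0], ["a_clean", "b_clean"])
```

Present values pass through unchanged; only missing entries receive the stored replacement.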

synapse.ml.featurize.CountSelector module

class synapse.ml.featurize.CountSelector.CountSelector(java_obj=None, inputCol=None, outputCol=None)[source]

Bases: ComplexParamsMixin, JavaMLReadable, JavaMLWritable, JavaEstimator

Parameters:
  • inputCol (str) – The name of the input column

  • outputCol (str) – The name of the output column

getInputCol()[source]
Returns:

The name of the input column

Return type:

inputCol

static getJavaPackage()[source]

Returns package name String.

getOutputCol()[source]
Returns:

The name of the output column

Return type:

outputCol

inputCol = Param(parent='undefined', name='inputCol', doc='The name of the input column')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setInputCol(value)[source]
Parameters:

inputCol – The name of the input column

setOutputCol(value)[source]
Parameters:

outputCol – The name of the output column

setParams(inputCol=None, outputCol=None)[source]

Set the (keyword only) parameters

synapse.ml.featurize.CountSelectorModel module

class synapse.ml.featurize.CountSelectorModel.CountSelectorModel(java_obj=None, indices=None, inputCol=None, outputCol=None)[source]

Bases: ComplexParamsMixin, JavaMLReadable, JavaMLWritable, JavaModel

Parameters:
  • indices (list) – An array of indices to select features from a vector column. There can be no overlap with names.

  • inputCol (str) – The name of the input column

  • outputCol (str) – The name of the output column

getIndices()[source]
Returns:

An array of indices to select features from a vector column. There can be no overlap with names.

Return type:

indices

getInputCol()[source]
Returns:

The name of the input column

Return type:

inputCol

static getJavaPackage()[source]

Returns package name String.

getOutputCol()[source]
Returns:

The name of the output column

Return type:

outputCol

indices = Param(parent='undefined', name='indices', doc='An array of indices to select features from a vector column. There can be no overlap with names.')
inputCol = Param(parent='undefined', name='inputCol', doc='The name of the input column')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setIndices(value)[source]
Parameters:

indices – An array of indices to select features from a vector column. There can be no overlap with names.

setInputCol(value)[source]
Parameters:

inputCol – The name of the input column

setOutputCol(value)[source]
Parameters:

outputCol – The name of the output column

setParams(indices=None, inputCol=None, outputCol=None)[source]

Set the (keyword only) parameters
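
The indices parameter selects slots from a vector column. The sketch below shows one way such indices could be fitted and applied, under the assumption that the estimator keeps only the slots that are nonzero in at least one row. This page does not state that rule, so treat it as an assumption. The helper names are hypothetical, and plain Python lists stand in for Spark vectors.

```python
def fit_indices(vectors):
    """Return the slot indices that hold a nonzero value in some row."""
    width = len(vectors[0])
    return [i for i in range(width) if any(v[i] != 0 for v in vectors)]


def select(vector, indices):
    """Keep only the chosen slots, preserving their order."""
    return [vector[i] for i in indices]


vectors = [
    [0.0, 1.0, 0.0, 2.0],
    [0.0, 0.0, 0.0, 5.0],
    [0.0, 3.0, 0.0, 0.0],
]
indices = fit_indices(vectors)
selected = [select(v, indices) for v in vectors]
```

Slots 0 and 2 are zero in every row, so only slots 1 and 3 survive and each output vector shrinks from four entries to two.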

synapse.ml.featurize.DataConversion module

class synapse.ml.featurize.DataConversion.DataConversion(java_obj=None, cols=None, convertTo='', dateTimeFormat='yyyy-MM-dd HH:mm:ss')[source]

Bases: ComplexParamsMixin, JavaMLReadable, JavaMLWritable, JavaTransformer

Parameters:
  • cols (list) – Comma-separated list of columns whose type will be converted

  • convertTo (str) – The result type

  • dateTimeFormat (str) – Format for DateTime when making DateTime:String conversions

cols = Param(parent='undefined', name='cols', doc='Comma separated list of columns whose type will be converted')
convertTo = Param(parent='undefined', name='convertTo', doc='The result type')
dateTimeFormat = Param(parent='undefined', name='dateTimeFormat', doc='Format for DateTime when making DateTime:String conversions')
getCols()[source]
Returns:

Comma-separated list of columns whose type will be converted

Return type:

cols

getConvertTo()[source]
Returns:

The result type

Return type:

convertTo

getDateTimeFormat()[source]
Returns:

Format for DateTime when making DateTime:String conversions

Return type:

dateTimeFormat

static getJavaPackage()[source]

Returns package name String.

classmethod read()[source]

Returns an MLReader instance for this class.

setCols(value)[source]
Parameters:

cols – Comma-separated list of columns whose type will be converted

setConvertTo(value)[source]
Parameters:

convertTo – The result type

setDateTimeFormat(value)[source]
Parameters:

dateTimeFormat – Format for DateTime when making DateTime:String conversions

setParams(cols=None, convertTo='', dateTimeFormat='yyyy-MM-dd HH:mm:ss')[source]

Set the (keyword only) parameters
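
The default dateTimeFormat, 'yyyy-MM-dd HH:mm:ss', is a Java-style date pattern. As a rough illustration of the DateTime:String conversion it describes, the sketch below performs the same round trip with Python's standard library. The strptime pattern is a hand-written equivalent of the Java pattern, and the helper names are hypothetical. It does not call DataConversion itself.

```python
from datetime import datetime

JAVA_DEFAULT_PATTERN = "yyyy-MM-dd HH:mm:ss"   # default shown on this page
PYTHON_EQUIVALENT = "%Y-%m-%d %H:%M:%S"        # hand-translated equivalent


def string_to_datetime(text, fmt=PYTHON_EQUIVALENT):
    """Parse a timestamp string using the given format."""
    return datetime.strptime(text, fmt)


def datetime_to_string(value, fmt=PYTHON_EQUIVALENT):
    """Render a datetime back into a string using the given format."""
    return value.strftime(fmt)


parsed = string_to_datetime("2021-03-04 05:06:07")
round_trip = datetime_to_string(parsed)
```

A string that does not match the pattern raises ValueError in this sketch; how the Spark transformer treats such rows is not covered here.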

synapse.ml.featurize.Featurize module

class synapse.ml.featurize.Featurize.Featurize(java_obj=None, imputeMissing=True, inputCols=None, numFeatures=262144, oneHotEncodeCategoricals=True, outputCol=None)[source]

Bases: ComplexParamsMixin, JavaMLReadable, JavaMLWritable, JavaEstimator

Parameters:
  • imputeMissing (bool) – Whether to impute missing values

  • inputCols (list) – The names of the input columns

  • numFeatures (int) – Number of features to hash string columns to

  • oneHotEncodeCategoricals (bool) – One-hot encode categorical columns

  • outputCol (str) – The name of the output column

getImputeMissing()[source]
Returns:

Whether to impute missing values

Return type:

imputeMissing

getInputCols()[source]
Returns:

The names of the input columns

Return type:

inputCols

static getJavaPackage()[source]

Returns package name String.

getNumFeatures()[source]
Returns:

Number of features to hash string columns to

Return type:

numFeatures

getOneHotEncodeCategoricals()[source]
Returns:

One-hot encode categorical columns

Return type:

oneHotEncodeCategoricals

getOutputCol()[source]
Returns:

The name of the output column

Return type:

outputCol

imputeMissing = Param(parent='undefined', name='imputeMissing', doc='Whether to impute missing values')
inputCols = Param(parent='undefined', name='inputCols', doc='The names of the input columns')
numFeatures = Param(parent='undefined', name='numFeatures', doc='Number of features to hash string columns to')
oneHotEncodeCategoricals = Param(parent='undefined', name='oneHotEncodeCategoricals', doc='One-hot encode categorical columns')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setImputeMissing(value)[source]
Parameters:

imputeMissing – Whether to impute missing values

setInputCols(value)[source]
Parameters:

inputCols – The names of the input columns

setNumFeatures(value)[source]
Parameters:

numFeatures – Number of features to hash string columns to

setOneHotEncodeCategoricals(value)[source]
Parameters:

oneHotEncodeCategoricals – One-hot encode categorical columns

setOutputCol(value)[source]
Parameters:

outputCol – The name of the output column

setParams(imputeMissing=True, inputCols=None, numFeatures=262144, oneHotEncodeCategoricals=True, outputCol=None)[source]

Set the (keyword only) parameters
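
The numFeatures parameter sets how many buckets string columns are hashed into (the default 262144 is 2**18). The sketch below illustrates the general hashing trick in plain Python. It uses MD5 purely for a deterministic, dependency-free demonstration, so the bucket numbers will not match those produced by Spark, and the helper names are hypothetical.

```python
import hashlib

NUM_FEATURES = 262144  # default numFeatures on this page, equal to 2**18


def bucket(token, num_features=NUM_FEATURES):
    """Map a token to a bucket index in [0, num_features)."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_features


def hash_tokens(tokens, num_features=NUM_FEATURES):
    """Build a sparse {bucket index: count} representation of the tokens."""
    counts = {}
    for token in tokens:
        index = bucket(token, num_features)
        counts[index] = counts.get(index, 0.0) + 1.0
    return counts


sparse = hash_tokens("the quick brown fox jumps over the lazy dog".split())
```

A larger numFeatures lowers the chance that two different tokens collide in one bucket, at the cost of a wider (though still sparse) feature vector.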

synapse.ml.featurize.IndexToValue module

class synapse.ml.featurize.IndexToValue.IndexToValue(java_obj=None, inputCol=None, outputCol=None)[source]

Bases: ComplexParamsMixin, JavaMLReadable, JavaMLWritable, JavaTransformer

Parameters:
  • inputCol (str) – The name of the input column

  • outputCol (str) – The name of the output column

getInputCol()[source]
Returns:

The name of the input column

Return type:

inputCol

static getJavaPackage()[source]

Returns package name String.

getOutputCol()[source]
Returns:

The name of the output column

Return type:

outputCol

inputCol = Param(parent='undefined', name='inputCol', doc='The name of the input column')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setInputCol(value)[source]
Parameters:

inputCol – The name of the input column

setOutputCol(value)[source]
Parameters:

outputCol – The name of the output column

setParams(inputCol=None, outputCol=None)[source]

Set the (keyword only) parameters

synapse.ml.featurize.ValueIndexer module

class synapse.ml.featurize.ValueIndexer.ValueIndexer(java_obj=None, inputCol=None, outputCol=None)[source]

Bases: ComplexParamsMixin, JavaMLReadable, JavaMLWritable, JavaEstimator

Parameters:
  • inputCol (str) – The name of the input column

  • outputCol (str) – The name of the output column

getInputCol()[source]
Returns:

The name of the input column

Return type:

inputCol

static getJavaPackage()[source]

Returns package name String.

getOutputCol()[source]
Returns:

The name of the output column

Return type:

outputCol

inputCol = Param(parent='undefined', name='inputCol', doc='The name of the input column')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setInputCol(value)[source]
Parameters:

inputCol – The name of the input column

setOutputCol(value)[source]
Parameters:

outputCol – The name of the output column

setParams(inputCol=None, outputCol=None)[source]

Set the (keyword only) parameters

synapse.ml.featurize.ValueIndexerModel module

class synapse.ml.featurize.ValueIndexerModel.ValueIndexerModel(java_obj=None, dataType='string', inputCol='input', levels=None, outputCol='ValueIndexerModel_21a1e69d94c3_output')[source]

Bases: ComplexParamsMixin, JavaMLReadable, JavaMLWritable, JavaModel

Parameters:
  • dataType (str) – The datatype of the levels as a JSON string

  • inputCol (str) – The name of the input column

  • levels (object) – Levels in categorical array

  • outputCol (str) – The name of the output column

dataType = Param(parent='undefined', name='dataType', doc='The datatype of the levels as a Json string')
getDataType()[source]
Returns:

The datatype of the levels as a JSON string

Return type:

dataType

getInputCol()[source]
Returns:

The name of the input column

Return type:

inputCol

static getJavaPackage()[source]

Returns package name String.

getLevels()[source]
Returns:

Levels in categorical array

Return type:

levels

getOutputCol()[source]
Returns:

The name of the output column

Return type:

outputCol

inputCol = Param(parent='undefined', name='inputCol', doc='The name of the input column')
levels = Param(parent='undefined', name='levels', doc='Levels in categorical array')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setDataType(value)[source]
Parameters:

dataType – The datatype of the levels as a JSON string

setInputCol(value)[source]
Parameters:

inputCol – The name of the input column

setLevels(value)[source]
Parameters:

levels – Levels in categorical array

setOutputCol(value)[source]
Parameters:

outputCol – The name of the output column

setParams(dataType='string', inputCol='input', levels=None, outputCol='ValueIndexerModel_21a1e69d94c3_output')[source]

Set the (keyword only) parameters
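
The levels parameter holds the distinct categorical values that a fitted indexer maps to integer positions, and IndexToValue performs the reverse mapping. The following plain-Python sketch shows that round trip. It assumes levels are the sorted distinct non-missing values, which this page does not state. The helper names are hypothetical, and missing-value handling is left out.

```python
def fit_levels(values):
    """Collect the distinct non-missing values in sorted order (assumed)."""
    return sorted({v for v in values if v is not None})


def value_to_index(values, levels):
    """Replace each value with its position in levels."""
    lookup = {level: i for i, level in enumerate(levels)}
    return [lookup[v] for v in values]


def index_to_value(indices, levels):
    """Recover the original values from their positions in levels."""
    return [levels[i] for i in indices]


colors = ["red", "green", "blue", "green"]
levels = fit_levels(colors)
indexed = value_to_index(colors, levels)
restored = index_to_value(indexed, levels)
```

Because the levels are stored with the fitted model, the inverse mapping can restore the original values without seeing the training data again.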

Module contents

SynapseML is an ecosystem of tools aimed at expanding the distributed computing framework Apache Spark in several new directions. SynapseML adds many deep learning and data science tools to the Spark ecosystem, including seamless integration of Spark Machine Learning pipelines with Microsoft Cognitive Toolkit (CNTK), LightGBM and OpenCV. These tools enable powerful and highly scalable predictive and analytical models for a variety of data sources.

SynapseML also brings new networking capabilities to the Spark ecosystem. With the HTTP on Spark project, users can embed any web service into their SparkML models. In this vein, SynapseML provides easy-to-use SparkML transformers for a wide variety of Microsoft Cognitive Services. For production-grade deployment, the Spark Serving project enables high-throughput, sub-millisecond-latency web services, backed by your Spark cluster.

SynapseML requires Scala 2.12, Spark 3.0+, and Python 3.6+.
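
As a small convenience, the Python and Spark requirements above can be checked from a Python session along these lines. This is a sketch only: check_requirements is a hypothetical helper, the Spark check runs only if pyspark happens to be installed, and the Scala version cannot be verified from Python this way.

```python
import sys
from importlib import util


def check_requirements():
    """Report whether the interpreter and pyspark meet the stated minimums."""
    report = {"python_ok": sys.version_info >= (3, 6)}
    if util.find_spec("pyspark") is None:
        # pyspark is not installed, so the Spark version is unknown.
        report["spark_ok"] = None
    else:
        import pyspark

        major = int(pyspark.__version__.split(".")[0])
        report["spark_ok"] = major >= 3
    return report


report = check_requirements()
```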