mmlspark.vw package

Submodules

mmlspark.vw.VowpalWabbitClassifier module

class mmlspark.vw.VowpalWabbitClassifier.VowpalWabbitClassificationModel(java_model=None)[source]

Bases: mmlspark.vw._VowpalWabbitClassifier._VowpalWabbitClassificationModel

saveNativeModel(filename)[source]

Save the native model to a local or WASB remote location.

class mmlspark.vw.VowpalWabbitClassifier.VowpalWabbitClassifier(additionalFeatures=[], args='', featuresCol='features', hashSeed=0, ignoreNamespaces=None, initialModel=None, interactions=None, l1=None, l2=None, labelCol='label', labelConversion=True, learningRate=None, numBits=18, numPasses=1, powerT=None, predictionCol='prediction', probabilityCol='probability', rawPredictionCol='rawPrediction', thresholds=None, weightCol=None)[source]

Bases: mmlspark.vw._VowpalWabbitClassifier._VowpalWabbitClassifier

mmlspark.vw.VowpalWabbitFeaturizer module

class mmlspark.vw.VowpalWabbitFeaturizer.VowpalWabbitFeaturizer(inputCols=[], numBits=30, outputCol='features', prefixStringsWithColumnName=True, preserveOrderNumBits=0, seed=0, stringSplitInputCols=[], sumCollisions=True)[source]

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • inputCols (list) – The names of the input columns (default: [])

  • numBits (int) – Number of bits used to mask (default: 30)

  • outputCol (str) – The name of the output column (default: features)

  • prefixStringsWithColumnName (bool) – Prefix string features with column name (default: true)

  • preserveOrderNumBits (int) – Number of bits used to preserve the feature order. This reduces the hash size. Must be large enough to count the maximum number of words (default: 0)

  • seed (int) – Hash seed (default: 0)

  • stringSplitInputCols (list) – Input columns that should be split at word boundaries (default: [])

  • sumCollisions (bool) – Sums collisions if true, otherwise removes them (default: true)
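To make numBits and sumCollisions concrete, here is a minimal pure-Python sketch of feature hashing. It is not the library's implementation: zlib.crc32 stands in for the real hash function, the helper names hash_index and featurize are hypothetical, and "removes them" is interpreted here as keeping only the first value seen for an index, which is an assumption.

```python
import zlib


def hash_index(name, num_bits, seed=0):
    """Map a feature name to an index in [0, 2**num_bits)."""
    mask = (1 << num_bits) - 1
    # zlib.crc32 is only a stand-in hash for this sketch.
    return zlib.crc32(name.encode("utf-8"), seed) & mask


def featurize(features, num_bits=30, seed=0, sum_collisions=True):
    """Turn {name: value} into a sorted list of (index, value) pairs."""
    out = {}
    for name, value in features.items():
        idx = hash_index(name, num_bits, seed)
        if idx not in out:
            out[idx] = value
        elif sum_collisions:
            # Two names hashed to the same index: add their values.
            out[idx] += value
        # Otherwise the colliding value is dropped (assumed semantics).
    return sorted(out.items())
```

With a very small num_bits, collisions are guaranteed, so the difference between the two modes is easy to observe: summing preserves the total of all values, while dropping does not.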

getInputCols()[source]
Returns

The names of the input columns (default: [])

Return type

list

static getJavaPackage()[source]

Returns package name String.

getNumBits()[source]
Returns

Number of bits used to mask (default: 30)

Return type

int

getOutputCol()[source]
Returns

The name of the output column (default: features)

Return type

str

getPrefixStringsWithColumnName()[source]
Returns

Prefix string features with column name (default: true)

Return type

bool

getPreserveOrderNumBits()[source]
Returns

Number of bits used to preserve the feature order. This reduces the hash size. Must be large enough to count the maximum number of words (default: 0)

Return type

int

getSeed()[source]
Returns

Hash seed (default: 0)

Return type

int

getStringSplitInputCols()[source]
Returns

Input columns that should be split at word boundaries (default: [])

Return type

list

getSumCollisions()[source]
Returns

Sums collisions if true, otherwise removes them (default: true)

Return type

bool

classmethod read()[source]

Returns an MLReader instance for this class.

setInputCols(value)[source]
Parameters

inputCols (list) – The names of the input columns (default: [])

setNumBits(value)[source]
Parameters

numBits (int) – Number of bits used to mask (default: 30)

setOutputCol(value)[source]
Parameters

outputCol (str) – The name of the output column (default: features)

setParams(inputCols=[], numBits=30, outputCol='features', prefixStringsWithColumnName=True, preserveOrderNumBits=0, seed=0, stringSplitInputCols=[], sumCollisions=True)[source]

Set the (keyword only) parameters

Parameters
  • inputCols (list) – The names of the input columns (default: [])

  • numBits (int) – Number of bits used to mask (default: 30)

  • outputCol (str) – The name of the output column (default: features)

  • prefixStringsWithColumnName (bool) – Prefix string features with column name (default: true)

  • preserveOrderNumBits (int) – Number of bits used to preserve the feature order. This reduces the hash size. Must be large enough to count the maximum number of words (default: 0)

  • seed (int) – Hash seed (default: 0)

  • stringSplitInputCols (list) – Input columns that should be split at word boundaries (default: [])

  • sumCollisions (bool) – Sums collisions if true, otherwise removes them (default: true)

setPrefixStringsWithColumnName(value)[source]
Parameters

prefixStringsWithColumnName (bool) – Prefix string features with column name (default: true)

setPreserveOrderNumBits(value)[source]
Parameters

preserveOrderNumBits (int) – Number of bits used to preserve the feature order. This reduces the hash size. Must be large enough to count the maximum number of words (default: 0)
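One way to read "reduces the hash size" is that the order bits are carved out of the numBits budget. The sketch below illustrates that idea under stated assumptions: the word position is stored in the high bits and the hash in the remaining low bits, zlib.crc32 stands in for the real hash, and ordered_index is a hypothetical helper. The library's actual bit layout may differ.

```python
import zlib


def ordered_index(word, position, num_bits=30, preserve_order_num_bits=2):
    """Combine a word's position and its hash into one index (assumed layout)."""
    if position >= (1 << preserve_order_num_bits):
        # The position counter must fit in the reserved bits.
        raise ValueError("position does not fit in preserve_order_num_bits")
    # Bits left for the hash once the order bits are reserved.
    hash_bits = num_bits - preserve_order_num_bits
    hashed = zlib.crc32(word.encode("utf-8")) & ((1 << hash_bits) - 1)
    # High bits carry the position, low bits carry the hash.
    return (position << hash_bits) | hashed


words = "the quick brown the".split()
indices = [
    ordered_index(w, i, num_bits=8, preserve_order_num_bits=2)
    for i, w in enumerate(words)
]
```

In this sketch the two occurrences of "the" share the same low bits but get different indices because their positions differ. A fifth word would raise ValueError, which is why the parameter has to be sized for the longest expected input.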

setSeed(value)[source]
Parameters

seed (int) – Hash seed (default: 0)

setStringSplitInputCols(value)[source]
Parameters

stringSplitInputCols (list) – Input columns that should be split at word boundaries (default: [])

setSumCollisions(value)[source]
Parameters

sumCollisions (bool) – Sums collisions if true, otherwise removes them (default: true)

mmlspark.vw.VowpalWabbitInteractions module

class mmlspark.vw.VowpalWabbitInteractions.VowpalWabbitInteractions(inputCols=None, numBits=30, outputCol=None, sumCollisions=True)[source]

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • inputCols (list) – The names of the input columns

  • numBits (int) – Number of bits used to mask (default: 30)

  • outputCol (str) – The name of the output column

  • sumCollisions (bool) – Sums collisions if true, otherwise removes them (default: true)
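The reference above does not spell out how the input columns are combined. As a rough illustration only, the sketch below assumes an interaction is the cross product of the sparse input vectors: each output index is derived from the combined input indices and masked to numBits, and each output value is the product of the input values. The interact helper and the FNV-style combining rule are assumptions, not the library's confirmed behavior.

```python
from itertools import product

FNV_PRIME = 16777619  # assumed combining constant for this sketch


def interact(columns, num_bits=30, sum_collisions=True):
    """Cross sparse vectors given as lists of (index, value) pairs."""
    mask = (1 << num_bits) - 1
    out = {}
    for combo in product(*columns):
        idx, val = 0, 1.0
        for i, v in combo:
            # Fold each input index into the running index.
            idx = (idx * FNV_PRIME) ^ i
            val *= v
        idx &= mask
        if idx not in out:
            out[idx] = val
        elif sum_collisions:
            out[idx] += val
        # Otherwise the colliding value is dropped (assumed semantics).
    return sorted(out.items())
```

For example, crossing a two-entry vector with a one-entry vector yields two interaction features whose values are the pairwise products.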

getInputCols()[source]
Returns

The names of the input columns

Return type

list

static getJavaPackage()[source]

Returns package name String.

getNumBits()[source]
Returns

Number of bits used to mask (default: 30)

Return type

int

getOutputCol()[source]
Returns

The name of the output column

Return type

str

getSumCollisions()[source]
Returns

Sums collisions if true, otherwise removes them (default: true)

Return type

bool

classmethod read()[source]

Returns an MLReader instance for this class.

setInputCols(value)[source]
Parameters

inputCols (list) – The names of the input columns

setNumBits(value)[source]
Parameters

numBits (int) – Number of bits used to mask (default: 30)

setOutputCol(value)[source]
Parameters

outputCol (str) – The name of the output column

setParams(inputCols=None, numBits=30, outputCol=None, sumCollisions=True)[source]

Set the (keyword only) parameters

Parameters
  • inputCols (list) – The names of the input columns

  • numBits (int) – Number of bits used to mask (default: 30)

  • outputCol (str) – The name of the output column

  • sumCollisions (bool) – Sums collisions if true, otherwise removes them (default: true)

setSumCollisions(value)[source]
Parameters

sumCollisions (bool) – Sums collisions if true, otherwise removes them (default: true)

mmlspark.vw.VowpalWabbitRegressor module

class mmlspark.vw.VowpalWabbitRegressor.VowpalWabbitRegressionModel(java_model=None)[source]

Bases: mmlspark.vw._VowpalWabbitRegressor._VowpalWabbitRegressionModel

saveNativeModel(filename)[source]

Save the native model to a local or WASB remote location.

class mmlspark.vw.VowpalWabbitRegressor.VowpalWabbitRegressor(additionalFeatures=[], args='', featuresCol='features', hashSeed=0, ignoreNamespaces=None, initialModel=None, interactions=None, l1=None, l2=None, labelCol='label', learningRate=None, numBits=18, numPasses=1, powerT=None, predictionCol='prediction', weightCol=None)[source]

Bases: mmlspark.vw._VowpalWabbitRegressor._VowpalWabbitRegressor

Module contents

MicrosoftML is a library of Python classes that interface with the Microsoft Scala APIs, which use Apache Spark to create distributed machine learning models.

MicrosoftML simplifies training and scoring classifiers and regressors, and facilitates the creation of models that use the CNTK library, images, and text.