mmlspark.vw package¶

Submodules¶

mmlspark.vw.VowpalWabbitClassifier module¶

class mmlspark.vw.VowpalWabbitClassifier.VowpalWabbitClassificationModel(java_model=None)[source]¶

Bases: mmlspark.vw._VowpalWabbitClassifier._VowpalWabbitClassificationModel

saveNativeModel(filename)[source]¶: Save the native model to a local path or a remote WASB location.

class mmlspark.vw.VowpalWabbitClassifier.VowpalWabbitClassifier(additionalFeatures=[], args='', featuresCol='features', hashSeed=0, ignoreNamespaces=None, initialModel=None, interactions=None, l1=None, l2=None, labelCol='label', labelConversion=True, learningRate=None, numBits=18, numPasses=1, powerT=None, predictionCol='prediction', probabilityCol='probability', rawPredictionCol='rawPrediction', thresholds=None, weightCol=None)[source]¶

Bases: mmlspark.vw._VowpalWabbitClassifier._VowpalWabbitClassifier
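The constructor accepts both a free-form args string and named parameters such as numBits, numPasses, l1 and l2. The sketch below is one plausible way the two could be reconciled, not the library's implementation: a hypothetical helper, build_vw_args, appends the Vowpal Wabbit command-line flag for each named parameter unless that flag already appears in args. The flag names are standard Vowpal Wabbit options. The helper, its precedence rule, and its handling of only the long flag spellings are assumptions made for illustration.

```python
from typing import Optional


def build_vw_args(args: str = "", num_bits: int = 18, num_passes: int = 1,
                  hash_seed: int = 0, learning_rate: Optional[float] = None,
                  power_t: Optional[float] = None, l1: Optional[float] = None,
                  l2: Optional[float] = None) -> str:
    """Hypothetical helper: merge named parameters into a VW argument string.

    Assumption: a flag already present in `args` wins over the named parameter.
    Only long flag names are checked, so a short alias such as `-b` in `args`
    would not be detected by this sketch.
    """
    tokens = args.split()
    named = [
        ("--bit_precision", num_bits),
        ("--passes", num_passes),
        ("--hash_seed", hash_seed),
        ("--learning_rate", learning_rate),
        ("--power_t", power_t),
        ("--l1", l1),
        ("--l2", l2),
    ]
    for flag, value in named:
        # Skip unset parameters and flags the caller already supplied.
        if value is not None and flag not in tokens:
            tokens.extend([flag, str(value)])
    return " ".join(tokens)


if __name__ == "__main__":
    print(build_vw_args(args="--l2 0.5", l2=0.1, learning_rate=0.3))
```

Under this reading, the explicit "--l2 0.5" in args is kept and the named l2=0.1 is ignored. Whether the library resolves such conflicts the same way should be checked against its source.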

mmlspark.vw.VowpalWabbitFeaturizer module¶

class mmlspark.vw.VowpalWabbitFeaturizer.VowpalWabbitFeaturizer(inputCols=[], numBits=30, outputCol='features', prefixStringsWithColumnName=True, preserveOrderNumBits=0, seed=0, stringSplitInputCols=[], sumCollisions=True)[source]¶

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters

inputCols (list) – The names of the input columns (default: [])
numBits (int) – Number of bits used to mask (default: 30)
outputCol (str) – The name of the output column (default: features)
prefixStringsWithColumnName (bool) – Prefix string features with column name (default: true)
preserveOrderNumBits (int) – Number of bits used to preserve the feature order. This reduces the hash size. Must be large enough to count the maximum number of words (default: 0)
seed (int) – Hash seed (default: 0)
stringSplitInputCols (list) – Input cols that should be split at word boundaries (default: [])
sumCollisions (bool) – Sums collisions if true, otherwise removes them (default: true)
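The numBits and sumCollisions parameters above can be illustrated with a small, library-independent sketch. It assumes that masking keeps the low numBits bits of a feature hash, and that "removes them" means the first value seen at an index is kept while later colliding values are dropped. Both readings, and the names mask_index and combine, are assumptions for illustration rather than the library's actual code.

```python
from typing import Dict, Iterable, Tuple


def mask_index(raw_hash: int, num_bits: int) -> int:
    """Keep only the low `num_bits` bits of a feature hash."""
    return raw_hash & ((1 << num_bits) - 1)


def combine(hashed: Iterable[Tuple[int, float]], num_bits: int = 30,
            sum_collisions: bool = True) -> Dict[int, float]:
    """Build a sparse {index: value} map from (raw_hash, value) pairs.

    Assumption: with sum_collisions=False the first value at an index is
    kept and later colliding values are discarded.
    """
    out: Dict[int, float] = {}
    for raw_hash, value in hashed:
        idx = mask_index(raw_hash, num_bits)
        if idx not in out:
            out[idx] = value
        elif sum_collisions:
            out[idx] += value
    return out


if __name__ == "__main__":
    # With 4 bits, hashes 0x13 and 0x23 both land on index 3.
    pairs = [(0x13, 1.0), (0x23, 2.0), (0x05, 4.0)]
    print(combine(pairs, num_bits=4, sum_collisions=True))
    print(combine(pairs, num_bits=4, sum_collisions=False))
```

A smaller numBits shrinks the feature space and makes collisions more likely, which is where the choice between summing and dropping starts to matter.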

getInputCols()[source]¶

Returns: The names of the input columns (default: [])
Return type: list

static getJavaPackage()[source]¶: Returns the package name as a string.

getNumBits()[source]¶

Returns: Number of bits used to mask (default: 30)
Return type: int

getOutputCol()[source]¶

Returns: The name of the output column (default: features)
Return type: str

getPrefixStringsWithColumnName()[source]¶

Returns: Prefix string features with column name (default: true)
Return type: bool

getPreserveOrderNumBits()[source]¶

Returns: Number of bits used to preserve the feature order. This reduces the hash size. Must be large enough to count the maximum number of words (default: 0)
Return type: int
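To see why reserving order bits reduces the hash size, consider the following sketch. It assumes the word position is stored in the high preserveOrderNumBits bits of the index and the hash in the remaining low bits, so that sorting by index recovers word order. The layout and the function name ordered_index are assumptions for illustration. The library's actual bit layout may differ.

```python
def ordered_index(raw_hash: int, position: int, num_bits: int,
                  preserve_order_num_bits: int) -> int:
    """Pack a word position and a truncated hash into one feature index.

    Assumed layout: [ position | hash ], using `num_bits` bits in total.
    """
    if position >= (1 << preserve_order_num_bits):
        # The order bits must be able to count every word position.
        raise ValueError("position does not fit in preserve_order_num_bits")
    hash_bits = num_bits - preserve_order_num_bits  # bits left for the hash
    hash_part = raw_hash & ((1 << hash_bits) - 1)
    return (position << hash_bits) | hash_part


if __name__ == "__main__":
    # 8 bits total, 2 reserved for order: only 6 bits remain for the hash.
    hashes = [0xFF, 0x00, 0x7F]
    indices = [ordered_index(h, i, 8, 2) for i, h in enumerate(hashes)]
    print(indices)
```

With two order bits, at most four word positions can be counted, which is the constraint the parameter description refers to.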

getSeed()[source]¶

Returns: Hash seed (default: 0)
Return type: int

getStringSplitInputCols()[source]¶

Returns: Input cols that should be split at word boundaries (default: [])
Return type: list

getSumCollisions()[source]¶

Returns: Sums collisions if true, otherwise removes them (default: true)
Return type: bool

classmethod read()[source]¶: Returns an MLReader instance for this class.

setInputCols(value)[source]¶

Parameters: inputCols (list) – The names of the input columns (default: [])

setNumBits(value)[source]¶

Parameters: numBits (int) – Number of bits used to mask (default: 30)

setOutputCol(value)[source]¶

Parameters: outputCol (str) – The name of the output column (default: features)

setParams(inputCols=[], numBits=30, outputCol='features', prefixStringsWithColumnName=True, preserveOrderNumBits=0, seed=0, stringSplitInputCols=[], sumCollisions=True)[source]¶

Sets the parameters (keyword-only).

Parameters

inputCols (list) – The names of the input columns (default: [])
numBits (int) – Number of bits used to mask (default: 30)
outputCol (str) – The name of the output column (default: features)
prefixStringsWithColumnName (bool) – Prefix string features with column name (default: true)
preserveOrderNumBits (int) – Number of bits used to preserve the feature order. This reduces the hash size. Must be large enough to count the maximum number of words (default: 0)
seed (int) – Hash seed (default: 0)
stringSplitInputCols (list) – Input cols that should be split at word boundaries (default: [])
sumCollisions (bool) – Sums collisions if true, otherwise removes them (default: true)

setPrefixStringsWithColumnName(value)[source]¶

Parameters: prefixStringsWithColumnName (bool) – Prefix string features with column name (default: true)

setPreserveOrderNumBits(value)[source]¶

Parameters: preserveOrderNumBits (int) – Number of bits used to preserve the feature order. This reduces the hash size. Must be large enough to count the maximum number of words (default: 0)

setSeed(value)[source]¶

Parameters: seed (int) – Hash seed (default: 0)

setStringSplitInputCols(value)[source]¶

Parameters: stringSplitInputCols (list) – Input cols that should be split at word boundaries (default: [])

setSumCollisions(value)[source]¶

Parameters: sumCollisions (bool) – Sums collisions if true, otherwise removes them (default: true)

mmlspark.vw.VowpalWabbitInteractions module¶

class mmlspark.vw.VowpalWabbitInteractions.VowpalWabbitInteractions(inputCols=None, numBits=30, outputCol=None, sumCollisions=True)[source]¶

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters

inputCols (list) – The names of the input columns
numBits (int) – Number of bits used to mask (default: 30)
outputCol (str) – The name of the output column
sumCollisions (bool) – Sums collisions if true, otherwise removes them (default: true)
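As a rough picture of what an interaction transformer over several sparse feature columns might compute, the sketch below forms the cross product of the input columns, multiplies the feature values, and combines the indices with an FNV-style step before masking to numBits. The combination rule, the first-value-wins behavior when sumCollisions is false, and the name interact are all assumptions for illustration. None of them is confirmed by the reference above.

```python
from itertools import product
from typing import Dict, List

FNV_PRIME = 16777619  # 32-bit FNV prime, assumed here as the mixing constant


def interact(columns: List[Dict[int, float]], num_bits: int = 30,
             sum_collisions: bool = True) -> Dict[int, float]:
    """Cross sparse columns ({index: value}) into one sparse feature map."""
    mask = (1 << num_bits) - 1
    out: Dict[int, float] = {}
    # Visit every combination that takes one feature from each column.
    for combo in product(*(col.items() for col in columns)):
        idx, value = 0, 1.0
        for feature_idx, feature_value in combo:
            # Mix the running index with the next feature index (32-bit wrap).
            idx = ((idx * FNV_PRIME) ^ feature_idx) & 0xFFFFFFFF
            value *= feature_value
        idx &= mask
        if idx not in out:
            out[idx] = value
        elif sum_collisions:
            out[idx] += value
    return out


if __name__ == "__main__":
    print(interact([{1: 2.0}, {2: 3.0}]))
```

Two columns with m and n non-zero features yield up to m × n crossed features, so the output grows quickly as more columns are listed in inputCols.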

getInputCols()[source]¶

Returns: The names of the input columns
Return type: list

static getJavaPackage()[source]¶: Returns the package name as a string.

getNumBits()[source]¶

Returns: Number of bits used to mask (default: 30)
Return type: int

getOutputCol()[source]¶

Returns: The name of the output column
Return type: str

getSumCollisions()[source]¶

Returns: Sums collisions if true, otherwise removes them (default: true)
Return type: bool

classmethod read()[source]¶: Returns an MLReader instance for this class.

setInputCols(value)[source]¶

Parameters: inputCols (list) – The names of the input columns

setNumBits(value)[source]¶

Parameters: numBits (int) – Number of bits used to mask (default: 30)

setOutputCol(value)[source]¶

Parameters: outputCol (str) – The name of the output column

setParams(inputCols=None, numBits=30, outputCol=None, sumCollisions=True)[source]¶

Sets the parameters (keyword-only).

Parameters

inputCols (list) – The names of the input columns
numBits (int) – Number of bits used to mask (default: 30)
outputCol (str) – The name of the output column
sumCollisions (bool) – Sums collisions if true, otherwise removes them (default: true)

setSumCollisions(value)[source]¶

Parameters: sumCollisions (bool) – Sums collisions if true, otherwise removes them (default: true)

mmlspark.vw.VowpalWabbitRegressor module¶

class mmlspark.vw.VowpalWabbitRegressor.VowpalWabbitRegressionModel(java_model=None)[source]¶

Bases: mmlspark.vw._VowpalWabbitRegressor._VowpalWabbitRegressionModel

saveNativeModel(filename)[source]¶: Save the native model to a local path or a remote WASB location.

class mmlspark.vw.VowpalWabbitRegressor.VowpalWabbitRegressor(additionalFeatures=[], args='', featuresCol='features', hashSeed=0, ignoreNamespaces=None, initialModel=None, interactions=None, l1=None, l2=None, labelCol='label', learningRate=None, numBits=18, numPasses=1, powerT=None, predictionCol='prediction', weightCol=None)[source]¶

Bases: mmlspark.vw._VowpalWabbitRegressor._VowpalWabbitRegressor

Module contents¶

MicrosoftML is a library of Python classes that interface with the Microsoft Scala APIs, which use Apache Spark to create distributed machine learning models.

MicrosoftML simplifies training and scoring of classifiers and regressors, and facilitates the creation of models using the CNTK library, images, and text.