mmlspark.vw package

Submodules

mmlspark.vw.VowpalWabbitClassifier module

class mmlspark.vw.VowpalWabbitClassifier.VowpalWabbitClassificationModel(java_model=None)[source]

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.wrapper.JavaModel, pyspark.ml.util.JavaMLWritable, pyspark.ml.util.JavaMLReadable

Model fitted by VowpalWabbitClassifier.

This class is left empty on purpose. All necessary methods are exposed through inheritance.

static getJavaPackage()[source]

Returns package name String.

classmethod read()[source]

Returns an MLReader instance for this class.
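As a hedged sketch of reloading a persisted model: `read()` returns a pyspark `MLReader`, and `MLReader.load(path)` is the standard pyspark call for reading a saved instance. The helper name `load_classification_model` and its `path` argument are illustrative assumptions, not part of this package. The import is deferred, so the helper can be defined without mmlspark installed, but calling it needs mmlspark and an active Spark session.

```python
def load_classification_model(path):
    """Reload a saved VowpalWabbitClassificationModel from `path`.

    Hypothetical helper: needs mmlspark and an active Spark session when called.
    """
    # Deferred import: defining this helper does not require mmlspark.
    from mmlspark.vw.VowpalWabbitClassifier import VowpalWabbitClassificationModel

    # read() returns an MLReader; load(path) reads the saved instance.
    return VowpalWabbitClassificationModel.read().load(path)
```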

class mmlspark.vw.VowpalWabbitClassifier.VowpalWabbitClassifier(additionalFeatures=[], args='', featuresCol='features', hashSeed=0, ignoreNamespaces=None, initialModel=None, interactions=None, l1=None, l2=None, labelCol='label', learningRate=None, numBits=18, numPasses=1, powerT=None, predictionCol='prediction', probabilityCol='probability', rawPredictionCol='rawPrediction', thresholds=None, weightCol=None)[source]

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaEstimator

Parameters
  • additionalFeatures (list) – Additional feature columns (default: [])

  • args (str) – VW command line arguments passed (default: '')

  • featuresCol (str) – features column name (default: features)

  • hashSeed (int) – Seed used for hashing (default: 0)

  • ignoreNamespaces (str) – Namespaces to be ignored (first letter only)

  • initialModel (list) – Initial model to start from

  • interactions (list) – Interaction terms as specified by -q

  • l1 (double) – l_1 lambda

  • l2 (double) – l_2 lambda

  • labelCol (str) – label column name (default: label)

  • learningRate (double) – Learning rate

  • numBits (int) – Number of bits used (default: 18)

  • numPasses (int) – Number of passes over the data (default: 1)

  • powerT (double) – t power value

  • predictionCol (str) – prediction column name (default: prediction)

  • probabilityCol (str) – Column name for predicted class conditional probabilities. Note: Not all models output well-calibrated probability estimates! These probabilities should be treated as confidences, not precise probabilities (default: probability)

  • rawPredictionCol (str) – raw prediction (a.k.a. confidence) column name (default: rawPrediction)

  • thresholds (list) – Thresholds in multi-class classification to adjust the probability of predicting each class. Array must have length equal to the number of classes, with values > 0 excepting that at most one value may be 0. The class with largest value p/t is predicted, where p is the original probability of that class and t is the class’s threshold

  • weightCol (str) – The name of the weight column
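The parameters above can be passed as keyword arguments to the constructor. The following is a minimal sketch, not a recommended configuration: the values in `CLASSIFIER_KWARGS` and the helper names `build_classifier` and `fit_and_score` are illustrative assumptions. `fit` and `transform` come from the pyspark `JavaEstimator` and `JavaModel` base classes listed under Bases.

```python
# Illustrative values only; every key is a parameter documented above.
CLASSIFIER_KWARGS = {
    "featuresCol": "features",
    "labelCol": "label",
    "numBits": 18,
    "numPasses": 3,
    "learningRate": 0.5,
}


def build_classifier(**overrides):
    """Create a VowpalWabbitClassifier from CLASSIFIER_KWARGS plus overrides."""
    # Deferred import: defining this helper does not require mmlspark.
    from mmlspark.vw.VowpalWabbitClassifier import VowpalWabbitClassifier

    params = dict(CLASSIFIER_KWARGS)
    params.update(overrides)
    return VowpalWabbitClassifier(**params)


def fit_and_score(train_df, test_df):
    """Fit on train_df and append prediction columns to test_df."""
    model = build_classifier().fit(train_df)  # returns the fitted model
    return model.transform(test_df)
```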

getAdditionalFeatures()[source]
Returns

Additional feature columns (default: [])

Return type

list

getArgs()[source]
Returns

VW command line arguments passed (default: '')

Return type

str

getFeaturesCol()[source]
Returns

features column name (default: features)

Return type

str

getHashSeed()[source]
Returns

Seed used for hashing (default: 0)

Return type

int

getIgnoreNamespaces()[source]
Returns

Namespaces to be ignored (first letter only)

Return type

str

getInitialModel()[source]
Returns

Initial model to start from

Return type

list

getInteractions()[source]
Returns

Interaction terms as specified by -q

Return type

list

static getJavaPackage()[source]

Returns package name String.

getL1()[source]
Returns

l_1 lambda

Return type

double

getL2()[source]
Returns

l_2 lambda

Return type

double

getLabelCol()[source]
Returns

label column name (default: label)

Return type

str

getLearningRate()[source]
Returns

Learning rate

Return type

double

getNumBits()[source]
Returns

Number of bits used (default: 18)

Return type

int

getNumPasses()[source]
Returns

Number of passes over the data (default: 1)

Return type

int

getPowerT()[source]
Returns

t power value

Return type

double

getPredictionCol()[source]
Returns

prediction column name (default: prediction)

Return type

str

getProbabilityCol()[source]
Returns

Column name for predicted class conditional probabilities. Note: Not all models output well-calibrated probability estimates! These probabilities should be treated as confidences, not precise probabilities (default: probability)

Return type

str

getRawPredictionCol()[source]
Returns

raw prediction (a.k.a. confidence) column name (default: rawPrediction)

Return type

str

getThresholds()[source]
Returns

Thresholds in multi-class classification to adjust the probability of predicting each class. Array must have length equal to the number of classes, with values > 0 excepting that at most one value may be 0. The class with largest value p/t is predicted, where p is the original probability of that class and t is the class’s threshold

Return type

list

getWeightCol()[source]
Returns

The name of the weight column

Return type

str

classmethod read()[source]

Returns an MLReader instance for this class.

setAdditionalFeatures(value)[source]
Parameters

additionalFeatures (list) – Additional feature columns (default: [])

setArgs(value)[source]
Parameters

args (str) – VW command line arguments passed (default: '')

setFeaturesCol(value)[source]
Parameters

featuresCol (str) – features column name (default: features)

setHashSeed(value)[source]
Parameters

hashSeed (int) – Seed used for hashing (default: 0)

setIgnoreNamespaces(value)[source]
Parameters

ignoreNamespaces (str) – Namespaces to be ignored (first letter only)

setInitialModel(value)[source]
Parameters

initialModel (list) – Initial model to start from

setInteractions(value)[source]
Parameters

interactions (list) – Interaction terms as specified by -q

setL1(value)[source]
Parameters

l1 (double) – l_1 lambda

setL2(value)[source]
Parameters

l2 (double) – l_2 lambda

setLabelCol(value)[source]
Parameters

labelCol (str) – label column name (default: label)

setLearningRate(value)[source]
Parameters

learningRate (double) – Learning rate

setNumBits(value)[source]
Parameters

numBits (int) – Number of bits used (default: 18)

setNumPasses(value)[source]
Parameters

numPasses (int) – Number of passes over the data (default: 1)

setParams(additionalFeatures=[], args='', featuresCol='features', hashSeed=0, ignoreNamespaces=None, initialModel=None, interactions=None, l1=None, l2=None, labelCol='label', learningRate=None, numBits=18, numPasses=1, powerT=None, predictionCol='prediction', probabilityCol='probability', rawPredictionCol='rawPrediction', thresholds=None, weightCol=None)[source]

Sets the (keyword-only) parameters.

Parameters
  • additionalFeatures (list) – Additional feature columns (default: [])

  • args (str) – VW command line arguments passed (default: '')

  • featuresCol (str) – features column name (default: features)

  • hashSeed (int) – Seed used for hashing (default: 0)

  • ignoreNamespaces (str) – Namespaces to be ignored (first letter only)

  • initialModel (list) – Initial model to start from

  • interactions (list) – Interaction terms as specified by -q

  • l1 (double) – l_1 lambda

  • l2 (double) – l_2 lambda

  • labelCol (str) – label column name (default: label)

  • learningRate (double) – Learning rate

  • numBits (int) – Number of bits used (default: 18)

  • numPasses (int) – Number of passes over the data (default: 1)

  • powerT (double) – t power value

  • predictionCol (str) – prediction column name (default: prediction)

  • probabilityCol (str) – Column name for predicted class conditional probabilities. Note: Not all models output well-calibrated probability estimates! These probabilities should be treated as confidences, not precise probabilities (default: probability)

  • rawPredictionCol (str) – raw prediction (a.k.a. confidence) column name (default: rawPrediction)

  • thresholds (list) – Thresholds in multi-class classification to adjust the probability of predicting each class. Array must have length equal to the number of classes, with values > 0 excepting that at most one value may be 0. The class with largest value p/t is predicted, where p is the original probability of that class and t is the class’s threshold

  • weightCol (str) – The name of the weight column

setPowerT(value)[source]
Parameters

powerT (double) – t power value

setPredictionCol(value)[source]
Parameters

predictionCol (str) – prediction column name (default: prediction)

setProbabilityCol(value)[source]
Parameters

probabilityCol (str) – Column name for predicted class conditional probabilities. Note: Not all models output well-calibrated probability estimates! These probabilities should be treated as confidences, not precise probabilities (default: probability)

setRawPredictionCol(value)[source]
Parameters

rawPredictionCol (str) – raw prediction (a.k.a. confidence) column name (default: rawPrediction)

setThresholds(value)[source]
Parameters

thresholds (list) – Thresholds in multi-class classification to adjust the probability of predicting each class. Array must have length equal to the number of classes, with values > 0 excepting that at most one value may be 0. The class with largest value p/t is predicted, where p is the original probability of that class and t is the class’s threshold
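The p/t rule described for `thresholds` can be sketched in plain Python. This is an illustration of the rule as stated, not the library's implementation: the function name `predict_with_thresholds` is hypothetical, a zero threshold is assumed to make that class's ratio infinite, and ties are assumed to go to the lowest class index.

```python
def predict_with_thresholds(probabilities, thresholds):
    """Return the index of the class with the largest p/t ratio."""
    if len(probabilities) != len(thresholds):
        raise ValueError("thresholds must have one entry per class")
    if any(t < 0 for t in thresholds):
        raise ValueError("thresholds must be non-negative")
    if sum(1 for t in thresholds if t == 0) > 1:
        raise ValueError("at most one threshold may be 0")

    def ratio(p, t):
        # Assumption: a zero threshold yields an infinite ratio.
        return float("inf") if t == 0 else p / t

    ratios = [ratio(p, t) for p, t in zip(probabilities, thresholds)]
    # max() returns the first maximal index, so ties go to the lowest class.
    return max(range(len(ratios)), key=ratios.__getitem__)
```

With probabilities `[0.5, 0.3, 0.2]` and thresholds `[0.5, 0.2, 0.4]`, the ratios are roughly 1.0, 1.5, and 0.5, so class 1 is predicted even though class 0 has the highest raw probability.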

setWeightCol(value)[source]
Parameters

weightCol (str) – The name of the weight column

mmlspark.vw.VowpalWabbitFeaturizer module

class mmlspark.vw.VowpalWabbitFeaturizer.VowpalWabbitFeaturizer(inputCols=[], numbits=30, outputCol=None, seed=0, stringSplitInputCols=[], sumCollisions=True)[source]

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • inputCols (list) – The names of the input columns (default: [])

  • numbits (int) – Number of bits used to mask (default: 30)

  • outputCol (str) – The name of the output column

  • seed (int) – Hash seed (default: 0)

  • stringSplitInputCols (list) – Input cols that should be split at word boundaries (default: [])

  • sumCollisions (bool) – Sums collisions if true, otherwise removes them (default: true)
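To illustrate how `numbits`, `seed`, and `sumCollisions` might interact, here is a toy hashing sketch in plain Python. It is not the featurizer's implementation: `zlib.crc32` is a stand-in for whatever hash the library actually uses, the function name `hash_features` is hypothetical, and "removes them" is read here as keeping only the first value seen at a colliding index, which is an assumption.

```python
import zlib


def hash_features(tokens, numbits=30, seed=0, sum_collisions=True):
    """Map (name, value) pairs to {index: value} in a 2**numbits space."""
    mask = (1 << numbits) - 1  # keeps only the low `numbits` bits
    features = {}
    for name, value in tokens:
        # Stand-in hash; the seed is used as the CRC starting value.
        index = zlib.crc32(name.encode("utf-8"), seed) & mask
        if index not in features:
            features[index] = value
        elif sum_collisions:
            features[index] += value  # sum values that share an index
        # Otherwise (assumption): drop the later colliding value.
    return dict(sorted(features.items()))
```

With `numbits=1` there are only two slots, so any three distinct tokens must collide, which makes the difference between the two modes easy to observe.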

getInputCols()[source]
Returns

The names of the input columns (default: [])

Return type

list

static getJavaPackage()[source]

Returns package name String.

getNumbits()[source]
Returns

Number of bits used to mask (default: 30)

Return type

int

getOutputCol()[source]
Returns

The name of the output column

Return type

str

getSeed()[source]
Returns

Hash seed (default: 0)

Return type

int

getStringSplitInputCols()[source]
Returns

Input cols that should be split at word boundaries (default: [])

Return type

list

getSumCollisions()[source]
Returns

Sums collisions if true, otherwise removes them (default: true)

Return type

bool

classmethod read()[source]

Returns an MLReader instance for this class.

setInputCols(value)[source]
Parameters

inputCols (list) – The names of the input columns (default: [])

setNumbits(value)[source]
Parameters

numbits (int) – Number of bits used to mask (default: 30)

setOutputCol(value)[source]
Parameters

outputCol (str) – The name of the output column

setParams(inputCols=[], numbits=30, outputCol=None, seed=0, stringSplitInputCols=[], sumCollisions=True)[source]

Sets the (keyword-only) parameters.

Parameters
  • inputCols (list) – The names of the input columns (default: [])

  • numbits (int) – Number of bits used to mask (default: 30)

  • outputCol (str) – The name of the output column

  • seed (int) – Hash seed (default: 0)

  • stringSplitInputCols (list) – Input cols that should be split at word boundaries (default: [])

  • sumCollisions (bool) – Sums collisions if true, otherwise removes them (default: true)

setSeed(value)[source]
Parameters

seed (int) – Hash seed (default: 0)

setStringSplitInputCols(value)[source]
Parameters

stringSplitInputCols (list) – Input cols that should be split at word boundaries (default: [])

setSumCollisions(value)[source]
Parameters

sumCollisions (bool) – Sums collisions if true, otherwise removes them (default: true)

mmlspark.vw.VowpalWabbitInteractions module

class mmlspark.vw.VowpalWabbitInteractions.VowpalWabbitInteractions(inputCols=None, numbits=30, outputCol=None, sumCollisions=True)[source]

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • inputCols (list) – The names of the input columns

  • numbits (int) – Number of bits used to mask (default: 30)

  • outputCol (str) – The name of the output column

  • sumCollisions (bool) – Sums collisions if true, otherwise removes them (default: true)
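One plausible way to use this transformer is to featurize two columns separately and then combine the resulting vector columns. The sketch below assumes that `inputCols` refers to vector columns such as those produced by `VowpalWabbitFeaturizer`; that assumption, the helper names, and the `_vec` suffix are illustrative and not confirmed by this reference. `pyspark.ml.Pipeline` is the standard pyspark pipeline class.

```python
def vector_col(name):
    """Hypothetical naming convention for a featurized column."""
    return name + "_vec"


def build_interaction_pipeline(col_a, col_b, output_col="features"):
    """Featurize two columns, then combine them in an interactions stage."""
    # Deferred imports: defining this helper does not require pyspark or mmlspark.
    from pyspark.ml import Pipeline
    from mmlspark.vw.VowpalWabbitFeaturizer import VowpalWabbitFeaturizer
    from mmlspark.vw.VowpalWabbitInteractions import VowpalWabbitInteractions

    feat_a = VowpalWabbitFeaturizer(inputCols=[col_a], outputCol=vector_col(col_a))
    feat_b = VowpalWabbitFeaturizer(inputCols=[col_b], outputCol=vector_col(col_b))
    crossed = VowpalWabbitInteractions(
        inputCols=[vector_col(col_a), vector_col(col_b)],
        outputCol=output_col,
        numbits=30,
        sumCollisions=True,
    )
    return Pipeline(stages=[feat_a, feat_b, crossed])
```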

getInputCols()[source]
Returns

The names of the input columns

Return type

list

static getJavaPackage()[source]

Returns package name String.

getNumbits()[source]
Returns

Number of bits used to mask (default: 30)

Return type

int

getOutputCol()[source]
Returns

The name of the output column

Return type

str

getSumCollisions()[source]
Returns

Sums collisions if true, otherwise removes them (default: true)

Return type

bool

classmethod read()[source]

Returns an MLReader instance for this class.

setInputCols(value)[source]
Parameters

inputCols (list) – The names of the input columns

setNumbits(value)[source]
Parameters

numbits (int) – Number of bits used to mask (default: 30)

setOutputCol(value)[source]
Parameters

outputCol (str) – The name of the output column

setParams(inputCols=None, numbits=30, outputCol=None, sumCollisions=True)[source]

Sets the (keyword-only) parameters.

Parameters
  • inputCols (list) – The names of the input columns

  • numbits (int) – Number of bits used to mask (default: 30)

  • outputCol (str) – The name of the output column

  • sumCollisions (bool) – Sums collisions if true, otherwise removes them (default: true)

setSumCollisions(value)[source]
Parameters

sumCollisions (bool) – Sums collisions if true, otherwise removes them (default: true)

mmlspark.vw.VowpalWabbitRegressor module

class mmlspark.vw.VowpalWabbitRegressor.VowpalWabbitRegressor(additionalFeatures=[], args='', featuresCol='features', hashSeed=0, ignoreNamespaces=None, initialModel=None, interactions=None, l1=None, l2=None, labelCol='label', learningRate=None, numBits=18, numPasses=1, powerT=None, predictionCol='prediction', weightCol=None)[source]

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaEstimator

Parameters
  • additionalFeatures (list) – Additional feature columns (default: [])

  • args (str) – VW command line arguments passed (default: '')

  • featuresCol (str) – features column name (default: features)

  • hashSeed (int) – Seed used for hashing (default: 0)

  • ignoreNamespaces (str) – Namespaces to be ignored (first letter only)

  • initialModel (list) – Initial model to start from

  • interactions (list) – Interaction terms as specified by -q

  • l1 (double) – l_1 lambda

  • l2 (double) – l_2 lambda

  • labelCol (str) – label column name (default: label)

  • learningRate (double) – Learning rate

  • numBits (int) – Number of bits used (default: 18)

  • numPasses (int) – Number of passes over the data (default: 1)

  • powerT (double) – t power value

  • predictionCol (str) – prediction column name (default: prediction)

  • weightCol (str) – The name of the weight column
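A minimal construction sketch for the regressor follows, using only the keyword arguments documented above. The values and the helper names `build_regressor` and `weight_table_size` are illustrative assumptions. The size helper assumes `numBits` is the bit width of VW's hashed feature table, so the default of 18 would correspond to 2^18 slots.

```python
# Illustrative values only; every key is a parameter documented above.
REGRESSOR_KWARGS = {
    "featuresCol": "features",
    "labelCol": "label",
    "numBits": 18,
    "numPasses": 2,
}


def weight_table_size(num_bits):
    """Slots in a table indexed by `num_bits` bits (assumed meaning of numBits)."""
    return 1 << num_bits


def build_regressor(**overrides):
    """Create a VowpalWabbitRegressor from REGRESSOR_KWARGS plus overrides."""
    # Deferred import: defining this helper does not require mmlspark.
    from mmlspark.vw.VowpalWabbitRegressor import VowpalWabbitRegressor

    params = dict(REGRESSOR_KWARGS)
    params.update(overrides)
    return VowpalWabbitRegressor(**params)
```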

getAdditionalFeatures()[source]
Returns

Additional feature columns (default: [])

Return type

list

getArgs()[source]
Returns

VW command line arguments passed (default: '')

Return type

str

getFeaturesCol()[source]
Returns

features column name (default: features)

Return type

str

getHashSeed()[source]
Returns

Seed used for hashing (default: 0)

Return type

int

getIgnoreNamespaces()[source]
Returns

Namespaces to be ignored (first letter only)

Return type

str

getInitialModel()[source]
Returns

Initial model to start from

Return type

list

getInteractions()[source]
Returns

Interaction terms as specified by -q

Return type

list

static getJavaPackage()[source]

Returns package name String.

getL1()[source]
Returns

l_1 lambda

Return type

double

getL2()[source]
Returns

l_2 lambda

Return type

double

getLabelCol()[source]
Returns

label column name (default: label)

Return type

str

getLearningRate()[source]
Returns

Learning rate

Return type

double

getNumBits()[source]
Returns

Number of bits used (default: 18)

Return type

int

getNumPasses()[source]
Returns

Number of passes over the data (default: 1)

Return type

int

getPowerT()[source]
Returns

t power value

Return type

double

getPredictionCol()[source]
Returns

prediction column name (default: prediction)

Return type

str

getWeightCol()[source]
Returns

The name of the weight column

Return type

str

classmethod read()[source]

Returns an MLReader instance for this class.

setAdditionalFeatures(value)[source]
Parameters

additionalFeatures (list) – Additional feature columns (default: [])

setArgs(value)[source]
Parameters

args (str) – VW command line arguments passed (default: '')

setFeaturesCol(value)[source]
Parameters

featuresCol (str) – features column name (default: features)

setHashSeed(value)[source]
Parameters

hashSeed (int) – Seed used for hashing (default: 0)

setIgnoreNamespaces(value)[source]
Parameters

ignoreNamespaces (str) – Namespaces to be ignored (first letter only)

setInitialModel(value)[source]
Parameters

initialModel (list) – Initial model to start from

setInteractions(value)[source]
Parameters

interactions (list) – Interaction terms as specified by -q

setL1(value)[source]
Parameters

l1 (double) – l_1 lambda

setL2(value)[source]
Parameters

l2 (double) – l_2 lambda

setLabelCol(value)[source]
Parameters

labelCol (str) – label column name (default: label)

setLearningRate(value)[source]
Parameters

learningRate (double) – Learning rate

setNumBits(value)[source]
Parameters

numBits (int) – Number of bits used (default: 18)

setNumPasses(value)[source]
Parameters

numPasses (int) – Number of passes over the data (default: 1)

setParams(additionalFeatures=[], args='', featuresCol='features', hashSeed=0, ignoreNamespaces=None, initialModel=None, interactions=None, l1=None, l2=None, labelCol='label', learningRate=None, numBits=18, numPasses=1, powerT=None, predictionCol='prediction', weightCol=None)[source]

Sets the (keyword-only) parameters.

Parameters
  • additionalFeatures (list) – Additional feature columns (default: [])

  • args (str) – VW command line arguments passed (default: '')

  • featuresCol (str) – features column name (default: features)

  • hashSeed (int) – Seed used for hashing (default: 0)

  • ignoreNamespaces (str) – Namespaces to be ignored (first letter only)

  • initialModel (list) – Initial model to start from

  • interactions (list) – Interaction terms as specified by -q

  • l1 (double) – l_1 lambda

  • l2 (double) – l_2 lambda

  • labelCol (str) – label column name (default: label)

  • learningRate (double) – Learning rate

  • numBits (int) – Number of bits used (default: 18)

  • numPasses (int) – Number of passes over the data (default: 1)

  • powerT (double) – t power value

  • predictionCol (str) – prediction column name (default: prediction)

  • weightCol (str) – The name of the weight column

setPowerT(value)[source]
Parameters

powerT (double) – t power value

setPredictionCol(value)[source]
Parameters

predictionCol (str) – prediction column name (default: prediction)

setWeightCol(value)[source]
Parameters

weightCol (str) – The name of the weight column

class mmlspark.vw.VowpalWabbitRegressor.VowpalWabbitRegressorModel(java_model=None)[source]

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.wrapper.JavaModel, pyspark.ml.util.JavaMLWritable, pyspark.ml.util.JavaMLReadable

Model fitted by VowpalWabbitRegressor.

This class is left empty on purpose. All necessary methods are exposed through inheritance.

static getJavaPackage()[source]

Returns package name String.

classmethod read()[source]

Returns an MLReader instance for this class.

Module contents

MicrosoftML is a library of Python classes that interface with the Microsoft Scala APIs, which use Apache Spark to create distributed machine learning models.

MicrosoftML simplifies training and scoring classifiers and regressors, and facilitates creating models that use the CNTK library, images, and text.