mmlspark.vw package¶

Submodules¶

mmlspark.vw.VowpalWabbitClassifier module¶

class mmlspark.vw.VowpalWabbitClassifier.VowpalWabbitClassificationModel(java_model=None)[source]¶

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.wrapper.JavaModel, pyspark.ml.util.JavaMLWritable, pyspark.ml.util.JavaMLReadable

Model fitted by VowpalWabbitClassifier.

This class is left empty on purpose. All necessary methods are exposed through inheritance.

static getJavaPackage()[source]¶: Returns package name String.

classmethod read()[source]¶: Returns an MLReader instance for this class.
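Because the model mixes in JavaMLWritable and JavaMLReadable, it can be persisted with the standard pyspark.ml writer and reader objects. A minimal sketch, assuming an already fitted model and a storage path chosen by the caller; the helper names `save_model` and `load_model` are hypothetical, and the load step needs MMLSpark and an active Spark session:

```python
def save_model(model, path):
    """Persist a fitted model with the writer inherited from JavaMLWritable."""
    # overwrite() replaces any existing output at `path`.
    model.write().overwrite().save(path)


def load_model(path):
    """Load a saved model through the MLReader returned by read()."""
    # Imported lazily so the sketch can be defined without MMLSpark installed.
    from mmlspark.vw.VowpalWabbitClassifier import VowpalWabbitClassificationModel

    return VowpalWabbitClassificationModel.read().load(path)
```
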

class mmlspark.vw.VowpalWabbitClassifier.VowpalWabbitClassifier(additionalFeatures=[], args='', featuresCol='features', hashSeed=0, ignoreNamespaces=None, initialModel=None, interactions=None, l1=None, l2=None, labelCol='label', learningRate=None, numBits=18, numPasses=1, powerT=None, predictionCol='prediction', probabilityCol='probability', rawPredictionCol='rawPrediction', thresholds=None, weightCol=None)[source]¶

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaEstimator

Parameters

additionalFeatures (list) – Additional feature columns (default: [])
args (str) – VW command line arguments passed (default: '')
featuresCol (str) – features column name (default: features)
hashSeed (int) – Seed used for hashing (default: 0)
ignoreNamespaces (str) – Namespaces to be ignored (first letter only)
initialModel (list) – Initial model to start from
interactions (list) – Interaction terms as specified by -q
l1 (double) – L1 regularization strength (lambda)
l2 (double) – L2 regularization strength (lambda)
labelCol (str) – label column name (default: label)
learningRate (double) – Learning rate
numBits (int) – Number of bits used (default: 18)
numPasses (int) – Number of passes over the data (default: 1)
powerT (double) – t power value
predictionCol (str) – prediction column name (default: prediction)
probabilityCol (str) – Column name for predicted class conditional probabilities. Note: Not all models output well-calibrated probability estimates! These probabilities should be treated as confidences, not precise probabilities (default: probability)
rawPredictionCol (str) – raw prediction (a.k.a. confidence) column name (default: rawPrediction)
thresholds (list) – Thresholds in multi-class classification to adjust the probability of predicting each class. Array must have length equal to the number of classes, with values > 0 excepting that at most one value may be 0. The class with largest value p/t is predicted, where p is the original probability of that class and t is the class’s threshold
weightCol (str) – The name of the weight column
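The p/t rule described for thresholds can be illustrated in plain Python. This is a sketch of the rule as documented, not the library's implementation; the function name `predict_with_thresholds` is hypothetical, and treating a zero threshold as an infinite score (when p > 0) is an assumption of this sketch.

```python
import math


def predict_with_thresholds(probabilities, thresholds):
    """Return the index of the class with the largest p/t ratio."""
    if len(probabilities) != len(thresholds):
        raise ValueError("thresholds must have one entry per class")
    if any(t < 0 for t in thresholds):
        raise ValueError("thresholds must be non-negative")
    if sum(1 for t in thresholds if t == 0) > 1:
        raise ValueError("at most one threshold may be 0")

    best_index, best_score = 0, -math.inf
    for index, (p, t) in enumerate(zip(probabilities, thresholds)):
        if t == 0:
            # Assumption: a zero threshold wins outright unless p is also 0.
            if p == 0:
                continue
            score = math.inf
        else:
            score = p / t
        if score > best_score:
            best_index, best_score = index, score
    return best_index


# Equal thresholds keep the most probable class.
print(predict_with_thresholds([0.5, 0.3, 0.2], [1.0, 1.0, 1.0]))  # 0
# Halving the threshold of class 1 doubles its ratio: 0.3 / 0.5 = 0.6 > 0.5.
print(predict_with_thresholds([0.5, 0.3, 0.2], [1.0, 0.5, 1.0]))  # 1
```
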

getAdditionalFeatures()[source]¶

Returns: Additional feature columns (default: [])
Return type: list

getArgs()[source]¶

Returns: VW command line arguments passed (default: '')
Return type: str

getFeaturesCol()[source]¶

Returns: features column name (default: features)
Return type: str

getHashSeed()[source]¶

Returns: Seed used for hashing (default: 0)
Return type: int

getIgnoreNamespaces()[source]¶

Returns: Namespaces to be ignored (first letter only)
Return type: str

getInitialModel()[source]¶

Returns: Initial model to start from
Return type: list

getInteractions()[source]¶

Returns: Interaction terms as specified by -q
Return type: list

static getJavaPackage()[source]¶: Returns package name String.

getL1()[source]¶

Returns: L1 regularization strength (lambda)
Return type: double

getL2()[source]¶

Returns: L2 regularization strength (lambda)
Return type: double

getLabelCol()[source]¶

Returns: label column name (default: label)
Return type: str

getLearningRate()[source]¶

Returns: Learning rate
Return type: double

getNumBits()[source]¶

Returns: Number of bits used (default: 18)
Return type: int

getNumPasses()[source]¶

Returns: Number of passes over the data (default: 1)
Return type: int

getPowerT()[source]¶

Returns: t power value
Return type: double

getPredictionCol()[source]¶

Returns: prediction column name (default: prediction)
Return type: str

getProbabilityCol()[source]¶

Returns: Column name for predicted class conditional probabilities. Note: Not all models output well-calibrated probability estimates! These probabilities should be treated as confidences, not precise probabilities (default: probability)
Return type: str

getRawPredictionCol()[source]¶

Returns: raw prediction (a.k.a. confidence) column name (default: rawPrediction)
Return type: str

getThresholds()[source]¶

Returns: Thresholds in multi-class classification to adjust the probability of predicting each class. Array must have length equal to the number of classes, with values > 0 excepting that at most one value may be 0. The class with largest value p/t is predicted, where p is the original probability of that class and t is the class’s threshold
Return type: list

getWeightCol()[source]¶

Returns: The name of the weight column
Return type: str

classmethod read()[source]¶: Returns an MLReader instance for this class.

setAdditionalFeatures(value)[source]¶

Parameters: additionalFeatures (list) – Additional feature columns (default: [])

setArgs(value)[source]¶

Parameters: args (str) – VW command line arguments passed (default: '')

setFeaturesCol(value)[source]¶

Parameters: featuresCol (str) – features column name (default: features)

setHashSeed(value)[source]¶

Parameters: hashSeed (int) – Seed used for hashing (default: 0)

setIgnoreNamespaces(value)[source]¶

Parameters: ignoreNamespaces (str) – Namespaces to be ignored (first letter only)

setInitialModel(value)[source]¶

Parameters: initialModel (list) – Initial model to start from

setInteractions(value)[source]¶

Parameters: interactions (list) – Interaction terms as specified by -q
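In Vowpal Wabbit, each -q flag takes the first letters of two namespaces and adds quadratic features between them. The sketch below shows how a list of interaction terms could be rendered as such flags; the helper name `interactions_to_args` is hypothetical, and whether the library builds its command line exactly this way is an assumption.

```python
def interactions_to_args(interactions):
    """Render interaction terms as VW -q flags, one flag per term."""
    return " ".join("-q {}".format(term) for term in interactions)


# "ab" crosses the namespaces starting with "a" and "b".
print(interactions_to_args(["ab", "ac"]))  # -q ab -q ac
```
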

setL1(value)[source]¶

Parameters: l1 (double) – L1 regularization strength (lambda)

setL2(value)[source]¶

Parameters: l2 (double) – L2 regularization strength (lambda)

setLabelCol(value)[source]¶

Parameters: labelCol (str) – label column name (default: label)

setLearningRate(value)[source]¶

Parameters: learningRate (double) – Learning rate

setNumBits(value)[source]¶

Parameters: numBits (int) – Number of bits used (default: 18)
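Assuming numBits corresponds to Vowpal Wabbit's bit precision (the -b option), it fixes the size of the hashed weight table at 2^numBits entries. A small worked example with hypothetical helper names:

```python
def weight_table_size(num_bits):
    """Number of weight slots addressable with `num_bits` bits."""
    return 1 << num_bits


def bucket(feature_hash, num_bits):
    """Map a feature hash to a slot by keeping only the low `num_bits` bits."""
    return feature_hash & (weight_table_size(num_bits) - 1)


print(weight_table_size(18))  # 262144
print(bucket(262145, 18))     # 1, because bit 18 is masked away
```

Raising numBits reduces hash collisions between features at the cost of a larger model.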

setNumPasses(value)[source]¶

Parameters: numPasses (int) – Number of passes over the data (default: 1)

setParams(additionalFeatures=[], args='', featuresCol='features', hashSeed=0, ignoreNamespaces=None, initialModel=None, interactions=None, l1=None, l2=None, labelCol='label', learningRate=None, numBits=18, numPasses=1, powerT=None, predictionCol='prediction', probabilityCol='probability', rawPredictionCol='rawPrediction', thresholds=None, weightCol=None)[source]¶

Set the (keyword only) parameters

Parameters

additionalFeatures (list) – Additional feature columns (default: [])
args (str) – VW command line arguments passed (default: '')
featuresCol (str) – features column name (default: features)
hashSeed (int) – Seed used for hashing (default: 0)
ignoreNamespaces (str) – Namespaces to be ignored (first letter only)
initialModel (list) – Initial model to start from
interactions (list) – Interaction terms as specified by -q
l1 (double) – L1 regularization strength (lambda)
l2 (double) – L2 regularization strength (lambda)
labelCol (str) – label column name (default: label)
learningRate (double) – Learning rate
numBits (int) – Number of bits used (default: 18)
numPasses (int) – Number of passes over the data (default: 1)
powerT (double) – t power value
predictionCol (str) – prediction column name (default: prediction)
probabilityCol (str) – Column name for predicted class conditional probabilities. Note: Not all models output well-calibrated probability estimates! These probabilities should be treated as confidences, not precise probabilities (default: probability)
rawPredictionCol (str) – raw prediction (a.k.a. confidence) column name (default: rawPrediction)
thresholds (list) – Thresholds in multi-class classification to adjust the probability of predicting each class. Array must have length equal to the number of classes, with values > 0 excepting that at most one value may be 0. The class with largest value p/t is predicted, where p is the original probability of that class and t is the class’s threshold
weightCol (str) – The name of the weight column

setPowerT(value)[source]¶

Parameters: powerT (double) – t power value

setPredictionCol(value)[source]¶

Parameters: predictionCol (str) – prediction column name (default: prediction)

setProbabilityCol(value)[source]¶

Parameters: probabilityCol (str) – Column name for predicted class conditional probabilities. Note: Not all models output well-calibrated probability estimates! These probabilities should be treated as confidences, not precise probabilities (default: probability)

setRawPredictionCol(value)[source]¶

Parameters: rawPredictionCol (str) – raw prediction (a.k.a. confidence) column name (default: rawPrediction)

setThresholds(value)[source]¶

Parameters: thresholds (list) – Thresholds in multi-class classification to adjust the probability of predicting each class. Array must have length equal to the number of classes, with values > 0 excepting that at most one value may be 0. The class with largest value p/t is predicted, where p is the original probability of that class and t is the class’s threshold

setWeightCol(value)[source]¶

Parameters: weightCol (str) – The name of the weight column
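A minimal usage sketch for the classifier, assuming MMLSpark is installed and two Spark DataFrames with `features` and `label` columns are available. The parameter values, the helper name `train_and_score`, and the DataFrame arguments are illustrative assumptions; `fit` and `transform` come from the pyspark.ml estimator and model base classes.

```python
# Keyword arguments taken from the documented constructor signature;
# the values are illustrative, not recommendations.
classifier_params = {
    "featuresCol": "features",
    "labelCol": "label",
    "numBits": 18,
    "numPasses": 5,
    "learningRate": 0.1,
}


def train_and_score(train_df, test_df):
    """Fit a VowpalWabbitClassifier and score a second DataFrame."""
    # Imported lazily so the sketch can be defined without MMLSpark installed.
    from mmlspark.vw.VowpalWabbitClassifier import VowpalWabbitClassifier

    classifier = VowpalWabbitClassifier(**classifier_params)
    model = classifier.fit(train_df)   # returns the fitted model
    return model.transform(test_df)    # appends the prediction columns
```
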

mmlspark.vw.VowpalWabbitFeaturizer module¶

class mmlspark.vw.VowpalWabbitFeaturizer.VowpalWabbitFeaturizer(inputCols=[], numbits=30, outputCol=None, seed=0, stringSplitInputCols=[], sumCollisions=True)[source]¶

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters

inputCols (list) – The names of the input columns (default: [])
numbits (int) – Number of bits used to mask (default: 30)
outputCol (str) – The name of the output column
seed (int) – Hash seed (default: 0)
stringSplitInputCols (list) – Input columns that should be split at word boundaries (default: [])
sumCollisions (bool) – Sums collisions if true, otherwise removes them (default: true)
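The interplay of numbits, seed, and sumCollisions can be sketched in plain Python. This is an illustration only: `zlib.crc32` stands in for the library's hash function, the name `hash_features` is hypothetical, and reading "removes them" as "keeps the first value and drops later colliding ones" is an assumption.

```python
import zlib


def hash_features(features, numbits=30, seed=0, sum_collisions=True):
    """Hash named features into a sparse {index: value} mapping."""
    mask = (1 << numbits) - 1  # keep only the low `numbits` bits
    hashed = {}
    for name, value in features.items():
        index = zlib.crc32(name.encode("utf-8"), seed) & mask
        if index not in hashed:
            hashed[index] = value
        elif sum_collisions:
            hashed[index] += value
        # Otherwise the colliding value is dropped (assumed behaviour).
    return hashed


# With a single bit there are only two slots, so three features must collide.
features = {"a": 1.0, "b": 2.0, "c": 4.0}
print(hash_features(features, numbits=1, sum_collisions=True))
print(hash_features(features, numbits=1, sum_collisions=False))
```
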

getInputCols()[source]¶

Returns: The names of the input columns (default: [])
Return type: list

static getJavaPackage()[source]¶: Returns package name String.

getNumbits()[source]¶

Returns: Number of bits used to mask (default: 30)
Return type: int

getOutputCol()[source]¶

Returns: The name of the output column
Return type: str

getSeed()[source]¶

Returns: Hash seed (default: 0)
Return type: int

getStringSplitInputCols()[source]¶

Returns: Input columns that should be split at word boundaries (default: [])
Return type: list

getSumCollisions()[source]¶

Returns: Sums collisions if true, otherwise removes them (default: true)
Return type: bool

classmethod read()[source]¶: Returns an MLReader instance for this class.

setInputCols(value)[source]¶

Parameters: inputCols (list) – The names of the input columns (default: [])

setNumbits(value)[source]¶

Parameters: numbits (int) – Number of bits used to mask (default: 30)

setOutputCol(value)[source]¶

Parameters: outputCol (str) – The name of the output column

setParams(inputCols=[], numbits=30, outputCol=None, seed=0, stringSplitInputCols=[], sumCollisions=True)[source]¶

Set the (keyword only) parameters

Parameters

inputCols (list) – The names of the input columns (default: [])
numbits (int) – Number of bits used to mask (default: 30)
outputCol (str) – The name of the output column
seed (int) – Hash seed (default: 0)
stringSplitInputCols (list) – Input columns that should be split at word boundaries (default: [])
sumCollisions (bool) – Sums collisions if true, otherwise removes them (default: true)

setSeed(value)[source]¶

Parameters: seed (int) – Hash seed (default: 0)

setStringSplitInputCols(value)[source]¶

Parameters: stringSplitInputCols (list) – Input columns that should be split at word boundaries (default: [])

setSumCollisions(value)[source]¶

Parameters: sumCollisions (bool) – Sums collisions if true, otherwise removes them (default: true)
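A minimal usage sketch for the featurizer, assuming MMLSpark is installed and a Spark DataFrame is available. The column names `age`, `city`, and `review_text` and the helper name `featurize` are hypothetical; `transform` comes from the pyspark.ml transformer base class.

```python
# Keyword arguments taken from the documented constructor signature;
# the column names are placeholders for columns in your own DataFrame.
featurizer_params = {
    "inputCols": ["age", "city"],
    "stringSplitInputCols": ["review_text"],
    "outputCol": "features",
}


def featurize(df):
    """Add a hashed feature vector column to `df`."""
    # Imported lazily so the sketch can be defined without MMLSpark installed.
    from mmlspark.vw.VowpalWabbitFeaturizer import VowpalWabbitFeaturizer

    featurizer = VowpalWabbitFeaturizer(**featurizer_params)
    return featurizer.transform(df)
```
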

mmlspark.vw.VowpalWabbitInteractions module¶

class mmlspark.vw.VowpalWabbitInteractions.VowpalWabbitInteractions(inputCols=None, numbits=30, outputCol=None, sumCollisions=True)[source]¶

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters

inputCols (list) – The names of the input columns
numbits (int) – Number of bits used to mask (default: 30)
outputCol (str) – The name of the output column
sumCollisions (bool) – Sums collisions if true, otherwise removes them (default: true)
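Conceptually, an interaction transformer crosses the sparse feature vectors of its input columns: every combination of one feature per column yields a new feature whose value is the product of the originals. The sketch below illustrates that idea in plain Python; the name `interact`, the FNV-style rule for combining indices, and the keep-first reading of sumCollisions=False are all assumptions, not the library's confirmed behaviour.

```python
from itertools import product

FNV_PRIME = 16777619  # assumption: FNV-style prime used to mix indices


def interact(columns, numbits=30, sum_collisions=True):
    """Cross sparse vectors given as {index: value} dicts, one per column."""
    mask = (1 << numbits) - 1
    result = {}
    for combo in product(*(column.items() for column in columns)):
        index, value = 0, 1.0
        for feature_index, feature_value in combo:
            index = (index * FNV_PRIME) ^ feature_index  # mix the indices
            value *= feature_value                       # multiply the values
        index &= mask  # keep only the low `numbits` bits
        if index not in result:
            result[index] = value
        elif sum_collisions:
            result[index] += value
        # Otherwise the colliding term is dropped (assumed behaviour).
    return result


# Two columns with two features each produce up to four crossed features.
print(interact([{1: 1.0, 2: 2.0}, {3: 3.0, 4: 4.0}]))
```
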

getInputCols()[source]¶

Returns: The names of the input columns
Return type: list

static getJavaPackage()[source]¶: Returns package name String.

getNumbits()[source]¶

Returns: Number of bits used to mask (default: 30)
Return type: int

getOutputCol()[source]¶

Returns: The name of the output column
Return type: str

getSumCollisions()[source]¶

Returns: Sums collisions if true, otherwise removes them (default: true)
Return type: bool

classmethod read()[source]¶: Returns an MLReader instance for this class.

setInputCols(value)[source]¶

Parameters: inputCols (list) – The names of the input columns

setNumbits(value)[source]¶

Parameters: numbits (int) – Number of bits used to mask (default: 30)

setOutputCol(value)[source]¶

Parameters: outputCol (str) – The name of the output column

setParams(inputCols=None, numbits=30, outputCol=None, sumCollisions=True)[source]¶

Set the (keyword only) parameters

Parameters

inputCols (list) – The names of the input columns
numbits (int) – Number of bits used to mask (default: 30)
outputCol (str) – The name of the output column
sumCollisions (bool) – Sums collisions if true, otherwise removes them (default: true)

setSumCollisions(value)[source]¶

Parameters: sumCollisions (bool) – Sums collisions if true, otherwise removes them (default: true)

mmlspark.vw.VowpalWabbitRegressor module¶

class mmlspark.vw.VowpalWabbitRegressor.VowpalWabbitRegressor(additionalFeatures=[], args='', featuresCol='features', hashSeed=0, ignoreNamespaces=None, initialModel=None, interactions=None, l1=None, l2=None, labelCol='label', learningRate=None, numBits=18, numPasses=1, powerT=None, predictionCol='prediction', weightCol=None)[source]¶

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaEstimator

Parameters

additionalFeatures (list) – Additional feature columns (default: [])
args (str) – VW command line arguments passed (default: '')
featuresCol (str) – features column name (default: features)
hashSeed (int) – Seed used for hashing (default: 0)
ignoreNamespaces (str) – Namespaces to be ignored (first letter only)
initialModel (list) – Initial model to start from
interactions (list) – Interaction terms as specified by -q
l1 (double) – L1 regularization strength (lambda)
l2 (double) – L2 regularization strength (lambda)
labelCol (str) – label column name (default: label)
learningRate (double) – Learning rate
numBits (int) – Number of bits used (default: 18)
numPasses (int) – Number of passes over the data (default: 1)
powerT (double) – t power value
predictionCol (str) – prediction column name (default: prediction)
weightCol (str) – The name of the weight column

getAdditionalFeatures()[source]¶

Returns: Additional feature columns (default: [])
Return type: list

getArgs()[source]¶

Returns: VW command line arguments passed (default: '')
Return type: str

getFeaturesCol()[source]¶

Returns: features column name (default: features)
Return type: str

getHashSeed()[source]¶

Returns: Seed used for hashing (default: 0)
Return type: int

getIgnoreNamespaces()[source]¶

Returns: Namespaces to be ignored (first letter only)
Return type: str

getInitialModel()[source]¶

Returns: Initial model to start from
Return type: list

getInteractions()[source]¶

Returns: Interaction terms as specified by -q
Return type: list

static getJavaPackage()[source]¶: Returns package name String.

getL1()[source]¶

Returns: L1 regularization strength (lambda)
Return type: double

getL2()[source]¶

Returns: L2 regularization strength (lambda)
Return type: double

getLabelCol()[source]¶

Returns: label column name (default: label)
Return type: str

getLearningRate()[source]¶

Returns: Learning rate
Return type: double

getNumBits()[source]¶

Returns: Number of bits used (default: 18)
Return type: int

getNumPasses()[source]¶

Returns: Number of passes over the data (default: 1)
Return type: int

getPowerT()[source]¶

Returns: t power value
Return type: double

getPredictionCol()[source]¶

Returns: prediction column name (default: prediction)
Return type: str

getWeightCol()[source]¶

Returns: The name of the weight column
Return type: str

classmethod read()[source]¶: Returns an MLReader instance for this class.

setAdditionalFeatures(value)[source]¶

Parameters: additionalFeatures (list) – Additional feature columns (default: [])

setArgs(value)[source]¶

Parameters: args (str) – VW command line arguments passed (default: '')

setFeaturesCol(value)[source]¶

Parameters: featuresCol (str) – features column name (default: features)

setHashSeed(value)[source]¶

Parameters: hashSeed (int) – Seed used for hashing (default: 0)

setIgnoreNamespaces(value)[source]¶

Parameters: ignoreNamespaces (str) – Namespaces to be ignored (first letter only)

setInitialModel(value)[source]¶

Parameters: initialModel (list) – Initial model to start from

setInteractions(value)[source]¶

Parameters: interactions (list) – Interaction terms as specified by -q

setL1(value)[source]¶

Parameters: l1 (double) – L1 regularization strength (lambda)

setL2(value)[source]¶

Parameters: l2 (double) – L2 regularization strength (lambda)

setLabelCol(value)[source]¶

Parameters: labelCol (str) – label column name (default: label)

setLearningRate(value)[source]¶

Parameters: learningRate (double) – Learning rate

setNumBits(value)[source]¶

Parameters: numBits (int) – Number of bits used (default: 18)

setNumPasses(value)[source]¶

Parameters: numPasses (int) – Number of passes over the data (default: 1)

setParams(additionalFeatures=[], args='', featuresCol='features', hashSeed=0, ignoreNamespaces=None, initialModel=None, interactions=None, l1=None, l2=None, labelCol='label', learningRate=None, numBits=18, numPasses=1, powerT=None, predictionCol='prediction', weightCol=None)[source]¶

Set the (keyword only) parameters

Parameters

additionalFeatures (list) – Additional feature columns (default: [])
args (str) – VW command line arguments passed (default: '')
featuresCol (str) – features column name (default: features)
hashSeed (int) – Seed used for hashing (default: 0)
ignoreNamespaces (str) – Namespaces to be ignored (first letter only)
initialModel (list) – Initial model to start from
interactions (list) – Interaction terms as specified by -q
l1 (double) – L1 regularization strength (lambda)
l2 (double) – L2 regularization strength (lambda)
labelCol (str) – label column name (default: label)
learningRate (double) – Learning rate
numBits (int) – Number of bits used (default: 18)
numPasses (int) – Number of passes over the data (default: 1)
powerT (double) – t power value
predictionCol (str) – prediction column name (default: prediction)
weightCol (str) – The name of the weight column

setPowerT(value)[source]¶

Parameters: powerT (double) – t power value

setPredictionCol(value)[source]¶

Parameters: predictionCol (str) – prediction column name (default: prediction)

setWeightCol(value)[source]¶

Parameters: weightCol (str) – The name of the weight column
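A minimal usage sketch for the regressor, assuming MMLSpark is installed and a Spark DataFrame with `features` and `label` columns is available. The parameter values and the helper name `fit_regressor` are illustrative assumptions; the `args` string shows how a raw Vowpal Wabbit option such as `--loss_function quantile` could be passed through.

```python
# Keyword arguments taken from the documented constructor signature;
# the values are illustrative, not recommendations.
regressor_params = {
    "featuresCol": "features",
    "labelCol": "label",
    "numPasses": 10,
    "args": "--loss_function quantile",
}


def fit_regressor(train_df):
    """Fit a VowpalWabbitRegressor and return the fitted model."""
    # Imported lazily so the sketch can be defined without MMLSpark installed.
    from mmlspark.vw.VowpalWabbitRegressor import VowpalWabbitRegressor

    return VowpalWabbitRegressor(**regressor_params).fit(train_df)
```
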

class mmlspark.vw.VowpalWabbitRegressor.VowpalWabbitRegressorModel(java_model=None)[source]¶

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.wrapper.JavaModel, pyspark.ml.util.JavaMLWritable, pyspark.ml.util.JavaMLReadable

Model fitted by VowpalWabbitRegressor.

This class is left empty on purpose. All necessary methods are exposed through inheritance.

static getJavaPackage()[source]¶: Returns package name String.

classmethod read()[source]¶: Returns an MLReader instance for this class.

Module contents¶

MicrosoftML is a library of Python classes that interface with the Microsoft Scala APIs, which use Apache Spark to create distributed machine learning models.

MicrosoftML simplifies training and scoring classifiers and regressors, and facilitates creating models that use the CNTK library, images, and text.