mmlspark.vw package
Submodules
mmlspark.vw.VowpalWabbitClassifier module
-
class
mmlspark.vw.VowpalWabbitClassifier.
VowpalWabbitClassificationModel
(java_model=None)[source]¶ Bases:
mmlspark.core.schema.Utils.ComplexParamsMixin
,pyspark.ml.wrapper.JavaModel
,pyspark.ml.util.JavaMLWritable
,pyspark.ml.util.JavaMLReadable
Model fitted by VowpalWabbitClassifier. This class is left empty on purpose. All necessary methods are exposed through inheritance.
-
class
mmlspark.vw.VowpalWabbitClassifier.
VowpalWabbitClassifier
(additionalFeatures=[], args='', featuresCol='features', hashSeed=0, ignoreNamespaces=None, initialModel=None, interactions=None, l1=None, l2=None, labelCol='label', learningRate=None, numBits=18, numPasses=1, powerT=None, predictionCol='prediction', probabilityCol='probability', rawPredictionCol='rawPrediction', thresholds=None, weightCol=None)[source]¶ Bases:
mmlspark.core.schema.Utils.ComplexParamsMixin
,pyspark.ml.util.JavaMLReadable
,pyspark.ml.util.JavaMLWritable
,pyspark.ml.wrapper.JavaEstimator
- Parameters
additionalFeatures (list) – Additional feature columns (default: [])
args (str) – VW command line arguments passed (default: '')
featuresCol (str) – features column name (default: features)
hashSeed (int) – Seed used for hashing (default: 0)
ignoreNamespaces (str) – Namespaces to be ignored (first letter only)
initialModel (list) – Initial model to start from
interactions (list) – Interaction terms as specified by -q
l1 (double) – l_1 lambda
l2 (double) – l_2 lambda
labelCol (str) – label column name (default: label)
learningRate (double) – Learning rate
numBits (int) – Number of bits used (default: 18)
numPasses (int) – Number of passes over the data (default: 1)
powerT (double) – t power value
predictionCol (str) – prediction column name (default: prediction)
probabilityCol (str) – Column name for predicted class conditional probabilities. Note: Not all models output well-calibrated probability estimates! These probabilities should be treated as confidences, not precise probabilities (default: probability)
rawPredictionCol (str) – raw prediction (a.k.a. confidence) column name (default: rawPrediction)
thresholds (list) – Thresholds in multi-class classification to adjust the probability of predicting each class. Array must have length equal to the number of classes, with values > 0 excepting that at most one value may be 0. The class with largest value p/t is predicted, where p is the original probability of that class and t is the class’s threshold
weightCol (str) – The name of the weight column
-
getAdditionalFeatures
()[source]¶ - Returns
Additional feature columns (default: [])
- Return type
list
-
getProbabilityCol
()[source]¶ - Returns
Column name for predicted class conditional probabilities. Note: Not all models output well-calibrated probability estimates! These probabilities should be treated as confidences, not precise probabilities (default: probability)
- Return type
str
-
getRawPredictionCol
()[source]¶ - Returns
raw prediction (a.k.a. confidence) column name (default: rawPrediction)
- Return type
str
-
getThresholds
()[source]¶ - Returns
Thresholds in multi-class classification to adjust the probability of predicting each class. Array must have length equal to the number of classes, with values > 0 excepting that at most one value may be 0. The class with largest value p/t is predicted, where p is the original probability of that class and t is the class’s threshold
- Return type
list
-
setAdditionalFeatures
(value)[source]¶ - Parameters
additionalFeatures (list) – Additional feature columns (default: [])
-
setFeaturesCol
(value)[source]¶ - Parameters
featuresCol (str) – features column name (default: features)
-
setIgnoreNamespaces
(value)[source]¶ - Parameters
ignoreNamespaces (str) – Namespaces to be ignored (first letter only)
-
setInteractions
(value)[source]¶ - Parameters
interactions (list) – Interaction terms as specified by -q
-
setNumPasses
(value)[source]¶ - Parameters
numPasses (int) – Number of passes over the data (default: 1)
-
setParams
(additionalFeatures=[], args='', featuresCol='features', hashSeed=0, ignoreNamespaces=None, initialModel=None, interactions=None, l1=None, l2=None, labelCol='label', learningRate=None, numBits=18, numPasses=1, powerT=None, predictionCol='prediction', probabilityCol='probability', rawPredictionCol='rawPrediction', thresholds=None, weightCol=None)[source]¶ Set the (keyword only) parameters
- Parameters
additionalFeatures (list) – Additional feature columns (default: [])
args (str) – VW command line arguments passed (default: '')
featuresCol (str) – features column name (default: features)
hashSeed (int) – Seed used for hashing (default: 0)
ignoreNamespaces (str) – Namespaces to be ignored (first letter only)
initialModel (list) – Initial model to start from
interactions (list) – Interaction terms as specified by -q
l1 (double) – l_1 lambda
l2 (double) – l_2 lambda
labelCol (str) – label column name (default: label)
learningRate (double) – Learning rate
numBits (int) – Number of bits used (default: 18)
numPasses (int) – Number of passes over the data (default: 1)
powerT (double) – t power value
predictionCol (str) – prediction column name (default: prediction)
probabilityCol (str) – Column name for predicted class conditional probabilities. Note: Not all models output well-calibrated probability estimates! These probabilities should be treated as confidences, not precise probabilities (default: probability)
rawPredictionCol (str) – raw prediction (a.k.a. confidence) column name (default: rawPrediction)
thresholds (list) – Thresholds in multi-class classification to adjust the probability of predicting each class. Array must have length equal to the number of classes, with values > 0 excepting that at most one value may be 0. The class with largest value p/t is predicted, where p is the original probability of that class and t is the class’s threshold
weightCol (str) – The name of the weight column
-
setPredictionCol
(value)[source]¶ - Parameters
predictionCol (str) – prediction column name (default: prediction)
-
setProbabilityCol
(value)[source]¶ - Parameters
probabilityCol (str) – Column name for predicted class conditional probabilities. Note: Not all models output well-calibrated probability estimates! These probabilities should be treated as confidences, not precise probabilities (default: probability)
-
setRawPredictionCol
(value)[source]¶ - Parameters
rawPredictionCol (str) – raw prediction (a.k.a. confidence) column name (default: rawPrediction)
-
setThresholds
(value)[source]¶ - Parameters
thresholds (list) – Thresholds in multi-class classification to adjust the probability of predicting each class. Array must have length equal to the number of classes, with values > 0 excepting that at most one value may be 0. The class with largest value p/t is predicted, where p is the original probability of that class and t is the class’s threshold
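The thresholds rule described above (predict the class with the largest p/t) can be sketched in plain Python. This only illustrates the arithmetic and is not the library's implementation. The function name `predict_with_thresholds` is hypothetical, and treating a zero threshold as an infinite ratio is an assumption made for this sketch.

```python
def predict_with_thresholds(probabilities, thresholds):
    """Return the index of the class with the largest p/t ratio."""
    if len(probabilities) != len(thresholds):
        raise ValueError("thresholds must have one entry per class")
    if any(t < 0 for t in thresholds) or sum(t == 0 for t in thresholds) > 1:
        raise ValueError("thresholds must be > 0, with at most one equal to 0")

    def ratio(p, t):
        # Assumption: a zero threshold yields an infinite ratio,
        # so that class always wins.
        return p / t if t > 0 else float("inf")

    scores = [ratio(p, t) for p, t in zip(probabilities, thresholds)]
    # max() returns the first index on ties.
    return max(range(len(scores)), key=scores.__getitem__)


# Ratios are 0.6/0.9, 0.3/0.3 and 0.1/0.5, so class 1 has the largest p/t.
print(predict_with_thresholds([0.6, 0.3, 0.1], [0.9, 0.3, 0.5]))  # prints 1
```

With equal thresholds the rule reduces to picking the most probable class. Lowering one class's threshold makes that class easier to predict.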
mmlspark.vw.VowpalWabbitFeaturizer module
-
class
mmlspark.vw.VowpalWabbitFeaturizer.
VowpalWabbitFeaturizer
(inputCols=[], numbits=30, outputCol=None, seed=0, stringSplitInputCols=[], sumCollisions=True)[source]¶ Bases:
mmlspark.core.schema.Utils.ComplexParamsMixin
,pyspark.ml.util.JavaMLReadable
,pyspark.ml.util.JavaMLWritable
,pyspark.ml.wrapper.JavaTransformer
- Parameters
inputCols (list) – The names of the input columns (default: [])
numbits (int) – Number of bits used to mask (default: 30)
outputCol (str) – The name of the output column
seed (int) – Hash seed (default: 0)
stringSplitInputCols (list) – Input cols that should be split at word boundaries (default: [])
sumCollisions (bool) – Sums collisions if true, otherwise removes them (default: true)
-
getInputCols
()[source]¶ - Returns
The names of the input columns (default: [])
- Return type
list
-
getStringSplitInputCols
()[source]¶ - Returns
Input cols that should be split at word boundaries (default: [])
- Return type
list
-
getSumCollisions
()[source]¶ - Returns
Sums collisions if true, otherwise removes them (default: true)
- Return type
bool
-
setInputCols
(value)[source]¶ - Parameters
inputCols (list) – The names of the input columns (default: [])
-
setParams
(inputCols=[], numbits=30, outputCol=None, seed=0, stringSplitInputCols=[], sumCollisions=True)[source]¶ Set the (keyword only) parameters
- Parameters
inputCols (list) – The names of the input columns (default: [])
numbits (int) – Number of bits used to mask (default: 30)
outputCol (str) – The name of the output column
seed (int) – Hash seed (default: 0)
stringSplitInputCols (list) – Input cols that should be split at word boundaries (default: [])
sumCollisions (bool) – Sums collisions if true, otherwise removes them (default: true)
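The roles of numbits, seed and sumCollisions can be illustrated with a small, self-contained sketch of feature hashing. It is not the featurizer's actual code. The hash function (`zlib.crc32`) is a stand-in chosen so the example runs with the standard library only, the function name `hash_features` is hypothetical, and reading "removes them" as "keep the first value and drop later colliding ones" is an assumption.

```python
import zlib


def hash_features(features, numbits=30, seed=0, sum_collisions=True):
    """Map (name, value) pairs to {index: value} within a 2**numbits space."""
    mask = (1 << numbits) - 1  # keeps only the lowest `numbits` bits
    buckets = {}
    for name, value in features:
        # Stand-in hash; the real featurizer may use a different function.
        index = zlib.crc32(name.encode("utf-8"), seed) & mask
        if index not in buckets:
            buckets[index] = value
        elif sum_collisions:
            buckets[index] += value  # colliding values are summed
        # Assumption: otherwise the later colliding value is dropped.
    return buckets


# A repeated feature name always lands on the same index.
counts = hash_features([("word=spark", 1.0), ("word=spark", 1.0)])
print(list(counts.values()))  # prints [2.0]
```

A smaller numbits value shrinks the index space and makes collisions between distinct features more likely. A larger value reduces collisions at the cost of a wider feature vector.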
mmlspark.vw.VowpalWabbitInteractions module
-
class
mmlspark.vw.VowpalWabbitInteractions.
VowpalWabbitInteractions
(inputCols=None, numbits=30, outputCol=None, sumCollisions=True)[source]¶ Bases:
mmlspark.core.schema.Utils.ComplexParamsMixin
,pyspark.ml.util.JavaMLReadable
,pyspark.ml.util.JavaMLWritable
,pyspark.ml.wrapper.JavaTransformer
- Parameters
-
getSumCollisions
()[source]¶ - Returns
Sums collisions if true, otherwise removes them (default: true)
- Return type
bool
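Interaction terms in the style of VW's -q option pair every feature of one namespace with every feature of another and multiply their values. The sketch below shows that idea on named features. The function name `quadratic_interactions` and the dictionary representation are assumptions for illustration, and the real transformer presumably works on hashed feature indices rather than names.

```python
from itertools import product


def quadratic_interactions(namespace_a, namespace_b):
    """Cross two {feature: value} namespaces into pairwise product features."""
    return {
        (name_a, name_b): value_a * value_b
        for (name_a, value_a), (name_b, value_b) in product(
            namespace_a.items(), namespace_b.items()
        )
    }


# Hypothetical namespaces: user features crossed with ad features.
user = {"age": 2.0, "height": 3.0}
ad = {"clicks": 4.0}
print(quadratic_interactions(user, ad))
# prints {('age', 'clicks'): 8.0, ('height', 'clicks'): 12.0}
```

The number of generated features is the product of the two namespace sizes, which is one reason the results are hashed into a fixed-size space.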
mmlspark.vw.VowpalWabbitRegressor module
-
class
mmlspark.vw.VowpalWabbitRegressor.
VowpalWabbitRegressor
(additionalFeatures=[], args='', featuresCol='features', hashSeed=0, ignoreNamespaces=None, initialModel=None, interactions=None, l1=None, l2=None, labelCol='label', learningRate=None, numBits=18, numPasses=1, powerT=None, predictionCol='prediction', weightCol=None)[source]¶ Bases:
mmlspark.core.schema.Utils.ComplexParamsMixin
,pyspark.ml.util.JavaMLReadable
,pyspark.ml.util.JavaMLWritable
,pyspark.ml.wrapper.JavaEstimator
- Parameters
additionalFeatures (list) – Additional feature columns (default: [])
args (str) – VW command line arguments passed (default: '')
featuresCol (str) – features column name (default: features)
hashSeed (int) – Seed used for hashing (default: 0)
ignoreNamespaces (str) – Namespaces to be ignored (first letter only)
initialModel (list) – Initial model to start from
interactions (list) – Interaction terms as specified by -q
l1 (double) – l_1 lambda
l2 (double) – l_2 lambda
labelCol (str) – label column name (default: label)
learningRate (double) – Learning rate
numBits (int) – Number of bits used (default: 18)
numPasses (int) – Number of passes over the data (default: 1)
powerT (double) – t power value
predictionCol (str) – prediction column name (default: prediction)
weightCol (str) – The name of the weight column
-
getAdditionalFeatures
()[source]¶ - Returns
Additional feature columns (default: [])
- Return type
list
-
setAdditionalFeatures
(value)[source]¶ - Parameters
additionalFeatures (list) – Additional feature columns (default: [])
-
setFeaturesCol
(value)[source]¶ - Parameters
featuresCol (str) – features column name (default: features)
-
setIgnoreNamespaces
(value)[source]¶ - Parameters
ignoreNamespaces (str) – Namespaces to be ignored (first letter only)
-
setInteractions
(value)[source]¶ - Parameters
interactions (list) – Interaction terms as specified by -q
-
setNumPasses
(value)[source]¶ - Parameters
numPasses (int) – Number of passes over the data (default: 1)
-
setParams
(additionalFeatures=[], args='', featuresCol='features', hashSeed=0, ignoreNamespaces=None, initialModel=None, interactions=None, l1=None, l2=None, labelCol='label', learningRate=None, numBits=18, numPasses=1, powerT=None, predictionCol='prediction', weightCol=None)[source]¶ Set the (keyword only) parameters
- Parameters
additionalFeatures (list) – Additional feature columns (default: [])
args (str) – VW command line arguments passed (default: '')
featuresCol (str) – features column name (default: features)
hashSeed (int) – Seed used for hashing (default: 0)
ignoreNamespaces (str) – Namespaces to be ignored (first letter only)
initialModel (list) – Initial model to start from
interactions (list) – Interaction terms as specified by -q
l1 (double) – l_1 lambda
l2 (double) – l_2 lambda
labelCol (str) – label column name (default: label)
learningRate (double) – Learning rate
numBits (int) – Number of bits used (default: 18)
numPasses (int) – Number of passes over the data (default: 1)
powerT (double) – t power value
predictionCol (str) – prediction column name (default: prediction)
weightCol (str) – The name of the weight column
-
class
mmlspark.vw.VowpalWabbitRegressor.
VowpalWabbitRegressorModel
(java_model=None)[source]¶ Bases:
mmlspark.core.schema.Utils.ComplexParamsMixin
,pyspark.ml.wrapper.JavaModel
,pyspark.ml.util.JavaMLWritable
,pyspark.ml.util.JavaMLReadable
Model fitted by VowpalWabbitRegressor. This class is left empty on purpose. All necessary methods are exposed through inheritance.
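A minimal sketch of how the featurizer, estimator and fitted model documented above might be combined. The keyword arguments come from the constructor signatures on this page. The column names `age`, `income` and `label` and the helper `train_regressor` are hypothetical, and the function assumes an active Spark session with mmlspark installed. The imports are placed inside the function so the snippet can be loaded without Spark.

```python
# Keyword arguments taken from the constructor signatures documented above.
# The input column names are hypothetical.
FEATURIZER_PARAMS = {"inputCols": ["age", "income"], "outputCol": "features"}
REGRESSOR_PARAMS = {
    "featuresCol": "features",
    "labelCol": "label",
    "numBits": 18,
    "numPasses": 2,
}


def train_regressor(train_df):
    """Featurize a Spark DataFrame and fit a VowpalWabbitRegressor on it."""
    # Imported lazily; requires mmlspark and a running Spark session.
    from mmlspark.vw.VowpalWabbitFeaturizer import VowpalWabbitFeaturizer
    from mmlspark.vw.VowpalWabbitRegressor import VowpalWabbitRegressor

    featurizer = VowpalWabbitFeaturizer(**FEATURIZER_PARAMS)
    regressor = VowpalWabbitRegressor(**REGRESSOR_PARAMS)
    # fit() returns the fitted model (VowpalWabbitRegressorModel).
    model = regressor.fit(featurizer.transform(train_df))
    return featurizer, model
```

The featurizer's outputCol must match the regressor's featuresCol so that the estimator reads the vector the featurizer wrote. Scoring new data would then follow the usual Spark ML pattern of `model.transform(featurizer.transform(test_df))`.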
Module contents
MicrosoftML is a library of Python classes that interface with the Microsoft Scala APIs, using Apache Spark to create distributed machine learning models.
MicrosoftML simplifies training and scoring classifiers and regressors, and facilitates the creation of models using the CNTK library, images, and text.