mmlspark.vw package
Submodules
mmlspark.vw.VowpalWabbitClassifier module
- class mmlspark.vw.VowpalWabbitClassifier.VowpalWabbitClassificationModel(java_model=None)
Bases: mmlspark.vw._VowpalWabbitClassifier._VowpalWabbitClassificationModel
- class mmlspark.vw.VowpalWabbitClassifier.VowpalWabbitClassifier(additionalFeatures=[], args='', featuresCol='features', hashSeed=0, ignoreNamespaces=None, initialModel=None, interactions=None, l1=None, l2=None, labelCol='label', labelConversion=True, learningRate=None, numBits=18, numPasses=1, powerT=None, predictionCol='prediction', probabilityCol='probability', rawPredictionCol='rawPrediction', thresholds=None, weightCol=None)
Bases: mmlspark.vw._VowpalWabbitClassifier._VowpalWabbitClassifier
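As a rough illustration of how the constructor keywords in the signature above might be used, the following sketch builds a VowpalWabbitClassifier and fits it. It assumes the usual Spark ML estimator contract (calling fit on a DataFrame returns a model), that mmlspark is installed, and that train_df already has a 'features' column and a 'label' column. The helper name train_classifier and the dictionary CLASSIFIER_KWARGS are hypothetical and not part of the library.

```python
# Keyword names are taken from the VowpalWabbitClassifier signature;
# the values simply repeat the documented defaults.
CLASSIFIER_KWARGS = {
    "featuresCol": "features",
    "labelCol": "label",
    "numBits": 18,
    "numPasses": 1,
}


def train_classifier(train_df, **overrides):
    """Fit a classifier on a Spark DataFrame (hypothetical helper).

    Assumes the class follows the standard Spark ML estimator contract,
    so that fit() returns a fitted model.
    """
    # Imported lazily so this module can be loaded without mmlspark installed.
    from mmlspark.vw.VowpalWabbitClassifier import VowpalWabbitClassifier

    kwargs = dict(CLASSIFIER_KWARGS, **overrides)
    classifier = VowpalWabbitClassifier(**kwargs)
    return classifier.fit(train_df)
```

A call such as train_classifier(df, numPasses=3) would override a single default while leaving the others untouched.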
mmlspark.vw.VowpalWabbitFeaturizer module
- class mmlspark.vw.VowpalWabbitFeaturizer.VowpalWabbitFeaturizer(inputCols=[], numBits=30, outputCol='features', prefixStringsWithColumnName=True, preserveOrderNumBits=0, seed=0, stringSplitInputCols=[], sumCollisions=True)
Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer
- Parameters
inputCols (list) – The names of the input columns (default: [])
numBits (int) – Number of bits used to mask (default: 30)
outputCol (str) – The name of the output column (default: features)
prefixStringsWithColumnName (bool) – Prefix string features with column name (default: true)
preserveOrderNumBits (int) – Number of bits used to preserve the feature order. This reduces the hash size. Must be large enough to count the maximum number of words (default: 0)
seed (int) – Hash seed (default: 0)
stringSplitInputCols (list) – Input columns that should be split at word boundaries (default: [])
sumCollisions (bool) – Sums collisions if true, otherwise removes them (default: true)
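To make the numBits, seed, and sumCollisions parameters more concrete, here is a minimal pure-Python sketch of masked feature hashing. It is not the library's implementation: zlib.crc32 stands in for whatever hash function the featurizer actually uses, the function names are hypothetical, and "removes them" is read here as keeping only the first value seen for a colliding index, which is an assumption.

```python
import zlib


def hashed_index(name, num_bits=30, seed=0):
    """Map a feature name to an index in [0, 2**num_bits)."""
    mask = (1 << num_bits) - 1
    # zlib.crc32 is only a stand-in hash, seeded through its start value.
    return zlib.crc32(name.encode("utf-8"), seed) & mask


def featurize(pairs, num_bits=30, seed=0, sum_collisions=True):
    """Turn (feature_name, value) pairs into a sparse {index: value} dict."""
    out = {}
    for name, value in pairs:
        idx = hashed_index(name, num_bits, seed)
        if idx not in out:
            out[idx] = value
        elif sum_collisions:
            # Two features landed on the same index: add their values.
            out[idx] += value
        # Otherwise the later colliding value is dropped (assumed reading).
    return out
```

Fewer bits give a smaller index space and therefore more collisions; with num_bits=0 every feature lands on index 0, which makes the difference between the two collision modes easy to see.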
- getInputCols()
- Returns
The names of the input columns (default: [])
- Return type
list
- getPrefixStringsWithColumnName()
- Returns
Prefix string features with column name (default: true)
- Return type
bool
- getPreserveOrderNumBits()
- Returns
Number of bits used to preserve the feature order. This reduces the hash size. Must be large enough to count the maximum number of words (default: 0)
- Return type
int
- getStringSplitInputCols()
- Returns
Input columns that should be split at word boundaries (default: [])
- Return type
list
- getSumCollisions()
- Returns
Sums collisions if true, otherwise removes them (default: true)
- Return type
bool
- setInputCols(value)
- Parameters
inputCols (list) – The names of the input columns (default: [])
- setOutputCol(value)
- Parameters
outputCol (str) – The name of the output column (default: features)
- setParams(inputCols=[], numBits=30, outputCol='features', prefixStringsWithColumnName=True, preserveOrderNumBits=0, seed=0, stringSplitInputCols=[], sumCollisions=True)
Set the (keyword only) parameters
- Parameters
inputCols (list) – The names of the input columns (default: [])
numBits (int) – Number of bits used to mask (default: 30)
outputCol (str) – The name of the output column (default: features)
prefixStringsWithColumnName (bool) – Prefix string features with column name (default: true)
preserveOrderNumBits (int) – Number of bits used to preserve the feature order. This reduces the hash size. Must be large enough to count the maximum number of words (default: 0)
seed (int) – Hash seed (default: 0)
stringSplitInputCols (list) – Input columns that should be split at word boundaries (default: [])
sumCollisions (bool) – Sums collisions if true, otherwise removes them (default: true)
- setPrefixStringsWithColumnName(value)
- Parameters
prefixStringsWithColumnName (bool) – Prefix string features with column name (default: true)
- setPreserveOrderNumBits(value)
- Parameters
preserveOrderNumBits (int) – Number of bits used to preserve the feature order. This reduces the hash size. Must be large enough to count the maximum number of words (default: 0)
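The trade-off described for preserveOrderNumBits (bits spent on word order are no longer available for the hash) can be sketched with plain bit arithmetic. The layout below, with the word position in the high bits and the truncated hash in the low bits, is an assumption made for illustration and may not match the library's internals. The function name is hypothetical.

```python
def order_preserving_index(position, hashed, num_bits=30, preserve_order_num_bits=0):
    """Combine a word position and a hash value into one feature index.

    Assumed layout: the top preserve_order_num_bits bits hold the position,
    the remaining num_bits - preserve_order_num_bits bits hold the hash.
    """
    if preserve_order_num_bits == 0:
        # No ordering information: the full width is used for the hash.
        return hashed & ((1 << num_bits) - 1)

    max_words = 1 << preserve_order_num_bits
    if position >= max_words:
        raise ValueError(
            "preserve_order_num_bits=%d can only count %d words"
            % (preserve_order_num_bits, max_words)
        )

    hash_bits = num_bits - preserve_order_num_bits
    return (position << hash_bits) | (hashed & ((1 << hash_bits) - 1))
```

Under this reading, numBits=30 with preserveOrderNumBits=4 leaves 26 bits for the hash and can distinguish at most 16 word positions, which is why the setting must be large enough for the longest input.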
mmlspark.vw.VowpalWabbitInteractions module
- class mmlspark.vw.VowpalWabbitInteractions.VowpalWabbitInteractions(inputCols=None, numBits=30, outputCol=None, sumCollisions=True)
Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer
- Parameters
- getSumCollisions()
- Returns
Sums collisions if true, otherwise removes them (default: true)
- Return type
bool
mmlspark.vw.VowpalWabbitRegressor module
- class mmlspark.vw.VowpalWabbitRegressor.VowpalWabbitRegressionModel(java_model=None)
Bases: mmlspark.vw._VowpalWabbitRegressor._VowpalWabbitRegressionModel
- class mmlspark.vw.VowpalWabbitRegressor.VowpalWabbitRegressor(additionalFeatures=[], args='', featuresCol='features', hashSeed=0, ignoreNamespaces=None, initialModel=None, interactions=None, l1=None, l2=None, labelCol='label', learningRate=None, numBits=18, numPasses=1, powerT=None, predictionCol='prediction', weightCol=None)
Bases: mmlspark.vw._VowpalWabbitRegressor._VowpalWabbitRegressor
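The featurizer and regressor on this page could plausibly be chained in a standard Spark ML Pipeline, as sketched below. This assumes mmlspark and pyspark are installed, that VowpalWabbitRegressor behaves as a regular Spark ML estimator, and that the caller supplies real column names. The helpers regressor_kwargs and build_pipeline are hypothetical names introduced for this example.

```python
def regressor_kwargs(label_col="label", features_col="features"):
    """Keyword arguments for VowpalWabbitRegressor (names from its signature)."""
    return {
        "featuresCol": features_col,
        "labelCol": label_col,
        "numPasses": 1,
    }


def build_pipeline(input_cols, label_col="label"):
    """Build a featurize-then-regress pipeline (hypothetical helper)."""
    # Imported lazily so this module can be loaded without Spark or mmlspark.
    from pyspark.ml import Pipeline
    from mmlspark.vw.VowpalWabbitFeaturizer import VowpalWabbitFeaturizer
    from mmlspark.vw.VowpalWabbitRegressor import VowpalWabbitRegressor

    # Hash the raw input columns into a single sparse feature vector column.
    featurizer = VowpalWabbitFeaturizer(
        inputCols=list(input_cols), outputCol="features"
    )
    regressor = VowpalWabbitRegressor(**regressor_kwargs(label_col))
    return Pipeline(stages=[featurizer, regressor])
```

Calling build_pipeline(["age", "city"], label_col="price").fit(df) would then featurize and train in one step, given a DataFrame df with those (hypothetical) columns.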
Module contents
MicrosoftML is a library of Python classes that interface with the Microsoft Scala APIs, which use Apache Spark to create distributed machine learning models.
MicrosoftML simplifies training and scoring classifiers and regressors, and facilitates creating models that use the CNTK library, images, and text.