mmlspark.nn package

Submodules

mmlspark.nn.ConditionalBallTree module

class mmlspark.nn.ConditionalBallTree.ConditionalBallTree(keys, values, labels, leafSize, java_obj=None)[source]

Bases: object

findMaximumInnerProducts(queryPoint, conditioner, k)[source]

Find the best matches to queryPoint in this tree, restricted by the conditioner and limited to k results.

Parameters
  • queryPoint – array vector to use to query for nearest neighbors

  • conditioner – set of labels that will subset or condition the nearest-neighbor query

  • k (int) – the maximum number of neighbors to return

Returns

array of tuples, each holding the index of a match and its distance
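The query semantics described above can be illustrated with a brute-force reference in plain Python. This is a minimal sketch, not mmlspark's implementation: the function name conditional_max_inner_products and the sample data are hypothetical, and it scores matches by raw inner product, as the method name suggests.

```python
import heapq


def conditional_max_inner_products(keys, labels, query_point, conditioner, k):
    """Brute-force reference: return up to k (index, inner product) tuples,
    considering only keys whose label is in the conditioner set."""
    scored = []
    for index, (key, label) in enumerate(zip(keys, labels)):
        if label not in conditioner:
            continue  # the conditioner removes this point from the search
        score = sum(a * b for a, b in zip(key, query_point))
        scored.append((index, score))
    # Largest inner products first, truncated to at most k results.
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Hypothetical sample data: four keys with one label each.
keys = [[1.0, 0.0], [0.0, 0.5], [2.0, 2.0], [3.0, 0.5]]
labels = ["a", "b", "a", "c"]
matches = conditional_max_inner_products(keys, labels, [1.0, 1.0], {"a", "b"}, k=2)
print(matches)
```

The key at index 3 has the second-largest inner product overall, but its label "c" is outside the conditioner, so it never appears in the result. A tree-based index presumably exists to avoid the full scan this loop performs.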

static load(filename)[source]
save(filename)[source]

mmlspark.nn.ConditionalKNN module

class mmlspark.nn.ConditionalKNN.ConditionalKNN(*args, **kwargs)[source]

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaEstimator

Parameters
  • conditionerCol (str) – column holding identifiers for features that will be returned when queried (default: conditioner)

  • featuresCol (str) – The name of the features column (default: features)

  • k (int) – number of matches to return (default: 5)

  • labelCol (str) – The name of the label column (default: labels)

  • leafSize (int) – max size of the leaves of the tree (default: 50)

  • outputCol (str) – The name of the output column (default: [self.uid]_output)

  • valuesCol (str) – column holding values for each feature (key) that will be returned when queried (default: values)

getConditionerCol()[source]
Returns

column holding identifiers for features that will be returned when queried (default: conditioner)

Return type

str

getFeaturesCol()[source]
Returns

The name of the features column (default: features)

Return type

str

static getJavaPackage()[source]

Returns package name String.

getK()[source]
Returns

number of matches to return (default: 5)

Return type

int

getLabelCol()[source]
Returns

The name of the label column (default: labels)

Return type

str

getLeafSize()[source]
Returns

max size of the leaves of the tree (default: 50)

Return type

int

getOutputCol()[source]
Returns

The name of the output column (default: [self.uid]_output)

Return type

str

getValuesCol()[source]
Returns

column holding values for each feature (key) that will be returned when queried (default: values)

Return type

str

classmethod read()[source]

Returns an MLReader instance for this class.

setConditionerCol(value)[source]
Parameters

conditionerCol – column holding identifiers for features that will be returned when queried (default: conditioner)

setFeaturesCol(value)[source]
Parameters

featuresCol – The name of the features column (default: features)

setK(value)[source]
Parameters

k – number of matches to return (default: 5)

setLabelCol(value)[source]
Parameters

labelCol – The name of the label column (default: labels)

setLeafSize(value)[source]
Parameters

leafSize – max size of the leaves of the tree (default: 50)

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column (default: [self.uid]_output)

setParams(conditionerCol='conditioner', featuresCol='features', k=5, labelCol='labels', leafSize=50, outputCol=None, valuesCol='values')[source]

Set the (keyword only) parameters

Parameters
  • conditionerCol (str) – column holding identifiers for features that will be returned when queried (default: conditioner)

  • featuresCol (str) – The name of the features column (default: features)

  • k (int) – number of matches to return (default: 5)

  • labelCol (str) – The name of the label column (default: labels)

  • leafSize (int) – max size of the leaves of the tree (default: 50)

  • outputCol (str) – The name of the output column (default: [self.uid]_output)

  • valuesCol (str) – column holding values for each feature (key) that will be returned when queried (default: values)

setValuesCol(value)[source]
Parameters

valuesCol – column holding values for each feature (key) that will be returned when queried (default: values)

class mmlspark.nn.ConditionalKNN.ConditionalKNNModel(java_model=None)[source]

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.wrapper.JavaModel, pyspark.ml.util.JavaMLWritable, pyspark.ml.util.JavaMLReadable

Model fitted by ConditionalKNN.

getBallTree()[source]
Returns

the ballTree model used for performing queries

Return type

object

getConditionerCol()[source]
Returns

column holding identifiers for features that will be returned when queried

Return type

str

getFeaturesCol()[source]
Returns

The name of the features column

Return type

str

static getJavaPackage()[source]

Returns package name String.

getK()[source]
Returns

number of matches to return

Return type

int

getLabelCol()[source]
Returns

The name of the label column

Return type

str

getLeafSize()[source]
Returns

max size of the leaves of the tree

Return type

int

getOutputCol()[source]
Returns

The name of the output column

Return type

str

getValuesCol()[source]
Returns

column holding values for each feature (key) that will be returned when queried

Return type

str

classmethod read()[source]

Returns an MLReader instance for this class.

setBallTree(value)[source]
Parameters

ballTree – the ballTree model used for performing queries

setConditionerCol(value)[source]
Parameters

conditionerCol – column holding identifiers for features that will be returned when queried

setFeaturesCol(value)[source]
Parameters

featuresCol – The name of the features column

setK(value)[source]
Parameters

k – number of matches to return

setLabelCol(value)[source]
Parameters

labelCol – The name of the label column

setLeafSize(value)[source]
Parameters

leafSize – max size of the leaves of the tree

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setValuesCol(value)[source]
Parameters

valuesCol – column holding values for each feature (key) that will be returned when queried

mmlspark.nn.ConditionalKNNModel module

class mmlspark.nn.ConditionalKNNModel.ConditionalKNNModel(*args, **kwargs)[source]

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • ballTree (object) – the ballTree model used for performing queries

  • conditionerCol (str) – column holding identifiers for features that will be returned when queried

  • featuresCol (str) – The name of the features column

  • k (int) – number of matches to return

  • labelCol (str) – The name of the label column

  • leafSize (int) – max size of the leaves of the tree

  • outputCol (str) – The name of the output column

  • valuesCol (str) – column holding values for each feature (key) that will be returned when queried

getBallTree()[source]
Returns

the ballTree model used for performing queries

Return type

object

getConditionerCol()[source]
Returns

column holding identifiers for features that will be returned when queried

Return type

str

getFeaturesCol()[source]
Returns

The name of the features column

Return type

str

static getJavaPackage()[source]

Returns package name String.

getK()[source]
Returns

number of matches to return

Return type

int

getLabelCol()[source]
Returns

The name of the label column

Return type

str

getLeafSize()[source]
Returns

max size of the leaves of the tree

Return type

int

getOutputCol()[source]
Returns

The name of the output column

Return type

str

getValuesCol()[source]
Returns

column holding values for each feature (key) that will be returned when queried

Return type

str

classmethod read()[source]

Returns an MLReader instance for this class.

setBallTree(value)[source]
Parameters

ballTree – the ballTree model used for performing queries

setConditionerCol(value)[source]
Parameters

conditionerCol – column holding identifiers for features that will be returned when queried

setFeaturesCol(value)[source]
Parameters

featuresCol – The name of the features column

setK(value)[source]
Parameters

k – number of matches to return

setLabelCol(value)[source]
Parameters

labelCol – The name of the label column

setLeafSize(value)[source]
Parameters

leafSize – max size of the leaves of the tree

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(ballTree=None, conditionerCol=None, featuresCol=None, k=None, labelCol=None, leafSize=None, outputCol=None, valuesCol=None)[source]

Set the (keyword only) parameters

Parameters
  • ballTree (object) – the ballTree model used for performing queries

  • conditionerCol (str) – column holding identifiers for features that will be returned when queried

  • featuresCol (str) – The name of the features column

  • k (int) – number of matches to return

  • labelCol (str) – The name of the label column

  • leafSize (int) – max size of the leaves of the tree

  • outputCol (str) – The name of the output column

  • valuesCol (str) – column holding values for each feature (key) that will be returned when queried

setValuesCol(value)[source]
Parameters

valuesCol – column holding values for each feature (key) that will be returned when queried

mmlspark.nn.KNN module

class mmlspark.nn.KNN.KNN(*args, **kwargs)[source]

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaEstimator

Parameters
  • featuresCol (str) – The name of the features column (default: features)

  • k (int) – number of matches to return (default: 5)

  • leafSize (int) – max size of the leaves of the tree (default: 50)

  • outputCol (str) – The name of the output column (default: [self.uid]_output)

  • valuesCol (str) – column holding values for each feature (key) that will be returned when queried (default: values)
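The role of leafSize can be illustrated with a small ball-tree build in plain Python. This is a sketch under stated assumptions, not mmlspark's implementation: the splitting rule (median split on the widest dimension) and the names build_ball_tree and leaf_sizes are hypothetical. It only shows that recursion stops once a node holds at most leaf_size points.

```python
import math


def build_ball_tree(points, indices, leaf_size):
    """Split indices recursively until every leaf holds at most leaf_size points."""
    if leaf_size < 1:
        raise ValueError("leaf_size must be at least 1")
    dim = len(points[0])
    # Each node is a ball: the centroid of its points and the farthest distance from it.
    center = [sum(points[i][d] for i in indices) / len(indices) for d in range(dim)]
    radius = max(
        math.sqrt(sum((points[i][d] - center[d]) ** 2 for d in range(dim)))
        for i in indices
    )
    node = {"center": center, "radius": radius}
    if len(indices) <= leaf_size:
        node["indices"] = list(indices)  # leaf: keep the point indices directly
        return node

    def spread(d):
        column = [points[i][d] for i in indices]
        return max(column) - min(column)

    # Assumed splitting rule: median split along the dimension with the widest spread.
    split_dim = max(range(dim), key=spread)
    ordered = sorted(indices, key=lambda i: points[i][split_dim])
    mid = len(ordered) // 2
    node["left"] = build_ball_tree(points, ordered[:mid], leaf_size)
    node["right"] = build_ball_tree(points, ordered[mid:], leaf_size)
    return node


def leaf_sizes(node):
    """Collect the number of points stored in each leaf, left to right."""
    if "indices" in node:
        return [len(node["indices"])]
    return leaf_sizes(node["left"]) + leaf_sizes(node["right"])


# Hypothetical sample data: ten 2-D points.
points = [[float(i), float(i % 3)] for i in range(10)]
tree = build_ball_tree(points, list(range(10)), leaf_size=3)
sizes = leaf_sizes(tree)
print(sizes)
```

With ten points and a leaf size of 3, this sketch produces four leaves and none exceeds the limit. A larger leaf size gives a shallower tree with more points to scan per leaf.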

getFeaturesCol()[source]
Returns

The name of the features column (default: features)

Return type

str

static getJavaPackage()[source]

Returns package name String.

getK()[source]
Returns

number of matches to return (default: 5)

Return type

int

getLeafSize()[source]
Returns

max size of the leaves of the tree (default: 50)

Return type

int

getOutputCol()[source]
Returns

The name of the output column (default: [self.uid]_output)

Return type

str

getValuesCol()[source]
Returns

column holding values for each feature (key) that will be returned when queried (default: values)

Return type

str

classmethod read()[source]

Returns an MLReader instance for this class.

setFeaturesCol(value)[source]
Parameters

featuresCol – The name of the features column (default: features)

setK(value)[source]
Parameters

k – number of matches to return (default: 5)

setLeafSize(value)[source]
Parameters

leafSize – max size of the leaves of the tree (default: 50)

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column (default: [self.uid]_output)

setParams(featuresCol='features', k=5, leafSize=50, outputCol=None, valuesCol='values')[source]

Set the (keyword only) parameters

Parameters
  • featuresCol (str) – The name of the features column (default: features)

  • k (int) – number of matches to return (default: 5)

  • leafSize (int) – max size of the leaves of the tree (default: 50)

  • outputCol (str) – The name of the output column (default: [self.uid]_output)

  • valuesCol (str) – column holding values for each feature (key) that will be returned when queried (default: values)

setValuesCol(value)[source]
Parameters

valuesCol – column holding values for each feature (key) that will be returned when queried (default: values)

class mmlspark.nn.KNN.KNNModel(java_model=None)[source]

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.wrapper.JavaModel, pyspark.ml.util.JavaMLWritable, pyspark.ml.util.JavaMLReadable

Model fitted by KNN.

getBallTree()[source]
Returns

the ballTree model used for performing queries

Return type

object

getFeaturesCol()[source]
Returns

The name of the features column

Return type

str

static getJavaPackage()[source]

Returns package name String.

getK()[source]
Returns

number of matches to return

Return type

int

getLeafSize()[source]
Returns

max size of the leaves of the tree

Return type

int

getOutputCol()[source]
Returns

The name of the output column

Return type

str

getValuesCol()[source]
Returns

column holding values for each feature (key) that will be returned when queried

Return type

str

classmethod read()[source]

Returns an MLReader instance for this class.

setBallTree(value)[source]
Parameters

ballTree – the ballTree model used for performing queries

setFeaturesCol(value)[source]
Parameters

featuresCol – The name of the features column

setK(value)[source]
Parameters

k – number of matches to return

setLeafSize(value)[source]
Parameters

leafSize – max size of the leaves of the tree

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setValuesCol(value)[source]
Parameters

valuesCol – column holding values for each feature (key) that will be returned when queried

mmlspark.nn.KNNModel module

class mmlspark.nn.KNNModel.KNNModel(*args, **kwargs)[source]

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • ballTree (object) – the ballTree model used for performing queries

  • featuresCol (str) – The name of the features column

  • k (int) – number of matches to return

  • leafSize (int) – max size of the leaves of the tree

  • outputCol (str) – The name of the output column

  • valuesCol (str) – column holding values for each feature (key) that will be returned when queried

getBallTree()[source]
Returns

the ballTree model used for performing queries

Return type

object

getFeaturesCol()[source]
Returns

The name of the features column

Return type

str

static getJavaPackage()[source]

Returns package name String.

getK()[source]
Returns

number of matches to return

Return type

int

getLeafSize()[source]
Returns

max size of the leaves of the tree

Return type

int

getOutputCol()[source]
Returns

The name of the output column

Return type

str

getValuesCol()[source]
Returns

column holding values for each feature (key) that will be returned when queried

Return type

str

classmethod read()[source]

Returns an MLReader instance for this class.

setBallTree(value)[source]
Parameters

ballTree – the ballTree model used for performing queries

setFeaturesCol(value)[source]
Parameters

featuresCol – The name of the features column

setK(value)[source]
Parameters

k – number of matches to return

setLeafSize(value)[source]
Parameters

leafSize – max size of the leaves of the tree

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(ballTree=None, featuresCol=None, k=None, leafSize=None, outputCol=None, valuesCol=None)[source]

Set the (keyword only) parameters

Parameters
  • ballTree (object) – the ballTree model used for performing queries

  • featuresCol (str) – The name of the features column

  • k (int) – number of matches to return

  • leafSize (int) – max size of the leaves of the tree

  • outputCol (str) – The name of the output column

  • valuesCol (str) – column holding values for each feature (key) that will be returned when queried

setValuesCol(value)[source]
Parameters

valuesCol – column holding values for each feature (key) that will be returned when queried

Module contents

MicrosoftML is a library of Python classes that interface with the Microsoft Scala APIs and use Apache Spark to create distributed machine learning models.

MicrosoftML simplifies training and scoring classifiers and regressors, and facilitates creating models that use the CNTK library, images, and text.