mmlspark.recommendation package

Submodules

mmlspark.recommendation.RankingAdapter module

class mmlspark.recommendation.RankingAdapter.RankingAdapter(itemCol=None, k=10, labelCol='label', minRatingsPerItem=1, minRatingsPerUser=1, mode='allUsers', ratingCol=None, recommender=None, userCol=None)[source]

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaEstimator

Parameters
  • itemCol (str) – Column of items

  • k (int) – number of items (default: 10)

  • labelCol (str) – The name of the label column (default: label)

  • minRatingsPerItem (int) – min ratings for items > 0 (default: 1)

  • minRatingsPerUser (int) – min ratings for users > 0 (default: 1)

  • mode (str) – recommendation mode (default: allUsers)

  • ratingCol (str) – Column of ratings

  • recommender (object) – estimator for selection

  • userCol (str) – Column of users
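
A minimal usage sketch (illustrative, not taken from the library docs): it assumes an active SparkSession, a DataFrame named ratings with integer-valued customerID and itemID columns plus a rating column, and wraps Spark's ALS as the underlying recommender; all data and column names are hypothetical.

>>> from pyspark.ml.recommendation import ALS
>>> from mmlspark.recommendation.RankingAdapter import RankingAdapter
>>> als = ALS(userCol="customerID", itemCol="itemID", ratingCol="rating")
>>> adapter = RankingAdapter(mode="allUsers", k=5, recommender=als,
...                          userCol="customerID", itemCol="itemID", ratingCol="rating")
>>> adapter_model = adapter.fit(ratings)  # returns a RankingAdapterModel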

getItemCol()[source]
Returns

Column of items

Return type

str

static getJavaPackage()[source]

Returns package name String.

getK()[source]
Returns

number of items (default: 10)

Return type

int

getLabelCol()[source]
Returns

The name of the label column (default: label)

Return type

str

getMinRatingsPerItem()[source]
Returns

min ratings for items > 0 (default: 1)

Return type

int

getMinRatingsPerUser()[source]
Returns

min ratings for users > 0 (default: 1)

Return type

int

getMode()[source]
Returns

recommendation mode (default: allUsers)

Return type

str

getRatingCol()[source]
Returns

Column of ratings

Return type

str

getRecommender()[source]
Returns

estimator for selection

Return type

object

getUserCol()[source]
Returns

Column of users

Return type

str

classmethod read()[source]

Returns an MLReader instance for this class.

setItemCol(value)[source]
Parameters

itemCol (str) – Column of items

setK(value)[source]
Parameters

k (int) – number of items (default: 10)

setLabelCol(value)[source]
Parameters

labelCol (str) – The name of the label column (default: label)

setMinRatingsPerItem(value)[source]
Parameters

minRatingsPerItem (int) – min ratings for items > 0 (default: 1)

setMinRatingsPerUser(value)[source]
Parameters

minRatingsPerUser (int) – min ratings for users > 0 (default: 1)

setMode(value)[source]
Parameters

mode (str) – recommendation mode (default: allUsers)

setParams(itemCol=None, k=10, labelCol='label', minRatingsPerItem=1, minRatingsPerUser=1, mode='allUsers', ratingCol=None, recommender=None, userCol=None)[source]

Set the (keyword only) parameters

Parameters
  • itemCol (str) – Column of items

  • k (int) – number of items (default: 10)

  • labelCol (str) – The name of the label column (default: label)

  • minRatingsPerItem (int) – min ratings for items > 0 (default: 1)

  • minRatingsPerUser (int) – min ratings for users > 0 (default: 1)

  • mode (str) – recommendation mode (default: allUsers)

  • ratingCol (str) – Column of ratings

  • recommender (object) – estimator for selection

  • userCol (str) – Column of users

setRatingCol(value)[source]
Parameters

ratingCol (str) – Column of ratings

setRecommender(value)[source]
Parameters

recommender (object) – estimator for selection

setUserCol(value)[source]
Parameters

userCol (str) – Column of users

class mmlspark.recommendation.RankingAdapter.RankingAdapterModel(java_model=None)[source]

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.wrapper.JavaModel, pyspark.ml.util.JavaMLWritable, pyspark.ml.util.JavaMLReadable

Model fitted by RankingAdapter.

This class is left empty on purpose. All necessary methods are exposed through inheritance.

static getJavaPackage()[source]

Returns package name String.

classmethod read()[source]

Returns an MLReader instance for this class.

mmlspark.recommendation.RankingAdapterModel module

class mmlspark.recommendation.RankingAdapterModel.RankingAdapterModel(itemCol=None, k=10, labelCol='label', minRatingsPerItem=1, minRatingsPerUser=1, mode='allUsers', ratingCol=None, recommender=None, recommenderModel=None, userCol=None)[source]

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • itemCol (str) – Column of items

  • k (int) – number of items (default: 10)

  • labelCol (str) – The name of the label column (default: label)

  • minRatingsPerItem (int) – min ratings for items > 0 (default: 1)

  • minRatingsPerUser (int) – min ratings for users > 0 (default: 1)

  • mode (str) – recommendation mode (default: allUsers)

  • ratingCol (str) – Column of ratings

  • recommender (object) – estimator for selection

  • recommenderModel (object) – recommenderModel

  • userCol (str) – Column of users
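
In practice this transformer is usually obtained from RankingAdapter.fit rather than constructed directly. A brief sketch, reusing the hypothetical adapter and ratings DataFrame from the RankingAdapter example above:

>>> adapter_model = adapter.fit(ratings)
>>> scored = adapter_model.transform(ratings)  # output is meant to feed a RankingEvaluator
>>> scored.printSchema()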

getItemCol()[source]
Returns

Column of items

Return type

str

static getJavaPackage()[source]

Returns package name String.

getK()[source]
Returns

number of items (default: 10)

Return type

int

getLabelCol()[source]
Returns

The name of the label column (default: label)

Return type

str

getMinRatingsPerItem()[source]
Returns

min ratings for items > 0 (default: 1)

Return type

int

getMinRatingsPerUser()[source]
Returns

min ratings for users > 0 (default: 1)

Return type

int

getMode()[source]
Returns

recommendation mode (default: allUsers)

Return type

str

getRatingCol()[source]
Returns

Column of ratings

Return type

str

getRecommender()[source]
Returns

estimator for selection

Return type

object

getRecommenderModel()[source]
Returns

recommenderModel

Return type

object

getUserCol()[source]
Returns

Column of users

Return type

str

classmethod read()[source]

Returns an MLReader instance for this class.

setItemCol(value)[source]
Parameters

itemCol (str) – Column of items

setK(value)[source]
Parameters

k (int) – number of items (default: 10)

setLabelCol(value)[source]
Parameters

labelCol (str) – The name of the label column (default: label)

setMinRatingsPerItem(value)[source]
Parameters

minRatingsPerItem (int) – min ratings for items > 0 (default: 1)

setMinRatingsPerUser(value)[source]
Parameters

minRatingsPerUser (int) – min ratings for users > 0 (default: 1)

setMode(value)[source]
Parameters

mode (str) – recommendation mode (default: allUsers)

setParams(itemCol=None, k=10, labelCol='label', minRatingsPerItem=1, minRatingsPerUser=1, mode='allUsers', ratingCol=None, recommender=None, recommenderModel=None, userCol=None)[source]

Set the (keyword only) parameters

Parameters
  • itemCol (str) – Column of items

  • k (int) – number of items (default: 10)

  • labelCol (str) – The name of the label column (default: label)

  • minRatingsPerItem (int) – min ratings for items > 0 (default: 1)

  • minRatingsPerUser (int) – min ratings for users > 0 (default: 1)

  • mode (str) – recommendation mode (default: allUsers)

  • ratingCol (str) – Column of ratings

  • recommender (object) – estimator for selection

  • recommenderModel (object) – recommenderModel

  • userCol (str) – Column of users

setRatingCol(value)[source]
Parameters

ratingCol (str) – Column of ratings

setRecommender(value)[source]
Parameters

recommender (object) – estimator for selection

setRecommenderModel(value)[source]
Parameters

recommenderModel (object) – recommenderModel

setUserCol(value)[source]
Parameters

userCol (str) – Column of users

mmlspark.recommendation.RankingEvaluator module

class mmlspark.recommendation.RankingEvaluator.RankingEvaluator(itemCol=None, k=10, labelCol='label', metricName='ndcgAt', nItems=-1, predictionCol='prediction', ratingCol=None, userCol=None)[source]

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.evaluation.JavaEvaluator

Parameters
  • itemCol (str) – Column of items

  • k (int) – number of items (default: 10)

  • labelCol (str) – label column name (default: label)

  • metricName (str) – metric name in evaluation (ndcgAt|map|precisionAtk|recallAtK|diversityAtK|maxDiversity|mrr|fcp) (default: ndcgAt)

  • nItems (long) – number of items (default: -1)

  • predictionCol (str) – prediction column name (default: prediction)

  • ratingCol (str) – Column of ratings

  • userCol (str) – Column of users
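
A minimal sketch, assuming scored is a DataFrame whose default label and prediction columns hold per-user ground-truth and recommended item lists (for example, the output of RankingAdapterModel.transform shown earlier); the variable names are illustrative.

>>> from mmlspark.recommendation.RankingEvaluator import RankingEvaluator
>>> evaluator = RankingEvaluator(k=5, metricName="ndcgAt")
>>> ndcg_at_5 = evaluator.evaluate(scored)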

getItemCol()[source]
Returns

Column of items

Return type

str

static getJavaPackage()[source]

Returns package name String.

getK()[source]
Returns

number of items (default: 10)

Return type

int

getLabelCol()[source]
Returns

label column name (default: label)

Return type

str

getMetricName()[source]
Returns

metric name in evaluation (ndcgAt|map|precisionAtk|recallAtK|diversityAtK|maxDiversity|mrr|fcp) (default: ndcgAt)

Return type

str

getNItems()[source]
Returns

number of items (default: -1)

Return type

long

getPredictionCol()[source]
Returns

prediction column name (default: prediction)

Return type

str

getRatingCol()[source]
Returns

Column of ratings

Return type

str

getUserCol()[source]
Returns

Column of users

Return type

str

classmethod read()[source]

Returns an MLReader instance for this class.

setItemCol(value)[source]
Parameters

itemCol (str) – Column of items

setK(value)[source]
Parameters

k (int) – number of items (default: 10)

setLabelCol(value)[source]
Parameters

labelCol (str) – label column name (default: label)

setMetricName(value)[source]
Parameters

metricName (str) – metric name in evaluation (ndcgAt|map|precisionAtk|recallAtK|diversityAtK|maxDiversity|mrr|fcp) (default: ndcgAt)

setNItems(value)[source]
Parameters

nItems (long) – number of items (default: -1)

setParams(itemCol=None, k=10, labelCol='label', metricName='ndcgAt', nItems=-1, predictionCol='prediction', ratingCol=None, userCol=None)[source]

Set the (keyword only) parameters

Parameters
  • itemCol (str) – Column of items

  • k (int) – number of items (default: 10)

  • labelCol (str) – label column name (default: label)

  • metricName (str) – metric name in evaluation (ndcgAt|map|precisionAtk|recallAtK|diversityAtK|maxDiversity|mrr|fcp) (default: ndcgAt)

  • nItems (long) – number of items (default: -1)

  • predictionCol (str) – prediction column name (default: prediction)

  • ratingCol (str) – Column of ratings

  • userCol (str) – Column of users

setPredictionCol(value)[source]
Parameters

predictionCol (str) – prediction column name (default: prediction)

setRatingCol(value)[source]
Parameters

ratingCol (str) – Column of ratings

setUserCol(value)[source]
Parameters

userCol (str) – Column of users

mmlspark.recommendation.RankingTrainValidationSplit module

class mmlspark.recommendation.RankingTrainValidationSplit.RankingTrainValidationSplit(estimator=None, estimatorParamMaps=None, evaluator=None, seed=None)[source]

Bases: pyspark.ml.base.Estimator, pyspark.ml.tuning.ValidatorParams

copy(extra=None)[source]

Creates a copy of this instance with a randomly generated uid and some extra params. This creates a deep copy of the embedded paramMap and copies the embedded and extra parameters over.

Parameters

extra – Extra parameters to copy to the new instance

Returns

Copy of this instance

getItemCol()[source]
Returns

column name for item ids. Ids must be within the integer value range. (default: item)

Return type

str

getRatingCol()[source]
Returns

column name for ratings (default: rating)

Return type

str

getTrainRatio()[source]

Gets the value of trainRatio or its default value.

getUserCol()[source]
Returns

column name for user ids. Ids must be within the integer value range. (default: user)

Return type

str

itemCol = Param(parent='undefined', name='itemCol', doc='itemCol: column name for item ids. Ids must be within the integer value range. (default: item)')
ratingCol = Param(parent='undefined', name='ratingCol', doc='ratingCol: column name for ratings (default: rating)')
setItemCol(value)[source]
Parameters

itemCol (str) – column name for item ids. Ids must be within the integer value range. (default: item)

setParams(estimator=None, estimatorParamMaps=None, evaluator=None, seed=None)[source]

setParams(self, estimator=None, estimatorParamMaps=None, evaluator=None, seed=None): Sets the params for the train validation split.

setRatingCol(value)[source]
Parameters

ratingCol (str) – column name for ratings (default: rating)

setTrainRatio(value)[source]

Sets the value of trainRatio.

setUserCol(value)[source]
Parameters

userCol (str) – column name for user ids. Ids must be within the integer value range. (default: user)

trainRatio = Param(parent='undefined', name='trainRatio', doc='Param for ratio between train and validation data. Must be between 0 and 1.')
userCol = Param(parent='undefined', name='userCol', doc='userCol: column name for user ids. Ids must be within the integer value range. (default: user)')
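
A minimal tuning sketch (illustrative, not taken from the library docs): it assumes a ratings DataFrame whose column names match the documented defaults user, item, and rating, and tunes Spark's ALS over a small regParam grid.

>>> from pyspark.ml.recommendation import ALS
>>> from pyspark.ml.tuning import ParamGridBuilder
>>> from mmlspark.recommendation.RankingEvaluator import RankingEvaluator
>>> from mmlspark.recommendation.RankingTrainValidationSplit import RankingTrainValidationSplit
>>> als = ALS(userCol="user", itemCol="item", ratingCol="rating")
>>> grid = ParamGridBuilder().addGrid(als.regParam, [1.0, 0.1, 0.01]).build()
>>> tvs = RankingTrainValidationSplit(estimator=als, estimatorParamMaps=grid,
...                                   evaluator=RankingEvaluator(), seed=42)
>>> tvs.setTrainRatio(0.8)
>>> tvs_model = tvs.fit(ratings)  # RankingTrainValidationSplitModel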

mmlspark.recommendation.RankingTrainValidationSplitModel module

class mmlspark.recommendation.RankingTrainValidationSplitModel.RankingTrainValidationSplitModel(bestModel=None, validationMetrics=[])[source]

Bases: mmlspark.recommendation._RankingTrainValidationSplitModel._RankingTrainValidationSplitModel, pyspark.ml.tuning.ValidatorParams

bestModel = None

best model from train validation split

copy(extra=None)[source]

Creates a copy of this instance with a randomly generated uid and some extra params. This copies the underlying bestModel, creates a deep copy of the embedded paramMap, and copies the embedded and extra parameters over. It also creates a shallow copy of the validationMetrics.

Parameters

extra – Extra parameters to copy to the new instance

Returns

Copy of this instance

recommendForAllItems(numItems)[source]
recommendForAllUsers(numItems)[source]
validationMetrics = None

evaluated validation metrics
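
A brief sketch of reading results from a fitted split, assuming tvs_model is the RankingTrainValidationSplitModel returned by the tuning example above:

>>> print(tvs_model.validationMetrics)        # one metric value per param map
>>> best = tvs_model.bestModel                # best underlying recommender model
>>> user_recs = tvs_model.recommendForAllUsers(10)
>>> item_recs = tvs_model.recommendForAllItems(10)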

mmlspark.recommendation.RecommendationIndexer module

class mmlspark.recommendation.RecommendationIndexer.RecommendationIndexer(itemInputCol=None, itemOutputCol=None, ratingCol=None, userInputCol=None, userOutputCol=None)[source]

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaEstimator

Parameters
  • itemInputCol (str) – Item Input Col

  • itemOutputCol (str) – Item Output Col

  • ratingCol (str) – Rating Col

  • userInputCol (str) – User Input Col

  • userOutputCol (str) – User Output Col
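
A minimal sketch, assuming a ratings DataFrame with string-valued customerID and itemID columns (illustrative names) that must be mapped to integer indices before use with an ALS- or SAR-style recommender:

>>> from mmlspark.recommendation.RecommendationIndexer import RecommendationIndexer
>>> indexer = RecommendationIndexer(userInputCol="customerID", userOutputCol="customerIDindex",
...                                 itemInputCol="itemID", itemOutputCol="itemIDindex",
...                                 ratingCol="rating")
>>> indexer_model = indexer.fit(ratings)      # RecommendationIndexerModel
>>> indexed = indexer_model.transform(ratings)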

getItemInputCol()[source]
Returns

Item Input Col

Return type

str

getItemOutputCol()[source]
Returns

Item Output Col

Return type

str

static getJavaPackage()[source]

Returns package name String.

getRatingCol()[source]
Returns

Rating Col

Return type

str

getUserInputCol()[source]
Returns

User Input Col

Return type

str

getUserOutputCol()[source]
Returns

User Output Col

Return type

str

classmethod read()[source]

Returns an MLReader instance for this class.

setItemInputCol(value)[source]
Parameters

itemInputCol (str) – Item Input Col

setItemOutputCol(value)[source]
Parameters

itemOutputCol (str) – Item Output Col

setParams(itemInputCol=None, itemOutputCol=None, ratingCol=None, userInputCol=None, userOutputCol=None)[source]

Set the (keyword only) parameters

Parameters
  • itemInputCol (str) – Item Input Col

  • itemOutputCol (str) – Item Output Col

  • ratingCol (str) – Rating Col

  • userInputCol (str) – User Input Col

  • userOutputCol (str) – User Output Col

setRatingCol(value)[source]
Parameters

ratingCol (str) – Rating Col

setUserInputCol(value)[source]
Parameters

userInputCol (str) – User Input Col

setUserOutputCol(value)[source]
Parameters

userOutputCol (str) – User Output Col

class mmlspark.recommendation.RecommendationIndexer.RecommendationIndexerModel(java_model=None)[source]

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.wrapper.JavaModel, pyspark.ml.util.JavaMLWritable, pyspark.ml.util.JavaMLReadable

Model fitted by RecommendationIndexer.

This class is left empty on purpose. All necessary methods are exposed through inheritance.

static getJavaPackage()[source]

Returns package name String.

classmethod read()[source]

Returns an MLReader instance for this class.

mmlspark.recommendation.RecommendationIndexerModel module

class mmlspark.recommendation.RecommendationIndexerModel.RecommendationIndexerModel(itemIndexModel=None, itemInputCol=None, itemOutputCol=None, ratingCol=None, userIndexModel=None, userInputCol=None, userOutputCol=None)[source]

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • itemIndexModel (object) – itemIndexModel

  • itemInputCol (str) – Item Input Col

  • itemOutputCol (str) – Item Output Col

  • ratingCol (str) – Rating Col

  • userIndexModel (object) – userIndexModel

  • userInputCol (str) – User Input Col

  • userOutputCol (str) – User Output Col

getItemIndexModel()[source]
Returns

itemIndexModel

Return type

object

getItemInputCol()[source]
Returns

Item Input Col

Return type

str

getItemOutputCol()[source]
Returns

Item Output Col

Return type

str

static getJavaPackage()[source]

Returns package name String.

getRatingCol()[source]
Returns

Rating Col

Return type

str

getUserIndexModel()[source]
Returns

userIndexModel

Return type

object

getUserInputCol()[source]
Returns

User Input Col

Return type

str

getUserOutputCol()[source]
Returns

User Output Col

Return type

str

classmethod read()[source]

Returns an MLReader instance for this class.

setItemIndexModel(value)[source]
Parameters

itemIndexModel (object) – itemIndexModel

setItemInputCol(value)[source]
Parameters

itemInputCol (str) – Item Input Col

setItemOutputCol(value)[source]
Parameters

itemOutputCol (str) – Item Output Col

setParams(itemIndexModel=None, itemInputCol=None, itemOutputCol=None, ratingCol=None, userIndexModel=None, userInputCol=None, userOutputCol=None)[source]

Set the (keyword only) parameters

Parameters
  • itemIndexModel (object) – itemIndexModel

  • itemInputCol (str) – Item Input Col

  • itemOutputCol (str) – Item Output Col

  • ratingCol (str) – Rating Col

  • userIndexModel (object) – userIndexModel

  • userInputCol (str) – User Input Col

  • userOutputCol (str) – User Output Col

setRatingCol(value)[source]
Parameters

ratingCol (str) – Rating Col

setUserIndexModel(value)[source]
Parameters

userIndexModel (object) – userIndexModel

setUserInputCol(value)[source]
Parameters

userInputCol (str) – User Input Col

setUserOutputCol(value)[source]
Parameters

userOutputCol (str) – User Output Col

mmlspark.recommendation.SAR module

class mmlspark.recommendation.SAR.SAR(activityTimeFormat="yyyy/MM/dd'T'h:mm:ss", alpha=1.0, checkpointInterval=10, coldStartStrategy='nan', finalStorageLevel='MEMORY_AND_DISK', implicitPrefs=False, intermediateStorageLevel='MEMORY_AND_DISK', itemCol='item', maxIter=10, nonnegative=False, numItemBlocks=10, numUserBlocks=10, predictionCol='prediction', rank=10, ratingCol='rating', regParam=0.1, seed=-1219638142, similarityFunction='jaccard', startTime=None, startTimeFormat='EEE MMM dd HH:mm:ss Z yyyy', supportThreshold=4, timeCol='time', timeDecayCoeff=30, userCol='user')[source]

Bases: mmlspark.recommendation._SAR._SAR
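
A minimal sketch (illustrative, not taken from the library docs), assuming indexed is the integer-indexed DataFrame produced by the RecommendationIndexer example, with a rating column and, if time decay is wanted, a timestamp column matching the default timeCol='time':

>>> from mmlspark.recommendation.SAR import SAR
>>> sar = SAR(userCol="customerIDindex", itemCol="itemIDindex", ratingCol="rating")
>>> sar_model = sar.fit(indexed)              # SARModel
>>> recs = sar_model.recommendForAllUsers(10) # top-10 items per user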

mmlspark.recommendation.SARModel module

class mmlspark.recommendation.SARModel.SARModel(activityTimeFormat="yyyy/MM/dd'T'h:mm:ss", alpha=1.0, checkpointInterval=10, coldStartStrategy='nan', finalStorageLevel='MEMORY_AND_DISK', implicitPrefs=False, intermediateStorageLevel='MEMORY_AND_DISK', itemCol='item', itemDataFrame=None, maxIter=10, nonnegative=False, numItemBlocks=10, numUserBlocks=10, predictionCol='prediction', rank=10, ratingCol='rating', regParam=0.1, seed=-809975865, similarityFunction='jaccard', startTime=None, startTimeFormat='EEE MMM dd HH:mm:ss Z yyyy', supportThreshold=4, timeCol='time', timeDecayCoeff=30, userCol='user', userDataFrame=None)[source]

Bases: mmlspark.recommendation._SARModel._SARModel

recommendForAllUsers(numItems)[source]

Module contents

MicrosoftML is a library of Python classes that interfaces with Microsoft's Scala APIs and uses Apache Spark to create distributed machine learning models.

MicrosoftML simplifies training and scoring classifiers and regressors, and facilitates creating models with the CNTK library, images, and text.