synapse.ml.recommendation package

Submodules

synapse.ml.recommendation.RankingAdapter module

class synapse.ml.recommendation.RankingAdapter.RankingAdapter(java_obj=None, itemCol=None, k=10, labelCol='label', minRatingsPerItem=1, minRatingsPerUser=1, mode='allUsers', ratingCol=None, recommender=None, userCol=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaEstimator

Parameters

itemCol (str) – Column of items
k (int) – number of items to recommend per user
labelCol (str) – The name of the label column
minRatingsPerItem (int) – minimum number of ratings per item (must be > 0)
minRatingsPerUser (int) – minimum number of ratings per user (must be > 0)
mode (str) – recommendation mode
ratingCol (str) – Column of ratings
recommender (object) – the recommender estimator that the adapter wraps and fits
userCol (str) – Column of users
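The minRatingsPerUser and minRatingsPerItem thresholds can be pictured as count filters over the rating rows. The following pure-Python sketch assumes that each threshold simply drops rows whose user or item has fewer ratings than the minimum. `filter_min_ratings` is a hypothetical helper, not part of SynapseML, and the Spark implementation may apply the filter differently.

```python
# Sketch under stated assumptions; not SynapseML's implementation.
from collections import Counter
from typing import List, Tuple

# Hypothetical row layout for this sketch: (user, item, rating).
Rating = Tuple[str, str, float]


def filter_min_ratings(
    ratings: List[Rating], min_per_user: int = 1, min_per_item: int = 1
) -> List[Rating]:
    """Keep rows whose user and item each have at least the given number of ratings.

    Counts are computed once over the input, so after filtering a user or item
    can end up with fewer rows than its threshold.
    """
    if min_per_user < 1 or min_per_item < 1:
        raise ValueError("thresholds must be > 0")
    user_counts = Counter(user for user, _, _ in ratings)
    item_counts = Counter(item for _, item, _ in ratings)
    return [
        (user, item, rating)
        for user, item, rating in ratings
        if user_counts[user] >= min_per_user and item_counts[item] >= min_per_item
    ]


ratings = [
    ("alice", "a", 5.0), ("alice", "b", 3.0),
    ("bob", "a", 4.0),
    ("carol", "a", 2.0), ("carol", "c", 1.0),
]
print(filter_min_ratings(ratings, min_per_user=2, min_per_item=2))
# [('alice', 'a', 5.0), ('carol', 'a', 2.0)]
```

The counts are taken only once. As a result, "alice" keeps a single row in the example even though min_per_user is 2. An implementation that must guarantee both thresholds after filtering would repeat the filter until no more rows are dropped.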

getItemCol()[source]

Returns: itemCol – Column of items

static getJavaPackage()[source]: Returns package name String.

getK()[source]

Returns: k – number of items

getLabelCol()[source]

Returns: labelCol – The name of the label column

getMinRatingsPerItem()[source]

Returns: minRatingsPerItem – min ratings for items > 0

getMinRatingsPerUser()[source]

Returns: minRatingsPerUser – min ratings for users > 0

getMode()[source]

Returns: mode – recommendation mode

getRatingCol()[source]

Returns: ratingCol – Column of ratings

getRecommender()[source]

Returns: recommender – estimator for selection

getUserCol()[source]

Returns: userCol – Column of users

itemCol = Param(parent='undefined', name='itemCol', doc='Column of items')

k = Param(parent='undefined', name='k', doc='number of items')

labelCol = Param(parent='undefined', name='labelCol', doc='The name of the label column')

minRatingsPerItem = Param(parent='undefined', name='minRatingsPerItem', doc='min ratings for items > 0')

minRatingsPerUser = Param(parent='undefined', name='minRatingsPerUser', doc='min ratings for users > 0')

mode = Param(parent='undefined', name='mode', doc='recommendation mode')

ratingCol = Param(parent='undefined', name='ratingCol', doc='Column of ratings')

classmethod read()[source]: Returns an MLReader instance for this class.

recommender = Param(parent='undefined', name='recommender', doc='estimator for selection')

setItemCol(value)[source]

Parameters: itemCol – Column of items

setK(value)[source]

Parameters: k – number of items

setLabelCol(value)[source]

Parameters: labelCol – The name of the label column

setMinRatingsPerItem(value)[source]

Parameters: minRatingsPerItem – min ratings for items > 0

setMinRatingsPerUser(value)[source]

Parameters: minRatingsPerUser – min ratings for users > 0

setMode(value)[source]

Parameters: mode – recommendation mode

setParams(itemCol=None, k=10, labelCol='label', minRatingsPerItem=1, minRatingsPerUser=1, mode='allUsers', ratingCol=None, recommender=None, userCol=None)[source]: Set the (keyword only) parameters

setRatingCol(value)[source]

Parameters: ratingCol – Column of ratings

setRecommender(value)[source]

Parameters: recommender – estimator for selection

setUserCol(value)[source]

Parameters: userCol – Column of users

userCol = Param(parent='undefined', name='userCol', doc='Column of users')

synapse.ml.recommendation.RankingAdapterModel module

class synapse.ml.recommendation.RankingAdapterModel.RankingAdapterModel(java_obj=None, itemCol=None, k=10, labelCol='label', minRatingsPerItem=1, minRatingsPerUser=1, mode='allUsers', ratingCol=None, recommender=None, recommenderModel=None, userCol=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaModel

Parameters

itemCol (str) – Column of items
k (int) – number of items to recommend per user
labelCol (str) – The name of the label column
minRatingsPerItem (int) – minimum number of ratings per item (must be > 0)
minRatingsPerUser (int) – minimum number of ratings per user (must be > 0)
mode (str) – recommendation mode
ratingCol (str) – Column of ratings
recommender (object) – the recommender estimator that the adapter wraps
recommenderModel (object) – the fitted recommender model
userCol (str) – Column of users
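A ranking evaluation needs two things for each user: the model's top-k items and the items the user actually rated. This sketch assumes that the adapter model produces roughly this from its fitted recommenderModel, as one list of recommended item ids and one list of ground-truth item ids per user. The dictionary inputs and `to_ranking_rows` are hypothetical stand-ins for DataFrames.

```python
# Sketch under stated assumptions; the output shape is assumed, not documented.
from typing import Dict, List, Tuple

# Hypothetical inputs: predicted scores per user, and the (user, item) pairs
# each user actually rated in the evaluation data.
Scores = Dict[str, Dict[str, float]]


def to_ranking_rows(
    scores: Scores, observed: List[Tuple[str, str]], k: int
) -> Dict[str, Tuple[List[str], List[str]]]:
    """Return {user: (top-k predicted items, items the user actually rated)}."""
    actual: Dict[str, List[str]] = {}
    for user, item in observed:
        actual.setdefault(user, []).append(item)
    rows: Dict[str, Tuple[List[str], List[str]]] = {}
    for user, items in actual.items():
        user_scores = scores.get(user, {})
        # Highest score first; ties broken by item id so the order is deterministic.
        ranked = sorted(user_scores, key=lambda i: (-user_scores[i], i))
        rows[user] = (ranked[:k], items)
    return rows


scores = {"alice": {"a": 0.9, "b": 0.4, "c": 0.7}, "bob": {"a": 0.2, "c": 0.8}}
observed = [("alice", "a"), ("alice", "b"), ("bob", "c")]
print(to_ranking_rows(scores, observed, k=2))
# {'alice': (['a', 'c'], ['a', 'b']), 'bob': (['c', 'a'], ['c'])}
```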

getItemCol()[source]

Returns: itemCol – Column of items

static getJavaPackage()[source]: Returns package name String.

getK()[source]

Returns: k – number of items

getLabelCol()[source]

Returns: labelCol – The name of the label column

getMinRatingsPerItem()[source]

Returns: minRatingsPerItem – min ratings for items > 0

getMinRatingsPerUser()[source]

Returns: minRatingsPerUser – min ratings for users > 0

getMode()[source]

Returns: mode – recommendation mode

getRatingCol()[source]

Returns: ratingCol – Column of ratings

getRecommender()[source]

Returns: recommender – estimator for selection

getRecommenderModel()[source]

Returns: recommenderModel – recommenderModel

getUserCol()[source]

Returns: userCol – Column of users

itemCol = Param(parent='undefined', name='itemCol', doc='Column of items')

k = Param(parent='undefined', name='k', doc='number of items')

labelCol = Param(parent='undefined', name='labelCol', doc='The name of the label column')

minRatingsPerItem = Param(parent='undefined', name='minRatingsPerItem', doc='min ratings for items > 0')

minRatingsPerUser = Param(parent='undefined', name='minRatingsPerUser', doc='min ratings for users > 0')

mode = Param(parent='undefined', name='mode', doc='recommendation mode')

ratingCol = Param(parent='undefined', name='ratingCol', doc='Column of ratings')

classmethod read()[source]: Returns an MLReader instance for this class.

recommender = Param(parent='undefined', name='recommender', doc='estimator for selection')

recommenderModel = Param(parent='undefined', name='recommenderModel', doc='recommenderModel')

setItemCol(value)[source]

Parameters: itemCol – Column of items

setK(value)[source]

Parameters: k – number of items

setLabelCol(value)[source]

Parameters: labelCol – The name of the label column

setMinRatingsPerItem(value)[source]

Parameters: minRatingsPerItem – min ratings for items > 0

setMinRatingsPerUser(value)[source]

Parameters: minRatingsPerUser – min ratings for users > 0

setMode(value)[source]

Parameters: mode – recommendation mode

setParams(itemCol=None, k=10, labelCol='label', minRatingsPerItem=1, minRatingsPerUser=1, mode='allUsers', ratingCol=None, recommender=None, recommenderModel=None, userCol=None)[source]: Set the (keyword only) parameters

setRatingCol(value)[source]

Parameters: ratingCol – Column of ratings

setRecommender(value)[source]

Parameters: recommender – estimator for selection

setRecommenderModel(value)[source]

Parameters: recommenderModel – recommenderModel

setUserCol(value)[source]

Parameters: userCol – Column of users

userCol = Param(parent='undefined', name='userCol', doc='Column of users')

synapse.ml.recommendation.RankingEvaluator module

class synapse.ml.recommendation.RankingEvaluator.RankingEvaluator(java_obj=None, itemCol=None, k=10, labelCol='label', metricName='ndcgAt', nItems=-1, predictionCol='prediction', ratingCol=None, userCol=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.evaluation.JavaEvaluator

Parameters

itemCol (str) – Column of items
k (int) – cutoff k: the number of top-ranked items considered by the @K metrics
labelCol (str) – label column name
metricName (str) – metric name in evaluation (ndcgAt|map|precisionAtk|recallAtK|diversityAtK|maxDiversity|mrr|fcp)
nItems (long) – number of items
predictionCol (str) – prediction column name
ratingCol (str) – Column of ratings
userCol (str) – Column of users
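To give intuition for the ranking metrics named in metricName, the sketch below implements textbook binary-relevance definitions of precision@k, recall@k, NDCG@k, and the per-user reciprocal rank that MRR averages. It is not SynapseML's implementation. The evaluator works over a DataFrame and averages across users, and it may differ in details such as how precision treats prediction lists shorter than k. The function names are hypothetical.

```python
# Textbook single-user definitions; assumed, not taken from SynapseML.
import math
from typing import Sequence


def precision_at_k(pred: Sequence[str], actual: Sequence[str], k: int) -> float:
    """Share of the top-k predictions that are relevant (always divided by k)."""
    relevant = set(actual)
    return sum(1 for item in pred[:k] if item in relevant) / k


def recall_at_k(pred: Sequence[str], actual: Sequence[str], k: int) -> float:
    """Share of the relevant items that appear in the top-k predictions."""
    relevant = set(actual)
    if not relevant:
        return 0.0
    return sum(1 for item in pred[:k] if item in relevant) / len(relevant)


def ndcg_at_k(pred: Sequence[str], actual: Sequence[str], k: int) -> float:
    """Binary-relevance NDCG@k: DCG of the ranking divided by the best possible DCG."""
    relevant = set(actual)
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(pred[:k])
        if item in relevant
    )
    ideal = sum(1.0 / math.log2(rank + 2) for rank in range(min(len(relevant), k)))
    return dcg / ideal if ideal > 0 else 0.0


def reciprocal_rank(pred: Sequence[str], actual: Sequence[str]) -> float:
    """1 / position of the first relevant prediction (0 if none); MRR averages this."""
    relevant = set(actual)
    for rank, item in enumerate(pred):
        if item in relevant:
            return 1.0 / (rank + 1)
    return 0.0


pred, actual = ["a", "c", "b"], ["a", "b"]
print(precision_at_k(pred, actual, 2), recall_at_k(pred, actual, 2))  # 0.5 0.5
print(round(ndcg_at_k(pred, actual, 2), 4))  # 0.6131
```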

getItemCol()[source]

Returns: itemCol – Column of items

static getJavaPackage()[source]: Returns package name String.

getK()[source]

Returns: k – number of items

getLabelCol()[source]

Returns: labelCol – label column name

getMetricName()[source]

Returns: metricName – metric name in evaluation (ndcgAt|map|precisionAtk|recallAtK|diversityAtK|maxDiversity|mrr|fcp)

getNItems()[source]

Returns: nItems – number of items

getPredictionCol()[source]

Returns: predictionCol – prediction column name

getRatingCol()[source]

Returns: ratingCol – Column of ratings

getUserCol()[source]

Returns: userCol – Column of users

itemCol = Param(parent='undefined', name='itemCol', doc='Column of items')

k = Param(parent='undefined', name='k', doc='number of items')

labelCol = Param(parent='undefined', name='labelCol', doc='label column name')

metricName = Param(parent='undefined', name='metricName', doc='metric name in evaluation (ndcgAt|map|precisionAtk|recallAtK|diversityAtK|maxDiversity|mrr|fcp)')

nItems = Param(parent='undefined', name='nItems', doc='number of items')

predictionCol = Param(parent='undefined', name='predictionCol', doc='prediction column name')

ratingCol = Param(parent='undefined', name='ratingCol', doc='Column of ratings')

classmethod read()[source]: Returns an MLReader instance for this class.

setItemCol(value)[source]

Parameters: itemCol – Column of items

setK(value)[source]

Parameters: k – number of items

setLabelCol(value)[source]

Parameters: labelCol – label column name

setMetricName(value)[source]

Parameters: metricName – metric name in evaluation (ndcgAt|map|precisionAtk|recallAtK|diversityAtK|maxDiversity|mrr|fcp)

setNItems(value)[source]

Parameters: nItems – number of items

setParams(itemCol=None, k=10, labelCol='label', metricName='ndcgAt', nItems=-1, predictionCol='prediction', ratingCol=None, userCol=None)[source]: Set the (keyword only) parameters

setPredictionCol(value)[source]

Parameters: predictionCol – prediction column name

setRatingCol(value)[source]

Parameters: ratingCol – Column of ratings

setUserCol(value)[source]

Parameters: userCol – Column of users

userCol = Param(parent='undefined', name='userCol', doc='Column of users')

synapse.ml.recommendation.RankingTrainValidationSplit module

class synapse.ml.recommendation.RankingTrainValidationSplit.RankingTrainValidationSplit(**kwargs)[source]

Bases: pyspark.ml.tuning._ValidatorParams, synapse.ml.recommendation._RankingTrainValidationSplit._RankingTrainValidationSplit
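A train-validation split holds out part of the ratings to score each candidate model. One common approach for ranking is to split each user's ratings separately, so that every validation user also has training history. This approach is assumed here and is not taken from this class. `split_per_user` and `train_ratio` are hypothetical names in this pure-Python sketch.

```python
# Sketch of a per-user split; the splitting strategy is an assumption.
import random
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical row layout: (user, item, rating).
Rating = Tuple[str, str, float]


def split_per_user(
    ratings: List[Rating], train_ratio: float = 0.75, seed: int = 0
) -> Tuple[List[Rating], List[Rating]]:
    """Randomly split each user's rows, keeping at least one row per user in training."""
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be in (0, 1)")
    rng = random.Random(seed)
    by_user: Dict[str, List[Rating]] = defaultdict(list)
    for row in ratings:
        by_user[row[0]].append(row)
    train: List[Rating] = []
    validation: List[Rating] = []
    for user in sorted(by_user):
        rows = list(by_user[user])
        rng.shuffle(rows)
        n_train = max(1, round(len(rows) * train_ratio))
        train.extend(rows[:n_train])
        validation.extend(rows[n_train:])
    return train, validation


ratings = [("alice", item, 1.0) for item in "abcd"] + [("bob", "a", 1.0)]
train, validation = split_per_user(ratings)
print(len(train), len(validation))  # 4 1
```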

synapse.ml.recommendation.RankingTrainValidationSplitModel module

class synapse.ml.recommendation.RankingTrainValidationSplitModel.RankingTrainValidationSplitModel(java_obj=None, bestModel=None, validationMetrics=None)[source]

Bases: synapse.ml.recommendation._RankingTrainValidationSplitModel._RankingTrainValidationSplitModel

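recommendForAllItems(numUsers)[source]

Returns the top numUsers recommended users for each item.

recommendForAllUsers(numItems)[source]

Returns the top numItems recommended items for each user.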
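The two recommendation methods differ only in which side of the user-item score matrix is ranked. The pure-Python sketch below shows both directions over a hypothetical dictionary of scores. It assumes that the methods follow the convention of Spark's ALSModel methods with the same names: top numItems items per user, and top numUsers users per item.

```python
# Sketch under stated assumptions; helper names and score layout are hypothetical.
from typing import Dict, List, Tuple

# Hypothetical score matrix for this sketch: {(user, item): predicted score}.
Scores = Dict[Tuple[str, str], float]


def _top_n(scores: Scores, num: int, key_index: int) -> Dict[str, List[str]]:
    """Group scores by user (key_index=0) or item (key_index=1) and keep the best `num`."""
    groups: Dict[str, List[Tuple[str, float]]] = {}
    for pair, score in scores.items():
        owner, other = pair[key_index], pair[1 - key_index]
        groups.setdefault(owner, []).append((other, score))
    return {
        # Highest score first; ties broken by id for a deterministic order.
        owner: [other for other, _ in sorted(cands, key=lambda c: (-c[1], c[0]))[:num]]
        for owner, cands in groups.items()
    }


def recommend_for_all_users(scores: Scores, num_items: int) -> Dict[str, List[str]]:
    return _top_n(scores, num_items, key_index=0)


def recommend_for_all_items(scores: Scores, num_users: int) -> Dict[str, List[str]]:
    return _top_n(scores, num_users, key_index=1)


scores = {("alice", "a"): 0.9, ("alice", "b"): 0.4, ("alice", "c"): 0.7,
          ("bob", "a"): 0.2, ("bob", "c"): 0.8}
print(recommend_for_all_users(scores, 2))  # {'alice': ['a', 'c'], 'bob': ['c', 'a']}
print(recommend_for_all_items(scores, 1))  # {'a': ['alice'], 'b': ['alice'], 'c': ['bob']}
```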

synapse.ml.recommendation.RecommendationIndexer module

class synapse.ml.recommendation.RecommendationIndexer.RecommendationIndexer(java_obj=None, itemInputCol=None, itemOutputCol=None, ratingCol=None, userInputCol=None, userOutputCol=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaEstimator

Parameters

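itemInputCol (str) – name of the input column containing raw item ids
itemOutputCol (str) – name of the output column for the indexed item ids
ratingCol (str) – name of the rating column
userInputCol (str) – name of the input column containing raw user ids
userOutputCol (str) – name of the output column for the indexed user ids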
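Recommenders in this package, such as SAR, expect user and item ids within the integer value range (see their itemCol and userCol parameters). For this reason, raw string ids typically need to be indexed first. The sketch below assumes frequency-ordered indexing in the style of Spark's StringIndexer default. Whether RecommendationIndexer uses exactly this ordering is an assumption, and `fit_index` is a hypothetical helper.

```python
# Sketch under stated assumptions; the index ordering is assumed, not documented.
from collections import Counter
from typing import Dict, Iterable


def fit_index(values: Iterable[str]) -> Dict[str, int]:
    """Assign 0, 1, 2, ... to distinct ids, most frequent first, ties broken alphabetically."""
    counts = Counter(values)
    ordered = sorted(counts, key=lambda v: (-counts[v], v))
    return {value: index for index, value in enumerate(ordered)}


rows = [("alice", "a"), ("bob", "a"), ("alice", "b"), ("carol", "c")]
user_index = fit_index(user for user, _ in rows)
item_index = fit_index(item for _, item in rows)
print(user_index)  # {'alice': 0, 'bob': 1, 'carol': 2}
print(item_index)  # {'a': 0, 'b': 1, 'c': 2}
```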

getItemInputCol()[source]

Returns: itemInputCol – Item Input Col

getItemOutputCol()[source]

Returns: itemOutputCol – Item Output Col

static getJavaPackage()[source]: Returns package name String.

getRatingCol()[source]

Returns: ratingCol – Rating Col

getUserInputCol()[source]

Returns: userInputCol – User Input Col

getUserOutputCol()[source]

Returns: userOutputCol – User Output Col

itemInputCol = Param(parent='undefined', name='itemInputCol', doc='Item Input Col')

itemOutputCol = Param(parent='undefined', name='itemOutputCol', doc='Item Output Col')

ratingCol = Param(parent='undefined', name='ratingCol', doc='Rating Col')

classmethod read()[source]: Returns an MLReader instance for this class.

setItemInputCol(value)[source]

Parameters: itemInputCol – Item Input Col

setItemOutputCol(value)[source]

Parameters: itemOutputCol – Item Output Col

setParams(itemInputCol=None, itemOutputCol=None, ratingCol=None, userInputCol=None, userOutputCol=None)[source]: Set the (keyword only) parameters

setRatingCol(value)[source]

Parameters: ratingCol – Rating Col

setUserInputCol(value)[source]

Parameters: userInputCol – User Input Col

setUserOutputCol(value)[source]

Parameters: userOutputCol – User Output Col

userInputCol = Param(parent='undefined', name='userInputCol', doc='User Input Col')

userOutputCol = Param(parent='undefined', name='userOutputCol', doc='User Output Col')

synapse.ml.recommendation.RecommendationIndexerModel module

class synapse.ml.recommendation.RecommendationIndexerModel.RecommendationIndexerModel(java_obj=None, itemIndexModel=None, itemInputCol=None, itemOutputCol=None, ratingCol=None, userIndexModel=None, userInputCol=None, userOutputCol=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaModel

Parameters

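itemIndexModel (object) – fitted model that maps item ids to indices
itemInputCol (str) – name of the input column containing raw item ids
itemOutputCol (str) – name of the output column for the indexed item ids
ratingCol (str) – name of the rating column
userIndexModel (object) – fitted model that maps user ids to indices
userInputCol (str) – name of the input column containing raw user ids
userOutputCol (str) – name of the output column for the indexed user ids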
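RecommendationIndexerModel holds one fitted index model for users and one for items. The sketch below uses plain dictionaries in place of those models, which is an assumption. It shows the two directions in which such a mapping is typically used: indexing raw rows before training, and translating recommended indices back to the original ids. `index_rows` and `invert` are hypothetical helpers. Raising KeyError on an unseen id is this sketch's own choice, not documented behavior.

```python
# Sketch under stated assumptions; dictionaries stand in for fitted index models.
from typing import Dict, List, Tuple

# Hypothetical fitted mappings, e.g. produced by a frequency-based indexer.
user_index: Dict[str, int] = {"alice": 0, "bob": 1}
item_index: Dict[str, int] = {"a": 0, "b": 1}


def index_rows(
    rows: List[Tuple[str, str, float]],
    users: Dict[str, int],
    items: Dict[str, int],
) -> List[Tuple[int, int, float]]:
    """Replace raw ids with their indices; an unseen id raises KeyError."""
    return [(users[user], items[item], rating) for user, item, rating in rows]


def invert(index: Dict[str, int]) -> Dict[int, str]:
    """Build the reverse lookup that turns indices back into the original ids."""
    return {position: value for value, position in index.items()}


indexed = index_rows([("alice", "b", 5.0), ("bob", "a", 3.0)], user_index, item_index)
print(indexed)  # [(0, 1, 5.0), (1, 0, 3.0)]
print(invert(item_index)[1])  # b
```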

getItemIndexModel()[source]

Returns: itemIndexModel – itemIndexModel

getItemInputCol()[source]

Returns: itemInputCol – Item Input Col

getItemOutputCol()[source]

Returns: itemOutputCol – Item Output Col

static getJavaPackage()[source]: Returns package name String.

getRatingCol()[source]

Returns: ratingCol – Rating Col

getUserIndexModel()[source]

Returns: userIndexModel – userIndexModel

getUserInputCol()[source]

Returns: userInputCol – User Input Col

getUserOutputCol()[source]

Returns: userOutputCol – User Output Col

itemIndexModel = Param(parent='undefined', name='itemIndexModel', doc='itemIndexModel')

itemInputCol = Param(parent='undefined', name='itemInputCol', doc='Item Input Col')

itemOutputCol = Param(parent='undefined', name='itemOutputCol', doc='Item Output Col')

ratingCol = Param(parent='undefined', name='ratingCol', doc='Rating Col')

classmethod read()[source]: Returns an MLReader instance for this class.

setItemIndexModel(value)[source]

Parameters: itemIndexModel – itemIndexModel

setItemInputCol(value)[source]

Parameters: itemInputCol – Item Input Col

setItemOutputCol(value)[source]

Parameters: itemOutputCol – Item Output Col

setParams(itemIndexModel=None, itemInputCol=None, itemOutputCol=None, ratingCol=None, userIndexModel=None, userInputCol=None, userOutputCol=None)[source]: Set the (keyword only) parameters

setRatingCol(value)[source]

Parameters: ratingCol – Rating Col

setUserIndexModel(value)[source]

Parameters: userIndexModel – userIndexModel

setUserInputCol(value)[source]

Parameters: userInputCol – User Input Col

setUserOutputCol(value)[source]

Parameters: userOutputCol – User Output Col

userIndexModel = Param(parent='undefined', name='userIndexModel', doc='userIndexModel')

userInputCol = Param(parent='undefined', name='userInputCol', doc='User Input Col')

userOutputCol = Param(parent='undefined', name='userOutputCol', doc='User Output Col')

synapse.ml.recommendation.SAR module

class synapse.ml.recommendation.SAR.SAR(java_obj=None, activityTimeFormat="yyyy/MM/dd'T'h:mm:ss", alpha=1.0, blockSize=4096, checkpointInterval=10, coldStartStrategy='nan', finalStorageLevel='MEMORY_AND_DISK', implicitPrefs=False, intermediateStorageLevel='MEMORY_AND_DISK', itemCol='item', maxIter=10, nonnegative=False, numItemBlocks=10, numUserBlocks=10, predictionCol='prediction', rank=10, ratingCol='rating', regParam=0.1, seed=356704333, similarityFunction='jaccard', startTime=None, startTimeFormat='EEE MMM dd HH:mm:ss Z yyyy', supportThreshold=4, timeCol='time', timeDecayCoeff=30, userCol='user')[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaEstimator

Parameters

activityTimeFormat (str) – Time format for events, default: yyyy/MM/dd'T'h:mm:ss
alpha (float) – alpha for implicit preference
blockSize (int) – block size for stacking input data in matrices. Data is stacked within partitions. If block size is more than remaining data in a partition then it is adjusted to the size of this data.
checkpointInterval (int) – set checkpoint interval (>= 1) or disable checkpoint (-1). E.g. 10 means that the cache will get checkpointed every 10 iterations. Note: this setting will be ignored if the checkpoint directory is not set in the SparkContext
coldStartStrategy (str) – strategy for dealing with unknown or new users/items at prediction time. This may be useful in cross-validation or production scenarios, for handling user/item ids the model has not seen in the training data. Supported values: nan, drop.
finalStorageLevel (str) – StorageLevel for ALS model factors.
implicitPrefs (bool) – whether to use implicit preference
intermediateStorageLevel (str) – StorageLevel for intermediate datasets. Cannot be 'NONE'.
itemCol (str) – column name for item ids. Ids must be within the integer value range.
maxIter (int) – maximum number of iterations (>= 0)
nonnegative (bool) – whether to use nonnegative constraint for least squares
numItemBlocks (int) – number of item blocks
numUserBlocks (int) – number of user blocks
predictionCol (str) – prediction column name
rank (int) – rank of the factorization
ratingCol (str) – column name for ratings
regParam (float) – regularization parameter (>= 0)
seed (long) – random seed
similarityFunction (str) – Defines the similarity function to be used by the model. Lift favors serendipity, Co-occurrence favors predictability, and Jaccard is a nice compromise between the two.
startTime (str) – custom value to use as "now" when working with historical data
startTimeFormat (str) – Format for start time
supportThreshold (int) – Minimum number of ratings per item
timeCol (str) – name of the column holding the activity time
timeDecayCoeff (int) – scales the time-decay coefficient to a different half-life duration
userCol (str) – column name for user ids. Ids must be within the integer value range.
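SAR derives item-to-item similarities from co-occurrence counts. Here c_ij counts the users who interacted with both items i and j, and c_ii is the number of users of item i. The sketch below uses the formulas commonly given for SAR: Jaccard = c_ij / (c_ii + c_jj - c_ij), lift = c_ij / (c_ii * c_jj), and co-occurrence = c_ij. It is an assumption that this class uses exactly these formulas. Apart from "jaccard" (the documented default), the string keys accepted by `similarity` are this sketch's own and are not confirmed values of similarityFunction.

```python
# Sketch of SAR-style similarities; the formulas are assumed, not confirmed for this class.
from collections import defaultdict
from itertools import combinations
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]


def cooccurrence(user_items: Dict[str, Set[str]]) -> Dict[Pair, int]:
    """c[(i, j)] = number of users who interacted with both i and j; c[(i, i)] = item count."""
    counts: Dict[Pair, int] = defaultdict(int)
    for items in user_items.values():
        for i in items:
            counts[(i, i)] += 1
        for i, j in combinations(sorted(items), 2):
            counts[(i, j)] += 1
            counts[(j, i)] += 1
    return dict(counts)


def similarity(c: Dict[Pair, int], i: str, j: str, function: str = "jaccard") -> float:
    """Item-item similarity derived from co-occurrence counts."""
    cij = c.get((i, j), 0)
    cii, cjj = c.get((i, i), 0), c.get((j, j), 0)
    if function == "cooccurrence":
        return float(cij)  # raw counts: favors popular, predictable pairs
    if function == "jaccard":
        denom = cii + cjj - cij  # size of the union of the two user sets
        return cij / denom if denom else 0.0
    if function == "lift":
        denom = cii * cjj  # dividing by both popularities promotes rarer items
        return cij / denom if denom else 0.0
    raise ValueError(f"unknown similarity function: {function}")


user_items = {"alice": {"a", "b"}, "bob": {"a", "b", "c"}, "carol": {"a"}}
c = cooccurrence(user_items)
for name in ("cooccurrence", "jaccard", "lift"):
    print(name, similarity(c, "a", "b", name))
# cooccurrence 2.0
# jaccard 0.6666666666666666
# lift 0.3333333333333333
```

In this example, item "a" is used by everyone. Lift therefore scores the pair (a, b) lower than Jaccard does, which matches the description of lift as favoring less predictable recommendations.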

activityTimeFormat = Param(parent='undefined', name='activityTimeFormat', doc="Time format for events, default: yyyy/MM/dd'T'h:mm:ss")

alpha = Param(parent='undefined', name='alpha', doc='alpha for implicit preference')

blockSize = Param(parent='undefined', name='blockSize', doc='block size for stacking input data in matrices. Data is stacked within partitions. If block size is more than remaining data in a partition then it is adjusted to the size of this data.')

checkpointInterval = Param(parent='undefined', name='checkpointInterval', doc='set checkpoint interval (>= 1) or disable checkpoint (-1). E.g. 10 means that the cache will get checkpointed every 10 iterations. Note: this setting will be ignored if the checkpoint directory is not set in the SparkContext')

coldStartStrategy = Param(parent='undefined', name='coldStartStrategy', doc='strategy for dealing with unknown or new users/items at prediction time. This may be useful in cross-validation or production scenarios, for handling user/item ids the model has not seen in the training data. Supported values: nan,drop.')

finalStorageLevel = Param(parent='undefined', name='finalStorageLevel', doc='StorageLevel for ALS model factors.')

getActivityTimeFormat()[source]

Returns: activityTimeFormat – Time format for events, default: yyyy/MM/dd'T'h:mm:ss

getAlpha()[source]

Returns: alpha – alpha for implicit preference

getBlockSize()[source]

Returns: blockSize – block size for stacking input data in matrices. Data is stacked within partitions. If block size is more than remaining data in a partition then it is adjusted to the size of this data.

getCheckpointInterval()[source]

Returns: checkpointInterval – set checkpoint interval (>= 1) or disable checkpoint (-1). E.g. 10 means that the cache will get checkpointed every 10 iterations. Note: this setting will be ignored if the checkpoint directory is not set in the SparkContext

getColdStartStrategy()[source]

Returns: coldStartStrategy – strategy for dealing with unknown or new users/items at prediction time. This may be useful in cross-validation or production scenarios, for handling user/item ids the model has not seen in the training data. Supported values: nan, drop.

getFinalStorageLevel()[source]

Returns: finalStorageLevel – StorageLevel for ALS model factors.

getImplicitPrefs()[source]

Returns: implicitPrefs – whether to use implicit preference

getIntermediateStorageLevel()[source]

Returns: intermediateStorageLevel – StorageLevel for intermediate datasets. Cannot be 'NONE'.

getItemCol()[source]

Returns: itemCol – column name for item ids. Ids must be within the integer value range.

static getJavaPackage()[source]: Returns package name String.

getMaxIter()[source]

Returns: maxIter – maximum number of iterations (>= 0)

getNonnegative()[source]

Returns: nonnegative – whether to use nonnegative constraint for least squares

getNumItemBlocks()[source]

Returns: numItemBlocks – number of item blocks

getNumUserBlocks()[source]

Returns: numUserBlocks – number of user blocks

getPredictionCol()[source]

Returns: predictionCol – prediction column name

getRank()[source]

Returns: rank – rank of the factorization

getRatingCol()[source]

Returns: ratingCol – column name for ratings

getRegParam()[source]

Returns: regParam – regularization parameter (>= 0)

getSeed()[source]

Returns: seed – random seed

getSimilarityFunction()[source]

Returns: similarityFunction – Defines the similarity function to be used by the model. Lift favors serendipity, Co-occurrence favors predictability, and Jaccard is a nice compromise between the two.

getStartTime()[source]

Returns: startTime – custom value to use as "now" when working with historical data

getStartTimeFormat()[source]

Returns: startTimeFormat – Format for start time

getSupportThreshold()[source]

Returns: supportThreshold – Minimum number of ratings per item

getTimeCol()[source]

Returns: timeCol – name of the column holding the activity time

getTimeDecayCoeff()[source]

Returns: timeDecayCoeff – scales the time-decay coefficient to a different half-life duration

getUserCol()[source]

Returns: userCol – column name for user ids. Ids must be within the integer value range.

implicitPrefs = Param(parent='undefined', name='implicitPrefs', doc='whether to use implicit preference')

intermediateStorageLevel = Param(parent='undefined', name='intermediateStorageLevel', doc="StorageLevel for intermediate datasets. Cannot be 'NONE'.")

itemCol = Param(parent='undefined', name='itemCol', doc='column name for item ids. Ids must be within the integer value range.')

maxIter = Param(parent='undefined', name='maxIter', doc='maximum number of iterations (>= 0)')

nonnegative = Param(parent='undefined', name='nonnegative', doc='whether to use nonnegative constraint for least squares')

numItemBlocks = Param(parent='undefined', name='numItemBlocks', doc='number of item blocks')

numUserBlocks = Param(parent='undefined', name='numUserBlocks', doc='number of user blocks')

predictionCol = Param(parent='undefined', name='predictionCol', doc='prediction column name')

rank = Param(parent='undefined', name='rank', doc='rank of the factorization')

ratingCol = Param(parent='undefined', name='ratingCol', doc='column name for ratings')

classmethod read()[source]: Returns an MLReader instance for this class.

regParam = Param(parent='undefined', name='regParam', doc='regularization parameter (>= 0)')

seed = Param(parent='undefined', name='seed', doc='random seed')

setActivityTimeFormat(value)[source]

Parameters: activityTimeFormat – Time format for events, default: yyyy/MM/dd'T'h:mm:ss

setAlpha(value)[source]

Parameters: alpha – alpha for implicit preference

setBlockSize(value)[source]

Parameters: blockSize – block size for stacking input data in matrices. Data is stacked within partitions. If block size is more than remaining data in a partition then it is adjusted to the size of this data.

setCheckpointInterval(value)[source]

Parameters: checkpointInterval – set checkpoint interval (>= 1) or disable checkpoint (-1). E.g. 10 means that the cache will get checkpointed every 10 iterations. Note: this setting will be ignored if the checkpoint directory is not set in the SparkContext

setColdStartStrategy(value)[source]

Parameters: coldStartStrategy – strategy for dealing with unknown or new users/items at prediction time. This may be useful in cross-validation or production scenarios, for handling user/item ids the model has not seen in the training data. Supported values: nan, drop.

setFinalStorageLevel(value)[source]

Parameters: finalStorageLevel – StorageLevel for ALS model factors.

setImplicitPrefs(value)[source]

Parameters: implicitPrefs – whether to use implicit preference

setIntermediateStorageLevel(value)[source]

Parameters: intermediateStorageLevel – StorageLevel for intermediate datasets. Cannot be 'NONE'.

setItemCol(value)[source]

Parameters: itemCol – column name for item ids. Ids must be within the integer value range.

setMaxIter(value)[source]

Parameters: maxIter – maximum number of iterations (>= 0)

setNonnegative(value)[source]

Parameters: nonnegative – whether to use nonnegative constraint for least squares

setNumItemBlocks(value)[source]

Parameters: numItemBlocks – number of item blocks

setNumUserBlocks(value)[source]

Parameters: numUserBlocks – number of user blocks

setParams(activityTimeFormat="yyyy/MM/dd'T'h:mm:ss", alpha=1.0, blockSize=4096, checkpointInterval=10, coldStartStrategy='nan', finalStorageLevel='MEMORY_AND_DISK', implicitPrefs=False, intermediateStorageLevel='MEMORY_AND_DISK', itemCol='item', maxIter=10, nonnegative=False, numItemBlocks=10, numUserBlocks=10, predictionCol='prediction', rank=10, ratingCol='rating', regParam=0.1, seed=356704333, similarityFunction='jaccard', startTime=None, startTimeFormat='EEE MMM dd HH:mm:ss Z yyyy', supportThreshold=4, timeCol='time', timeDecayCoeff=30, userCol='user')[source]: Set the (keyword only) parameters

setPredictionCol(value)[source]

Parameters: predictionCol – prediction column name

setRank(value)[source]

Parameters: rank – rank of the factorization

setRatingCol(value)[source]

Parameters: ratingCol – column name for ratings

setRegParam(value)[source]

Parameters: regParam – regularization parameter (>= 0)

setSeed(value)[source]

Parameters: seed – random seed

setSimilarityFunction(value)[source]

Parameters: similarityFunction – Defines the similarity function to be used by the model. Lift favors serendipity, Co-occurrence favors predictability, and Jaccard is a nice compromise between the two.

setStartTime(value)[source]

Parameters: startTime – custom value to use as "now" when working with historical data

setStartTimeFormat(value)[source]

Parameters: startTimeFormat – Format for start time

setSupportThreshold(value)[source]

Parameters: supportThreshold – Minimum number of ratings per item

setTimeCol(value)[source]

Parameters: timeCol – name of the column holding the activity time

setTimeDecayCoeff(value)[source]

Parameters: timeDecayCoeff – scales the time-decay coefficient to a different half-life duration

setUserCol(value)[source]

Parameters: userCol – column name for user ids. Ids must be within the integer value range.

similarityFunction = Param(parent='undefined', name='similarityFunction', doc='Defines the similarity function to be used by the model. Lift favors serendipity, Co-occurrence favors predictability, and Jaccard is a nice compromise between the two.')

startTime = Param(parent='undefined', name='startTime', doc='Set time custom now time if using historical data')

startTimeFormat = Param(parent='undefined', name='startTimeFormat', doc='Format for start time')

supportThreshold = Param(parent='undefined', name='supportThreshold', doc='Minimum number of ratings per item')

timeCol = Param(parent='undefined', name='timeCol', doc='Time of activity')

timeDecayCoeff = Param(parent='undefined', name='timeDecayCoeff', doc='Use to scale time decay coeff to different half life dur')

userCol = Param(parent='undefined', name='userCol', doc='column name for user ids. Ids must be within the integer value range.')
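Taken together, timeCol, startTime, and timeDecayCoeff describe a time-decayed user-item affinity. The sketch below assumes the exponential half-life decay usually described for SAR, in which each event's weight is multiplied by 0.5 ** (age / half-life). Treating timeDecayCoeff as a half-life in days and startTime as the reference "now" are both assumptions. `decayed_affinity` is a hypothetical helper.

```python
# Sketch under stated assumptions: half-life in days, explicit reference "now".
from datetime import datetime
from typing import Iterable, Tuple


def decayed_affinity(
    events: Iterable[Tuple[float, datetime]],
    now: datetime,
    half_life_days: float = 30.0,
) -> float:
    """Sum event weights, halving each weight for every half-life elapsed before `now`."""
    total = 0.0
    for weight, when in events:
        age_days = (now - when).total_seconds() / 86400.0
        total += weight * 0.5 ** (age_days / half_life_days)
    return total


now = datetime(2024, 3, 31)  # plays the role of a custom "now" (cf. startTime)
events = [(1.0, datetime(2024, 3, 31)), (1.0, datetime(2024, 3, 1))]
print(decayed_affinity(events, now))  # 1.5
```

The event from 30 days earlier counts half as much as the event on the reference date, so the two unit weights sum to 1.5.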

synapse.ml.recommendation.SARModel module

class synapse.ml.recommendation.SARModel.SARModel(java_obj=None, activityTimeFormat="yyyy/MM/dd'T'h:mm:ss", alpha=1.0, blockSize=4096, checkpointInterval=10, coldStartStrategy='nan', finalStorageLevel='MEMORY_AND_DISK', implicitPrefs=False, intermediateStorageLevel='MEMORY_AND_DISK', itemCol='item', itemDataFrame=None, maxIter=10, nonnegative=False, numItemBlocks=10, numUserBlocks=10, predictionCol='prediction', rank=10, ratingCol='rating', regParam=0.1, seed=-1453370660, similarityFunction='jaccard', startTime=None, startTimeFormat='EEE MMM dd HH:mm:ss Z yyyy', supportThreshold=4, timeCol='time', timeDecayCoeff=30, userCol='user', userDataFrame=None)[source]

Bases: synapse.ml.recommendation._SARModel._SARModel

recommendForAllUsers(numItems)[source]

Returns the top numItems recommended items for each user.
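SAR is usually described as scoring items by multiplying a user's affinity vector with the item-item similarity matrix. Under that description, a user's score for item j is the affinity-weighted sum of every item's similarity to j. The pure-Python sketch below assumes this scheme for a single user. The optional `remove_seen` filter and all names are hypothetical and are not documented options of this class.

```python
# Sketch under stated assumptions; not SARModel's implementation.
from typing import Dict, List

# Hypothetical inputs: one user's affinity per item, and item-item similarities.
Affinity = Dict[str, float]
Similarity = Dict[str, Dict[str, float]]


def score_items(affinity: Affinity, sim: Similarity) -> Dict[str, float]:
    """score[j] = sum over items i of affinity[i] * sim[i][j]."""
    scores: Dict[str, float] = {}
    for i, a in affinity.items():
        for j, s in sim.get(i, {}).items():
            scores[j] = scores.get(j, 0.0) + a * s
    return scores


def recommend(
    affinity: Affinity, sim: Similarity, num_items: int, remove_seen: bool = True
) -> List[str]:
    """Top num_items by score, optionally skipping items the user already has."""
    scores = score_items(affinity, sim)
    if remove_seen:
        scores = {j: s for j, s in scores.items() if j not in affinity}
    # Highest score first; ties broken by item id.
    return sorted(scores, key=lambda j: (-scores[j], j))[:num_items]


affinity = {"a": 1.0, "b": 0.5}
sim = {
    "a": {"a": 1.0, "b": 0.5, "c": 0.25},
    "b": {"a": 0.5, "b": 1.0, "c": 0.5, "d": 0.8},
}
print(recommend(affinity, sim, num_items=2))  # ['c', 'd']
print(recommend(affinity, sim, num_items=2, remove_seen=False))  # ['a', 'b']
```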

Module contents

SynapseML is an ecosystem of tools aimed at expanding the distributed computing framework Apache Spark in several new directions. SynapseML adds many deep learning and data science tools to the Spark ecosystem, including seamless integration of Spark Machine Learning pipelines with Microsoft Cognitive Toolkit (CNTK), LightGBM, and OpenCV. These tools enable powerful and highly scalable predictive and analytical models for a variety of data sources.

SynapseML also brings new networking capabilities to the Spark ecosystem. With the HTTP on Spark project, users can embed any web service into their SparkML models. In this vein, SynapseML provides easy-to-use SparkML transformers for a wide variety of Microsoft Cognitive Services. For production-grade deployment, the Spark Serving project enables high-throughput, sub-millisecond-latency web services backed by your Spark cluster.

SynapseML requires Scala 2.12, Spark 3.0+, and Python 3.6+.