synapse.ml.recommendation package

Submodules

synapse.ml.recommendation.RankingAdapter module

class synapse.ml.recommendation.RankingAdapter.RankingAdapter(java_obj=None, itemCol=None, k=10, labelCol='label', minRatingsPerItem=1, minRatingsPerUser=1, mode='allUsers', ratingCol=None, recommender=None, userCol=None)[source]

Bases: ComplexParamsMixin, JavaMLReadable, JavaMLWritable, JavaEstimator

Parameters:
  • itemCol (str) – Column of items

  • k (int) – number of items to recommend for each user

  • labelCol (str) – The name of the label column

  • minRatingsPerItem (int) – minimum number of ratings per item (must be > 0)

  • minRatingsPerUser (int) – minimum number of ratings per user (must be > 0)

  • mode (str) – recommendation mode (default: 'allUsers')

  • ratingCol (str) – Column of ratings

  • recommender (object) – the recommender estimator to wrap

  • userCol (str) – Column of users
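The constructor arguments above can be combined as in the following minimal sketch. It assumes a working Spark session with SynapseML installed, and the column names ("customerID", "itemID", "rating") are hypothetical. The imports are done lazily inside the function so that the snippet can be loaded without Spark; only the keyword names come from the signatures documented on this page.

```python
# Keyword names come from the documented signatures; the column names
# ("customerID", "itemID", "rating") are hypothetical placeholders.
SAR_PARAMS = {
    "userCol": "customerID",
    "itemCol": "itemID",
    "ratingCol": "rating",
    "supportThreshold": 1,
}
ADAPTER_PARAMS = {"mode": "allUsers", "k": 5}


def build_adapter():
    """Wrap a SAR estimator in a RankingAdapter (requires Spark and SynapseML)."""
    # Imported lazily so this module can be loaded without a Spark installation.
    from synapse.ml.recommendation.RankingAdapter import RankingAdapter
    from synapse.ml.recommendation.SAR import SAR

    sar = SAR(**SAR_PARAMS)
    return RankingAdapter(recommender=sar, **ADAPTER_PARAMS)
```

Because RankingAdapter is an estimator, the returned object would typically be used as adapter.fit(ratings_df) followed by transform on the fitted model, where ratings_df is a hypothetical DataFrame holding the three columns named above.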

getItemCol()[source]
Returns:

Column of items

Return type:

itemCol

static getJavaPackage()[source]

Returns package name String.

getK()[source]
Returns:

number of items

Return type:

k

getLabelCol()[source]
Returns:

The name of the label column

Return type:

labelCol

getMinRatingsPerItem()[source]
Returns:

minimum number of ratings per item (must be > 0)

Return type:

minRatingsPerItem

getMinRatingsPerUser()[source]
Returns:

minimum number of ratings per user (must be > 0)

Return type:

minRatingsPerUser

getMode()[source]
Returns:

recommendation mode

Return type:

mode

getRatingCol()[source]
Returns:

Column of ratings

Return type:

ratingCol

getRecommender()[source]
Returns:

estimator for selection

Return type:

recommender

getUserCol()[source]
Returns:

Column of users

Return type:

userCol

itemCol = Param(parent='undefined', name='itemCol', doc='Column of items')
k = Param(parent='undefined', name='k', doc='number of items')
labelCol = Param(parent='undefined', name='labelCol', doc='The name of the label column')
minRatingsPerItem = Param(parent='undefined', name='minRatingsPerItem', doc='min ratings for items > 0')
minRatingsPerUser = Param(parent='undefined', name='minRatingsPerUser', doc='min ratings for users > 0')
mode = Param(parent='undefined', name='mode', doc='recommendation mode')
ratingCol = Param(parent='undefined', name='ratingCol', doc='Column of ratings')
classmethod read()[source]

Returns an MLReader instance for this class.

recommender = Param(parent='undefined', name='recommender', doc='estimator for selection')
setItemCol(value)[source]
Parameters:

itemCol – Column of items

setK(value)[source]
Parameters:

k – number of items

setLabelCol(value)[source]
Parameters:

labelCol – The name of the label column

setMinRatingsPerItem(value)[source]
Parameters:

minRatingsPerItem – minimum number of ratings per item (must be > 0)

setMinRatingsPerUser(value)[source]
Parameters:

minRatingsPerUser – minimum number of ratings per user (must be > 0)

setMode(value)[source]
Parameters:

mode – recommendation mode

setParams(itemCol=None, k=10, labelCol='label', minRatingsPerItem=1, minRatingsPerUser=1, mode='allUsers', ratingCol=None, recommender=None, userCol=None)[source]

Set the (keyword only) parameters

setRatingCol(value)[source]
Parameters:

ratingCol – Column of ratings

setRecommender(value)[source]
Parameters:

recommender – estimator for selection

setUserCol(value)[source]
Parameters:

userCol – Column of users

userCol = Param(parent='undefined', name='userCol', doc='Column of users')

synapse.ml.recommendation.RankingAdapterModel module

class synapse.ml.recommendation.RankingAdapterModel.RankingAdapterModel(java_obj=None, itemCol=None, k=10, labelCol='label', minRatingsPerItem=1, minRatingsPerUser=1, mode='allUsers', ratingCol=None, recommender=None, recommenderModel=None, userCol=None)[source]

Bases: ComplexParamsMixin, JavaMLReadable, JavaMLWritable, JavaModel

Parameters:
  • itemCol (str) – Column of items

  • k (int) – number of items to recommend for each user

  • labelCol (str) – The name of the label column

  • minRatingsPerItem (int) – minimum number of ratings per item (must be > 0)

  • minRatingsPerUser (int) – minimum number of ratings per user (must be > 0)

  • mode (str) – recommendation mode (default: 'allUsers')

  • ratingCol (str) – Column of ratings

  • recommender (object) – the recommender estimator to wrap

  • recommenderModel (object) – the fitted recommender model

  • userCol (str) – Column of users

getItemCol()[source]
Returns:

Column of items

Return type:

itemCol

static getJavaPackage()[source]

Returns package name String.

getK()[source]
Returns:

number of items

Return type:

k

getLabelCol()[source]
Returns:

The name of the label column

Return type:

labelCol

getMinRatingsPerItem()[source]
Returns:

minimum number of ratings per item (must be > 0)

Return type:

minRatingsPerItem

getMinRatingsPerUser()[source]
Returns:

minimum number of ratings per user (must be > 0)

Return type:

minRatingsPerUser

getMode()[source]
Returns:

recommendation mode

Return type:

mode

getRatingCol()[source]
Returns:

Column of ratings

Return type:

ratingCol

getRecommender()[source]
Returns:

estimator for selection

Return type:

recommender

getRecommenderModel()[source]
Returns:

recommenderModel

Return type:

recommenderModel

getUserCol()[source]
Returns:

Column of users

Return type:

userCol

itemCol = Param(parent='undefined', name='itemCol', doc='Column of items')
k = Param(parent='undefined', name='k', doc='number of items')
labelCol = Param(parent='undefined', name='labelCol', doc='The name of the label column')
minRatingsPerItem = Param(parent='undefined', name='minRatingsPerItem', doc='min ratings for items > 0')
minRatingsPerUser = Param(parent='undefined', name='minRatingsPerUser', doc='min ratings for users > 0')
mode = Param(parent='undefined', name='mode', doc='recommendation mode')
ratingCol = Param(parent='undefined', name='ratingCol', doc='Column of ratings')
classmethod read()[source]

Returns an MLReader instance for this class.

recommender = Param(parent='undefined', name='recommender', doc='estimator for selection')
recommenderModel = Param(parent='undefined', name='recommenderModel', doc='recommenderModel')
setItemCol(value)[source]
Parameters:

itemCol – Column of items

setK(value)[source]
Parameters:

k – number of items

setLabelCol(value)[source]
Parameters:

labelCol – The name of the label column

setMinRatingsPerItem(value)[source]
Parameters:

minRatingsPerItem – minimum number of ratings per item (must be > 0)

setMinRatingsPerUser(value)[source]
Parameters:

minRatingsPerUser – minimum number of ratings per user (must be > 0)

setMode(value)[source]
Parameters:

mode – recommendation mode

setParams(itemCol=None, k=10, labelCol='label', minRatingsPerItem=1, minRatingsPerUser=1, mode='allUsers', ratingCol=None, recommender=None, recommenderModel=None, userCol=None)[source]

Set the (keyword only) parameters

setRatingCol(value)[source]
Parameters:

ratingCol – Column of ratings

setRecommender(value)[source]
Parameters:

recommender – estimator for selection

setRecommenderModel(value)[source]
Parameters:

recommenderModel – recommenderModel

setUserCol(value)[source]
Parameters:

userCol – Column of users

userCol = Param(parent='undefined', name='userCol', doc='Column of users')

synapse.ml.recommendation.RankingEvaluator module

class synapse.ml.recommendation.RankingEvaluator.RankingEvaluator(java_obj=None, itemCol=None, k=10, labelCol='label', metricName='ndcgAt', nItems=-1, predictionCol='prediction', ratingCol=None, userCol=None)[source]

Bases: ComplexParamsMixin, JavaMLReadable, JavaMLWritable, JavaEvaluator

Parameters:
  • itemCol (str) – Column of items

  • k (int) – number of top-ranked items to evaluate (the cutoff k)

  • labelCol (str) – label column name

  • metricName (str) – metric name in evaluation (ndcgAt|map|precisionAtk|recallAtK|diversityAtK|maxDiversity|mrr|fcp)

  • nItems (long) – total number of items (default: -1)

  • predictionCol (str) – prediction column name

  • ratingCol (str) – Column of ratings

  • userCol (str) – Column of users
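To make the default metric concrete, here is a small pure-Python sketch of NDCG at a cutoff k using binary relevance. This is the common textbook definition and is meant only as an illustration of what metricName='ndcgAt' with a given k measures; it is not taken from the library's implementation, and the function name ndcg_at_k is hypothetical.

```python
import math


def ndcg_at_k(predicted, relevant, k):
    """NDCG@k with binary relevance: 1 if a predicted item is relevant, else 0."""
    if k <= 0:
        return 0.0
    relevant = set(relevant)
    # Discounted gain of the hits found in the top-k predictions.
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(predicted[:k])
        if item in relevant
    )
    # Ideal ordering: every relevant item (up to k) ranked first.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


perfect = ndcg_at_k(["a", "b", "c"], ["a", "b"], 3)   # relevant items ranked first
late_hit = ndcg_at_k(["x", "a"], ["a"], 2)            # single hit at rank 2
```

A ranking that places all relevant items first scores 1.0, while a hit that appears lower in the list is discounted logarithmically by its position.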

getItemCol()[source]
Returns:

Column of items

Return type:

itemCol

static getJavaPackage()[source]

Returns package name String.

getK()[source]
Returns:

number of items

Return type:

k

getLabelCol()[source]
Returns:

label column name

Return type:

labelCol

getMetricName()[source]
Returns:

metric name in evaluation (ndcgAt|map|precisionAtk|recallAtK|diversityAtK|maxDiversity|mrr|fcp)

Return type:

metricName

getNItems()[source]
Returns:

number of items

Return type:

nItems

getPredictionCol()[source]
Returns:

prediction column name

Return type:

predictionCol

getRatingCol()[source]
Returns:

Column of ratings

Return type:

ratingCol

getUserCol()[source]
Returns:

Column of users

Return type:

userCol

itemCol = Param(parent='undefined', name='itemCol', doc='Column of items')
k = Param(parent='undefined', name='k', doc='number of items')
labelCol = Param(parent='undefined', name='labelCol', doc='label column name')
metricName = Param(parent='undefined', name='metricName', doc='metric name in evaluation (ndcgAt|map|precisionAtk|recallAtK|diversityAtK|maxDiversity|mrr|fcp)')
nItems = Param(parent='undefined', name='nItems', doc='number of items')
predictionCol = Param(parent='undefined', name='predictionCol', doc='prediction column name')
ratingCol = Param(parent='undefined', name='ratingCol', doc='Column of ratings')
classmethod read()[source]

Returns an MLReader instance for this class.

setItemCol(value)[source]
Parameters:

itemCol – Column of items

setK(value)[source]
Parameters:

k – number of items

setLabelCol(value)[source]
Parameters:

labelCol – label column name

setMetricName(value)[source]
Parameters:

metricName – metric name in evaluation (ndcgAt|map|precisionAtk|recallAtK|diversityAtK|maxDiversity|mrr|fcp)

setNItems(value)[source]
Parameters:

nItems – number of items

setParams(itemCol=None, k=10, labelCol='label', metricName='ndcgAt', nItems=-1, predictionCol='prediction', ratingCol=None, userCol=None)[source]

Set the (keyword only) parameters

setPredictionCol(value)[source]
Parameters:

predictionCol – prediction column name

setRatingCol(value)[source]
Parameters:

ratingCol – Column of ratings

setUserCol(value)[source]
Parameters:

userCol – Column of users

userCol = Param(parent='undefined', name='userCol', doc='Column of users')

synapse.ml.recommendation.RankingTrainValidationSplit module

class synapse.ml.recommendation.RankingTrainValidationSplit.RankingTrainValidationSplit(**kwargs)[source]

Bases: _ValidatorParams, _RankingTrainValidationSplit

synapse.ml.recommendation.RankingTrainValidationSplitModel module

class synapse.ml.recommendation.RankingTrainValidationSplitModel.RankingTrainValidationSplitModel(java_obj=None, bestModel=None, validationMetrics=None)[source]

Bases: _RankingTrainValidationSplitModel

recommendForAllItems(numUsers)[source]
recommendForAllUsers(numItems)[source]

synapse.ml.recommendation.RecommendationIndexer module

class synapse.ml.recommendation.RecommendationIndexer.RecommendationIndexer(java_obj=None, itemInputCol=None, itemOutputCol=None, ratingCol=None, userInputCol=None, userOutputCol=None)[source]

Bases: ComplexParamsMixin, JavaMLReadable, JavaMLWritable, JavaEstimator

Parameters:
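  • itemInputCol (str) – name of the input column holding item ids

  • itemOutputCol (str) – name of the output column for indexed item ids

  • ratingCol (str) – name of the rating column

  • userInputCol (str) – name of the input column holding user ids

  • userOutputCol (str) – name of the output column for indexed user ids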
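The input/output column pairs suggest a mapping from raw user and item ids to integer indices. The following pure-Python sketch illustrates that idea only; the ordering rule (most frequent id first, ties broken alphabetically) is an assumption made for the example, and fit_index and index_rows are hypothetical helper names, not part of the library.

```python
from collections import Counter


def fit_index(values):
    """Map each distinct raw id to a contiguous integer index.

    Assumption for this sketch: more frequent ids get smaller indices,
    and ties are broken alphabetically.
    """
    counts = Counter(values)
    ordered = sorted(counts, key=lambda v: (-counts[v], v))
    return {raw: idx for idx, raw in enumerate(ordered)}


def index_rows(rows, user_index, item_index):
    """Replace raw user and item ids with their indices; ratings are kept as-is."""
    return [(user_index[u], item_index[i], r) for u, i, r in rows]


rows = [("alice", "book", 5.0), ("bob", "book", 3.0), ("alice", "pen", 4.0)]
user_index = fit_index(u for u, _, _ in rows)
item_index = fit_index(i for _, i, _ in rows)
indexed = index_rows(rows, user_index, item_index)
```

Keeping the two fitted mappings around is what makes it possible to translate recommendations back to the original ids afterwards.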

getItemInputCol()[source]
Returns:

Item Input Col

Return type:

itemInputCol

getItemOutputCol()[source]
Returns:

Item Output Col

Return type:

itemOutputCol

static getJavaPackage()[source]

Returns package name String.

getRatingCol()[source]
Returns:

Rating Col

Return type:

ratingCol

getUserInputCol()[source]
Returns:

User Input Col

Return type:

userInputCol

getUserOutputCol()[source]
Returns:

User Output Col

Return type:

userOutputCol

itemInputCol = Param(parent='undefined', name='itemInputCol', doc='Item Input Col')
itemOutputCol = Param(parent='undefined', name='itemOutputCol', doc='Item Output Col')
ratingCol = Param(parent='undefined', name='ratingCol', doc='Rating Col')
classmethod read()[source]

Returns an MLReader instance for this class.

setItemInputCol(value)[source]
Parameters:

itemInputCol – Item Input Col

setItemOutputCol(value)[source]
Parameters:

itemOutputCol – Item Output Col

setParams(itemInputCol=None, itemOutputCol=None, ratingCol=None, userInputCol=None, userOutputCol=None)[source]

Set the (keyword only) parameters

setRatingCol(value)[source]
Parameters:

ratingCol – Rating Col

setUserInputCol(value)[source]
Parameters:

userInputCol – User Input Col

setUserOutputCol(value)[source]
Parameters:

userOutputCol – User Output Col

userInputCol = Param(parent='undefined', name='userInputCol', doc='User Input Col')
userOutputCol = Param(parent='undefined', name='userOutputCol', doc='User Output Col')

synapse.ml.recommendation.RecommendationIndexerModel module

class synapse.ml.recommendation.RecommendationIndexerModel.RecommendationIndexerModel(java_obj=None, itemIndexModel=None, itemInputCol=None, itemOutputCol=None, ratingCol=None, userIndexModel=None, userInputCol=None, userOutputCol=None)[source]

Bases: ComplexParamsMixin, JavaMLReadable, JavaMLWritable, JavaModel

Parameters:
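  • itemIndexModel (object) – fitted index model for item ids

  • itemInputCol (str) – name of the input column holding item ids

  • itemOutputCol (str) – name of the output column for indexed item ids

  • ratingCol (str) – name of the rating column

  • userIndexModel (object) – fitted index model for user ids

  • userInputCol (str) – name of the input column holding user ids

  • userOutputCol (str) – name of the output column for indexed user ids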

getItemIndexModel()[source]
Returns:

itemIndexModel

Return type:

itemIndexModel

getItemInputCol()[source]
Returns:

Item Input Col

Return type:

itemInputCol

getItemOutputCol()[source]
Returns:

Item Output Col

Return type:

itemOutputCol

static getJavaPackage()[source]

Returns package name String.

getRatingCol()[source]
Returns:

Rating Col

Return type:

ratingCol

getUserIndexModel()[source]
Returns:

userIndexModel

Return type:

userIndexModel

getUserInputCol()[source]
Returns:

User Input Col

Return type:

userInputCol

getUserOutputCol()[source]
Returns:

User Output Col

Return type:

userOutputCol

itemIndexModel = Param(parent='undefined', name='itemIndexModel', doc='itemIndexModel')
itemInputCol = Param(parent='undefined', name='itemInputCol', doc='Item Input Col')
itemOutputCol = Param(parent='undefined', name='itemOutputCol', doc='Item Output Col')
ratingCol = Param(parent='undefined', name='ratingCol', doc='Rating Col')
classmethod read()[source]

Returns an MLReader instance for this class.

setItemIndexModel(value)[source]
Parameters:

itemIndexModel – itemIndexModel

setItemInputCol(value)[source]
Parameters:

itemInputCol – Item Input Col

setItemOutputCol(value)[source]
Parameters:

itemOutputCol – Item Output Col

setParams(itemIndexModel=None, itemInputCol=None, itemOutputCol=None, ratingCol=None, userIndexModel=None, userInputCol=None, userOutputCol=None)[source]

Set the (keyword only) parameters

setRatingCol(value)[source]
Parameters:

ratingCol – Rating Col

setUserIndexModel(value)[source]
Parameters:

userIndexModel – userIndexModel

setUserInputCol(value)[source]
Parameters:

userInputCol – User Input Col

setUserOutputCol(value)[source]
Parameters:

userOutputCol – User Output Col

userIndexModel = Param(parent='undefined', name='userIndexModel', doc='userIndexModel')
userInputCol = Param(parent='undefined', name='userInputCol', doc='User Input Col')
userOutputCol = Param(parent='undefined', name='userOutputCol', doc='User Output Col')

synapse.ml.recommendation.SAR module

class synapse.ml.recommendation.SAR.SAR(java_obj=None, activityTimeFormat="yyyy/MM/dd'T'h:mm:ss", alpha=1.0, blockSize=4096, checkpointInterval=10, coldStartStrategy='nan', finalStorageLevel='MEMORY_AND_DISK', implicitPrefs=False, intermediateStorageLevel='MEMORY_AND_DISK', itemCol='item', maxIter=10, nonnegative=False, numItemBlocks=10, numUserBlocks=10, predictionCol='prediction', rank=10, ratingCol='rating', regParam=0.1, seed=356704333, similarityFunction='jaccard', startTime=None, startTimeFormat='EEE MMM dd HH:mm:ss Z yyyy', supportThreshold=4, timeCol='time', timeDecayCoeff=30, userCol='user')[source]

Bases: ComplexParamsMixin, JavaMLReadable, JavaMLWritable, JavaEstimator

Parameters:
  • activityTimeFormat (str) – Time format for events, default: yyyy/MM/dd'T'h:mm:ss

  • alpha (float) – alpha for implicit preference

  • blockSize (int) – block size for stacking input data in matrices. Data is stacked within partitions. If block size is more than remaining data in a partition then it is adjusted to the size of this data.

  • checkpointInterval (int) – set checkpoint interval (>= 1) or disable checkpoint (-1). E.g. 10 means that the cache will get checkpointed every 10 iterations. Note: this setting will be ignored if the checkpoint directory is not set in the SparkContext

  • coldStartStrategy (str) – strategy for dealing with unknown or new users/items at prediction time. This may be useful in cross-validation or production scenarios, for handling user/item ids the model has not seen in the training data. Supported values: nan,drop.

  • finalStorageLevel (str) – StorageLevel for ALS model factors.

  • implicitPrefs (bool) – whether to use implicit preference

  • intermediateStorageLevel (str) – StorageLevel for intermediate datasets. Cannot be 'NONE'.

  • itemCol (str) – column name for item ids. Ids must be within the integer value range.

  • maxIter (int) – maximum number of iterations (>= 0)

  • nonnegative (bool) – whether to use nonnegative constraint for least squares

  • numItemBlocks (int) – number of item blocks

  • numUserBlocks (int) – number of user blocks

  • predictionCol (str) – prediction column name

  • rank (int) – rank of the factorization

  • ratingCol (str) – column name for ratings

  • regParam (float) – regularization parameter (>= 0)

  • seed (long) – random seed

  • similarityFunction (str) – Defines the similarity function to be used by the model. Lift favors serendipity, Co-occurrence favors predictability, and Jaccard is a nice compromise between the two.

  • startTime (str) – Custom "now" time to use when working with historical data

  • startTimeFormat (str) – Format for start time

  • supportThreshold (int) – Minimum number of ratings per item

  • timeCol (str) – Time of activity

  • timeDecayCoeff (int) – Scales the time decay coefficient to a different half-life duration

  • userCol (str) – column name for user ids. Ids must be within the integer value range.
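The three similarity functions named for similarityFunction can be illustrated with a small pure-Python sketch built on item co-occurrence counts. The formulas below are the usual definitions of co-occurrence, Jaccard, and lift; they are an illustration under that assumption, not the library's code. Only 'jaccard' is confirmed on this page as a parameter value; the strings 'cooccurrence' and 'lift' used here are assumptions.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence(user_items):
    """Count, for every ordered item pair, how many users interacted with both."""
    counts = defaultdict(int)
    for items in user_items.values():
        unique = sorted(set(items))
        for item in unique:
            counts[(item, item)] += 1  # diagonal: users who touched this item
        for a, b in combinations(unique, 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts


def similarity(counts, i, j, function="jaccard"):
    """Item-item similarity derived from co-occurrence counts."""
    c_ij = counts.get((i, j), 0)
    c_ii = counts.get((i, i), 0)
    c_jj = counts.get((j, j), 0)
    if function == "cooccurrence":
        return float(c_ij)
    if function == "jaccard":
        denominator = c_ii + c_jj - c_ij
    elif function == "lift":
        denominator = c_ii * c_jj
    else:
        raise ValueError(f"unknown similarity function: {function}")
    return c_ij / denominator if denominator else 0.0


# Hypothetical interaction data: user -> items they interacted with.
user_items = {"u1": ["a", "b"], "u2": ["a", "b", "c"], "u3": ["a"]}
counts = cooccurrence(user_items)
```

Raw co-occurrence rewards popular items, lift divides by both item popularities and so promotes rarer pairings, and Jaccard normalizes by the number of users who touched either item, which sits between the two.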

activityTimeFormat = Param(parent='undefined', name='activityTimeFormat', doc="Time format for events, default: yyyy/MM/dd'T'h:mm:ss")
alpha = Param(parent='undefined', name='alpha', doc='alpha for implicit preference')
blockSize = Param(parent='undefined', name='blockSize', doc='block size for stacking input data in matrices. Data is stacked within partitions. If block size is more than remaining data in a partition then it is adjusted to the size of this data.')
checkpointInterval = Param(parent='undefined', name='checkpointInterval', doc='set checkpoint interval (>= 1) or disable checkpoint (-1). E.g. 10 means that the cache will get checkpointed every 10 iterations. Note: this setting will be ignored if the checkpoint directory is not set in the SparkContext')
coldStartStrategy = Param(parent='undefined', name='coldStartStrategy', doc='strategy for dealing with unknown or new users/items at prediction time. This may be useful in cross-validation or production scenarios, for handling user/item ids the model has not seen in the training data. Supported values: nan,drop.')
finalStorageLevel = Param(parent='undefined', name='finalStorageLevel', doc='StorageLevel for ALS model factors.')
getActivityTimeFormat()[source]
Returns:

Time format for events, default: yyyy/MM/dd'T'h:mm:ss

Return type:

activityTimeFormat

getAlpha()[source]
Returns:

alpha for implicit preference

Return type:

alpha

getBlockSize()[source]
Returns:

block size for stacking input data in matrices. Data is stacked within partitions. If block size is more than remaining data in a partition then it is adjusted to the size of this data.

Return type:

blockSize

getCheckpointInterval()[source]
Returns:

set checkpoint interval (>= 1) or disable checkpoint (-1). E.g. 10 means that the cache will get checkpointed every 10 iterations. Note: this setting will be ignored if the checkpoint directory is not set in the SparkContext

Return type:

checkpointInterval

getColdStartStrategy()[source]
Returns:

strategy for dealing with unknown or new users/items at prediction time. This may be useful in cross-validation or production scenarios, for handling user/item ids the model has not seen in the training data. Supported values: nan,drop.

Return type:

coldStartStrategy

getFinalStorageLevel()[source]
Returns:

StorageLevel for ALS model factors.

Return type:

finalStorageLevel

getImplicitPrefs()[source]
Returns:

whether to use implicit preference

Return type:

implicitPrefs

getIntermediateStorageLevel()[source]
Returns:

StorageLevel for intermediate datasets. Cannot be 'NONE'.

Return type:

intermediateStorageLevel

getItemCol()[source]
Returns:

column name for item ids. Ids must be within the integer value range.

Return type:

itemCol

static getJavaPackage()[source]

Returns package name String.

getMaxIter()[source]
Returns:

maximum number of iterations (>= 0)

Return type:

maxIter

getNonnegative()[source]
Returns:

whether to use nonnegative constraint for least squares

Return type:

nonnegative

getNumItemBlocks()[source]
Returns:

number of item blocks

Return type:

numItemBlocks

getNumUserBlocks()[source]
Returns:

number of user blocks

Return type:

numUserBlocks

getPredictionCol()[source]
Returns:

prediction column name

Return type:

predictionCol

getRank()[source]
Returns:

rank of the factorization

Return type:

rank

getRatingCol()[source]
Returns:

column name for ratings

Return type:

ratingCol

getRegParam()[source]
Returns:

regularization parameter (>= 0)

Return type:

regParam

getSeed()[source]
Returns:

random seed

Return type:

seed

getSimilarityFunction()[source]
Returns:

Defines the similarity function to be used by the model. Lift favors serendipity, Co-occurrence favors predictability, and Jaccard is a nice compromise between the two.

Return type:

similarityFunction

getStartTime()[source]
Returns:

Custom "now" time to use when working with historical data

Return type:

startTime

getStartTimeFormat()[source]
Returns:

Format for start time

Return type:

startTimeFormat

getSupportThreshold()[source]
Returns:

Minimum number of ratings per item

Return type:

supportThreshold

getTimeCol()[source]
Returns:

Time of activity

Return type:

timeCol

getTimeDecayCoeff()[source]
Returns:

Scales the time decay coefficient to a different half-life duration

Return type:

timeDecayCoeff
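As an illustration of half-life based time decay, the sketch below weights each event so that its contribution halves every half_life_days. It assumes that the coefficient is a half-life expressed in days and that the reference time plays the role of startTime; both are assumptions for the example, and decayed_affinity is a hypothetical helper, not a library function.

```python
from datetime import datetime


def decayed_affinity(events, now, half_life_days=30):
    """Sum event ratings, halving each event's weight every half_life_days.

    events: iterable of (rating, timestamp) pairs for one user-item pair.
    now:    reference time against which event age is measured.
    """
    total = 0.0
    for rating, timestamp in events:
        age_days = (now - timestamp).total_seconds() / 86400.0
        total += rating * 2.0 ** (-age_days / half_life_days)
    return total


now = datetime(2024, 1, 31)
events = [(1.0, datetime(2024, 1, 31)), (1.0, datetime(2024, 1, 1))]
affinity = decayed_affinity(events, now)  # fresh event counts 1.0, 30-day-old event 0.5
```

With historical data, fixing the reference time keeps old events from being discounted relative to the moment the job happens to run.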

getUserCol()[source]
Returns:

column name for user ids. Ids must be within the integer value range.

Return type:

userCol

implicitPrefs = Param(parent='undefined', name='implicitPrefs', doc='whether to use implicit preference')
intermediateStorageLevel = Param(parent='undefined', name='intermediateStorageLevel', doc="StorageLevel for intermediate datasets. Cannot be 'NONE'.")
itemCol = Param(parent='undefined', name='itemCol', doc='column name for item ids. Ids must be within the integer value range.')
maxIter = Param(parent='undefined', name='maxIter', doc='maximum number of iterations (>= 0)')
nonnegative = Param(parent='undefined', name='nonnegative', doc='whether to use nonnegative constraint for least squares')
numItemBlocks = Param(parent='undefined', name='numItemBlocks', doc='number of item blocks')
numUserBlocks = Param(parent='undefined', name='numUserBlocks', doc='number of user blocks')
predictionCol = Param(parent='undefined', name='predictionCol', doc='prediction column name')
rank = Param(parent='undefined', name='rank', doc='rank of the factorization')
ratingCol = Param(parent='undefined', name='ratingCol', doc='column name for ratings')
classmethod read()[source]

Returns an MLReader instance for this class.

regParam = Param(parent='undefined', name='regParam', doc='regularization parameter (>= 0)')
seed = Param(parent='undefined', name='seed', doc='random seed')
setActivityTimeFormat(value)[source]
Parameters:

activityTimeFormat – Time format for events, default: yyyy/MM/dd'T'h:mm:ss

setAlpha(value)[source]
Parameters:

alpha – alpha for implicit preference

setBlockSize(value)[source]
Parameters:

blockSize – block size for stacking input data in matrices. Data is stacked within partitions. If block size is more than remaining data in a partition then it is adjusted to the size of this data.

setCheckpointInterval(value)[source]
Parameters:

checkpointInterval – set checkpoint interval (>= 1) or disable checkpoint (-1). E.g. 10 means that the cache will get checkpointed every 10 iterations. Note: this setting will be ignored if the checkpoint directory is not set in the SparkContext

setColdStartStrategy(value)[source]
Parameters:

coldStartStrategy – strategy for dealing with unknown or new users/items at prediction time. This may be useful in cross-validation or production scenarios, for handling user/item ids the model has not seen in the training data. Supported values: nan,drop.

setFinalStorageLevel(value)[source]
Parameters:

finalStorageLevel – StorageLevel for ALS model factors.

setImplicitPrefs(value)[source]
Parameters:

implicitPrefs – whether to use implicit preference

setIntermediateStorageLevel(value)[source]
Parameters:

intermediateStorageLevel – StorageLevel for intermediate datasets. Cannot be 'NONE'.

setItemCol(value)[source]
Parameters:

itemCol – column name for item ids. Ids must be within the integer value range.

setMaxIter(value)[source]
Parameters:

maxIter – maximum number of iterations (>= 0)

setNonnegative(value)[source]
Parameters:

nonnegative – whether to use nonnegative constraint for least squares

setNumItemBlocks(value)[source]
Parameters:

numItemBlocks – number of item blocks

setNumUserBlocks(value)[source]
Parameters:

numUserBlocks – number of user blocks

setParams(activityTimeFormat="yyyy/MM/dd'T'h:mm:ss", alpha=1.0, blockSize=4096, checkpointInterval=10, coldStartStrategy='nan', finalStorageLevel='MEMORY_AND_DISK', implicitPrefs=False, intermediateStorageLevel='MEMORY_AND_DISK', itemCol='item', maxIter=10, nonnegative=False, numItemBlocks=10, numUserBlocks=10, predictionCol='prediction', rank=10, ratingCol='rating', regParam=0.1, seed=356704333, similarityFunction='jaccard', startTime=None, startTimeFormat='EEE MMM dd HH:mm:ss Z yyyy', supportThreshold=4, timeCol='time', timeDecayCoeff=30, userCol='user')[source]

Set the (keyword only) parameters

setPredictionCol(value)[source]
Parameters:

predictionCol – prediction column name

setRank(value)[source]
Parameters:

rank – rank of the factorization

setRatingCol(value)[source]
Parameters:

ratingCol – column name for ratings

setRegParam(value)[source]
Parameters:

regParam – regularization parameter (>= 0)

setSeed(value)[source]
Parameters:

seed – random seed

setSimilarityFunction(value)[source]
Parameters:

similarityFunction – Defines the similarity function to be used by the model. Lift favors serendipity, Co-occurrence favors predictability, and Jaccard is a nice compromise between the two.

setStartTime(value)[source]
Parameters:

startTime – Custom "now" time to use when working with historical data

setStartTimeFormat(value)[source]
Parameters:

startTimeFormat – Format for start time

setSupportThreshold(value)[source]
Parameters:

supportThreshold – Minimum number of ratings per item

setTimeCol(value)[source]
Parameters:

timeCol – Time of activity

setTimeDecayCoeff(value)[source]
Parameters:

timeDecayCoeff – Scales the time decay coefficient to a different half-life duration

setUserCol(value)[source]
Parameters:

userCol – column name for user ids. Ids must be within the integer value range.

similarityFunction = Param(parent='undefined', name='similarityFunction', doc='Defines the similarity function to be used by the model. Lift favors serendipity, Co-occurrence favors predictability, and Jaccard is a nice compromise between the two.')
startTime = Param(parent='undefined', name='startTime', doc='Set time custom now time if using historical data')
startTimeFormat = Param(parent='undefined', name='startTimeFormat', doc='Format for start time')
supportThreshold = Param(parent='undefined', name='supportThreshold', doc='Minimum number of ratings per item')
timeCol = Param(parent='undefined', name='timeCol', doc='Time of activity')
timeDecayCoeff = Param(parent='undefined', name='timeDecayCoeff', doc='Use to scale time decay coeff to different half life dur')
userCol = Param(parent='undefined', name='userCol', doc='column name for user ids. Ids must be within the integer value range.')

synapse.ml.recommendation.SARModel module

class synapse.ml.recommendation.SARModel.SARModel(java_obj=None, activityTimeFormat="yyyy/MM/dd'T'h:mm:ss", alpha=1.0, blockSize=4096, checkpointInterval=10, coldStartStrategy='nan', finalStorageLevel='MEMORY_AND_DISK', implicitPrefs=False, intermediateStorageLevel='MEMORY_AND_DISK', itemCol='item', itemDataFrame=None, maxIter=10, nonnegative=False, numItemBlocks=10, numUserBlocks=10, predictionCol='prediction', rank=10, ratingCol='rating', regParam=0.1, seed=-1453370660, similarityFunction='jaccard', startTime=None, startTimeFormat='EEE MMM dd HH:mm:ss Z yyyy', supportThreshold=4, timeCol='time', timeDecayCoeff=30, userCol='user', userDataFrame=None)[source]

Bases: _SARModel

recommendForAllUsers(numItems)[source]

Module contents

SynapseML is an ecosystem of tools aimed at expanding the distributed computing framework Apache Spark in several new directions. SynapseML adds many deep learning and data science tools to the Spark ecosystem, including seamless integration of Spark Machine Learning pipelines with Microsoft Cognitive Toolkit (CNTK), LightGBM and OpenCV. These tools enable powerful and highly scalable predictive and analytical models for a variety of data sources.

SynapseML also brings new networking capabilities to the Spark ecosystem. With the HTTP on Spark project, users can embed any web service into their SparkML models. In this vein, SynapseML provides easy-to-use SparkML transformers for a wide variety of Microsoft Cognitive Services. For production-grade deployment, the Spark Serving project enables high-throughput, sub-millisecond latency web services, backed by your Spark cluster.

SynapseML requires Scala 2.12, Spark 3.0+, and Python 3.6+.