synapse.ml.recommendation package

Submodules

synapse.ml.recommendation.RankingAdapter module

class synapse.ml.recommendation.RankingAdapter.RankingAdapter(java_obj=None, itemCol=None, k=10, labelCol='label', minRatingsPerItem=1, minRatingsPerUser=1, mode='allUsers', ratingCol=None, recommender=None, userCol=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaEstimator

Parameters
  • itemCol (str) – Column of items

  • k (int) – number of items

  • labelCol (str) – The name of the label column

  • minRatingsPerItem (int) – minimum number of ratings per item (> 0)

  • minRatingsPerUser (int) – minimum number of ratings per user (> 0)

  • mode (str) – recommendation mode

  • ratingCol (str) – Column of ratings

  • recommender (object) – estimator for selection

  • userCol (str) – Column of users
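The two minimum-rating thresholds can be pictured with a small plain-Python sketch. It assumes the thresholds mean "keep only ratings whose user and item each appear at least this many times"; that reading, the function name filter_by_min_ratings, and the (user, item, rating) tuple layout are illustrative assumptions, not the library's implementation.

```python
from collections import Counter


def filter_by_min_ratings(ratings, min_per_user=1, min_per_item=1):
    """Keep ratings whose user and item both meet the minimum counts.

    `ratings` is a list of (user, item, rating) tuples. Counts are taken
    once over the full input (a single pass, not an iterative pruning).
    This is a hypothetical illustration, not SynapseML code.
    """
    user_counts = Counter(user for user, _, _ in ratings)
    item_counts = Counter(item for _, item, _ in ratings)
    return [
        (user, item, rating)
        for user, item, rating in ratings
        if user_counts[user] >= min_per_user
        and item_counts[item] >= min_per_item
    ]


ratings = [
    ("u1", "a", 5),
    ("u1", "b", 3),
    ("u2", "a", 4),
    ("u3", "c", 1),
]
# Only u1 has at least two ratings, so only u1's rows survive.
print(filter_by_min_ratings(ratings, min_per_user=2))
```

With the defaults of 1 for both thresholds, every rating is kept, which matches the default values shown in the signature.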

getItemCol()[source]
Returns

Column of items

Return type

itemCol

static getJavaPackage()[source]

Returns package name String.

getK()[source]
Returns

number of items

Return type

k

getLabelCol()[source]
Returns

The name of the label column

Return type

labelCol

getMinRatingsPerItem()[source]
Returns

min ratings for items > 0

Return type

minRatingsPerItem

getMinRatingsPerUser()[source]
Returns

min ratings for users > 0

Return type

minRatingsPerUser

getMode()[source]
Returns

recommendation mode

Return type

mode

getRatingCol()[source]
Returns

Column of ratings

Return type

ratingCol

getRecommender()[source]
Returns

estimator for selection

Return type

recommender

getUserCol()[source]
Returns

Column of users

Return type

userCol

itemCol = Param(parent='undefined', name='itemCol', doc='Column of items')
k = Param(parent='undefined', name='k', doc='number of items')
labelCol = Param(parent='undefined', name='labelCol', doc='The name of the label column')
minRatingsPerItem = Param(parent='undefined', name='minRatingsPerItem', doc='min ratings for items > 0')
minRatingsPerUser = Param(parent='undefined', name='minRatingsPerUser', doc='min ratings for users > 0')
mode = Param(parent='undefined', name='mode', doc='recommendation mode')
ratingCol = Param(parent='undefined', name='ratingCol', doc='Column of ratings')
classmethod read()[source]

Returns an MLReader instance for this class.

recommender = Param(parent='undefined', name='recommender', doc='estimator for selection')
setItemCol(value)[source]
Parameters

itemCol – Column of items

setK(value)[source]
Parameters

k – number of items

setLabelCol(value)[source]
Parameters

labelCol – The name of the label column

setMinRatingsPerItem(value)[source]
Parameters

minRatingsPerItem – min ratings for items > 0

setMinRatingsPerUser(value)[source]
Parameters

minRatingsPerUser – min ratings for users > 0

setMode(value)[source]
Parameters

mode – recommendation mode

setParams(itemCol=None, k=10, labelCol='label', minRatingsPerItem=1, minRatingsPerUser=1, mode='allUsers', ratingCol=None, recommender=None, userCol=None)[source]

Set the (keyword only) parameters

setRatingCol(value)[source]
Parameters

ratingCol – Column of ratings

setRecommender(value)[source]
Parameters

recommender – estimator for selection

setUserCol(value)[source]
Parameters

userCol – Column of users

userCol = Param(parent='undefined', name='userCol', doc='Column of users')

synapse.ml.recommendation.RankingAdapterModel module

class synapse.ml.recommendation.RankingAdapterModel.RankingAdapterModel(java_obj=None, itemCol=None, k=10, labelCol='label', minRatingsPerItem=1, minRatingsPerUser=1, mode='allUsers', ratingCol=None, recommender=None, recommenderModel=None, userCol=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaModel

Parameters
  • itemCol (str) – Column of items

  • k (int) – number of items

  • labelCol (str) – The name of the label column

  • minRatingsPerItem (int) – minimum number of ratings per item (> 0)

  • minRatingsPerUser (int) – minimum number of ratings per user (> 0)

  • mode (str) – recommendation mode

  • ratingCol (str) – Column of ratings

  • recommender (object) – estimator for selection

  • recommenderModel (object) – the fitted recommender model

  • userCol (str) – Column of users

getItemCol()[source]
Returns

Column of items

Return type

itemCol

static getJavaPackage()[source]

Returns package name String.

getK()[source]
Returns

number of items

Return type

k

getLabelCol()[source]
Returns

The name of the label column

Return type

labelCol

getMinRatingsPerItem()[source]
Returns

min ratings for items > 0

Return type

minRatingsPerItem

getMinRatingsPerUser()[source]
Returns

min ratings for users > 0

Return type

minRatingsPerUser

getMode()[source]
Returns

recommendation mode

Return type

mode

getRatingCol()[source]
Returns

Column of ratings

Return type

ratingCol

getRecommender()[source]
Returns

estimator for selection

Return type

recommender

getRecommenderModel()[source]
Returns

recommenderModel

Return type

recommenderModel

getUserCol()[source]
Returns

Column of users

Return type

userCol

itemCol = Param(parent='undefined', name='itemCol', doc='Column of items')
k = Param(parent='undefined', name='k', doc='number of items')
labelCol = Param(parent='undefined', name='labelCol', doc='The name of the label column')
minRatingsPerItem = Param(parent='undefined', name='minRatingsPerItem', doc='min ratings for items > 0')
minRatingsPerUser = Param(parent='undefined', name='minRatingsPerUser', doc='min ratings for users > 0')
mode = Param(parent='undefined', name='mode', doc='recommendation mode')
ratingCol = Param(parent='undefined', name='ratingCol', doc='Column of ratings')
classmethod read()[source]

Returns an MLReader instance for this class.

recommender = Param(parent='undefined', name='recommender', doc='estimator for selection')
recommenderModel = Param(parent='undefined', name='recommenderModel', doc='recommenderModel')
setItemCol(value)[source]
Parameters

itemCol – Column of items

setK(value)[source]
Parameters

k – number of items

setLabelCol(value)[source]
Parameters

labelCol – The name of the label column

setMinRatingsPerItem(value)[source]
Parameters

minRatingsPerItem – min ratings for items > 0

setMinRatingsPerUser(value)[source]
Parameters

minRatingsPerUser – min ratings for users > 0

setMode(value)[source]
Parameters

mode – recommendation mode

setParams(itemCol=None, k=10, labelCol='label', minRatingsPerItem=1, minRatingsPerUser=1, mode='allUsers', ratingCol=None, recommender=None, recommenderModel=None, userCol=None)[source]

Set the (keyword only) parameters

setRatingCol(value)[source]
Parameters

ratingCol – Column of ratings

setRecommender(value)[source]
Parameters

recommender – estimator for selection

setRecommenderModel(value)[source]
Parameters

recommenderModel – recommenderModel

setUserCol(value)[source]
Parameters

userCol – Column of users

userCol = Param(parent='undefined', name='userCol', doc='Column of users')

synapse.ml.recommendation.RankingEvaluator module

class synapse.ml.recommendation.RankingEvaluator.RankingEvaluator(java_obj=None, itemCol=None, k=10, labelCol='label', metricName='ndcgAt', nItems=-1, predictionCol='prediction', ratingCol=None, userCol=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.evaluation.JavaEvaluator

Parameters
  • itemCol (str) – Column of items

  • k (int) – number of items to consider in each ranked list (the cutoff used by the "at k" metrics)

  • labelCol (str) – label column name

  • metricName (str) – metric name in evaluation (ndcgAt|map|precisionAtk|recallAtK|diversityAtK|maxDiversity|mrr|fcp)

  • nItems (long) – total number of items (default -1)

  • predictionCol (str) – prediction column name

  • ratingCol (str) – Column of ratings

  • userCol (str) – Column of users
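Several of the metric names above correspond to standard ranking metrics. The sketch below shows the textbook definitions of precision at k, recall at k, NDCG at k (with binary relevance), and reciprocal rank for a single user in plain Python. The function names are hypothetical, and the library's own implementation may differ in details such as tie handling or how users without labels are treated.

```python
import math


def precision_at_k(predicted, relevant, k):
    """Fraction of the top-k predicted items that are relevant."""
    hits = sum(1 for item in predicted[:k] if item in relevant)
    return hits / k


def recall_at_k(predicted, relevant, k):
    """Fraction of the relevant items that appear in the top k."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in predicted[:k] if item in relevant)
    return hits / len(relevant)


def ndcg_at_k(predicted, relevant, k):
    """Normalized discounted cumulative gain with binary relevance."""
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(predicted[:k])
        if item in relevant
    )
    # The ideal ranking places all relevant items first.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


def reciprocal_rank(predicted, relevant):
    """1 / position of the first relevant item, or 0.0 if none is found."""
    for rank, item in enumerate(predicted):
        if item in relevant:
            return 1.0 / (rank + 1)
    return 0.0


predicted = ["a", "b", "c", "d"]   # ranked recommendations for one user
relevant = {"a", "c", "e"}         # ground-truth items for that user
print(precision_at_k(predicted, relevant, 3))
print(recall_at_k(predicted, relevant, 3))
print(ndcg_at_k(predicted, relevant, 3))
print(reciprocal_rank(predicted, relevant))
```

Averaging the per-user values across all users gives the dataset-level score; for example, the mean of reciprocal_rank over users is the usual definition of MRR.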

getItemCol()[source]
Returns

Column of items

Return type

itemCol

static getJavaPackage()[source]

Returns package name String.

getK()[source]
Returns

number of items

Return type

k

getLabelCol()[source]
Returns

label column name

Return type

labelCol

getMetricName()[source]
Returns

metric name in evaluation (ndcgAt|map|precisionAtk|recallAtK|diversityAtK|maxDiversity|mrr|fcp)

Return type

metricName

getNItems()[source]
Returns

number of items

Return type

nItems

getPredictionCol()[source]
Returns

prediction column name

Return type

predictionCol

getRatingCol()[source]
Returns

Column of ratings

Return type

ratingCol

getUserCol()[source]
Returns

Column of users

Return type

userCol

itemCol = Param(parent='undefined', name='itemCol', doc='Column of items')
k = Param(parent='undefined', name='k', doc='number of items')
labelCol = Param(parent='undefined', name='labelCol', doc='label column name')
metricName = Param(parent='undefined', name='metricName', doc='metric name in evaluation (ndcgAt|map|precisionAtk|recallAtK|diversityAtK|maxDiversity|mrr|fcp)')
nItems = Param(parent='undefined', name='nItems', doc='number of items')
predictionCol = Param(parent='undefined', name='predictionCol', doc='prediction column name')
ratingCol = Param(parent='undefined', name='ratingCol', doc='Column of ratings')
classmethod read()[source]

Returns an MLReader instance for this class.

setItemCol(value)[source]
Parameters

itemCol – Column of items

setK(value)[source]
Parameters

k – number of items

setLabelCol(value)[source]
Parameters

labelCol – label column name

setMetricName(value)[source]
Parameters

metricName – metric name in evaluation (ndcgAt|map|precisionAtk|recallAtK|diversityAtK|maxDiversity|mrr|fcp)

setNItems(value)[source]
Parameters

nItems – number of items

setParams(itemCol=None, k=10, labelCol='label', metricName='ndcgAt', nItems=-1, predictionCol='prediction', ratingCol=None, userCol=None)[source]

Set the (keyword only) parameters

setPredictionCol(value)[source]
Parameters

predictionCol – prediction column name

setRatingCol(value)[source]
Parameters

ratingCol – Column of ratings

setUserCol(value)[source]
Parameters

userCol – Column of users

userCol = Param(parent='undefined', name='userCol', doc='Column of users')

synapse.ml.recommendation.RankingTrainValidationSplit module

class synapse.ml.recommendation.RankingTrainValidationSplit.RankingTrainValidationSplit(**kwargs)[source]

Bases: pyspark.ml.tuning._ValidatorParams, synapse.ml.recommendation._RankingTrainValidationSplit._RankingTrainValidationSplit

synapse.ml.recommendation.RankingTrainValidationSplitModel module

class synapse.ml.recommendation.RankingTrainValidationSplitModel.RankingTrainValidationSplitModel(java_obj=None, bestModel=None, validationMetrics=None)[source]

Bases: synapse.ml.recommendation._RankingTrainValidationSplitModel._RankingTrainValidationSplitModel

recommendForAllItems(numUsers)[source]
recommendForAllUsers(numItems)[source]
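The two methods above take a count of users or items to return. As a rough picture of what "top N per user" and "top N per item" mean, here is a plain-Python sketch over a dictionary of predicted scores. The dictionary layout and the function bodies are assumptions made for illustration; the real methods operate on Spark DataFrames.

```python
import heapq
from collections import defaultdict


def recommend_for_all_users(scores, num_items):
    """Return the num_items highest-scoring items for every user.

    `scores` maps (user, item) -> predicted score (hypothetical layout).
    """
    by_user = defaultdict(list)
    for (user, item), score in scores.items():
        by_user[user].append((item, score))
    return {
        user: heapq.nlargest(num_items, pairs, key=lambda pair: pair[1])
        for user, pairs in by_user.items()
    }


def recommend_for_all_items(scores, num_users):
    """Return the num_users highest-scoring users for every item."""
    by_item = defaultdict(list)
    for (user, item), score in scores.items():
        by_item[item].append((user, score))
    return {
        item: heapq.nlargest(num_users, pairs, key=lambda pair: pair[1])
        for item, pairs in by_item.items()
    }


scores = {
    ("u1", "a"): 0.9,
    ("u1", "b"): 0.2,
    ("u1", "c"): 0.5,
    ("u2", "a"): 0.1,
    ("u2", "b"): 0.8,
}
print(recommend_for_all_users(scores, 2))
print(recommend_for_all_items(scores, 1))
```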

synapse.ml.recommendation.RecommendationIndexer module

class synapse.ml.recommendation.RecommendationIndexer.RecommendationIndexer(java_obj=None, itemInputCol=None, itemOutputCol=None, ratingCol=None, userInputCol=None, userOutputCol=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaEstimator

Parameters
  • itemInputCol (str) – name of the item input column

  • itemOutputCol (str) – name of the item output column

  • ratingCol (str) – name of the rating column

  • userInputCol (str) – name of the user input column

  • userOutputCol (str) – name of the user output column
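Conceptually, an indexer of this kind maps raw user and item identifiers in the input columns to integer indices in the output columns. The plain-Python sketch below shows the idea, assigning indices in first-seen order. That ordering and the helper names fit_index and index_ratings are assumptions for illustration; the library may order indices differently.

```python
def fit_index(values):
    """Assign consecutive integer indices to distinct values, first seen first."""
    index = {}
    for value in values:
        if value not in index:
            index[value] = len(index)
    return index


def index_ratings(rows):
    """Replace raw user and item IDs with integer indices.

    `rows` is a list of (user, item, rating) tuples (hypothetical layout).
    Returns the indexed rows plus both lookup tables, so the mapping can
    be reversed later.
    """
    user_index = fit_index(user for user, _, _ in rows)
    item_index = fit_index(item for _, item, _ in rows)
    indexed = [
        (user_index[user], item_index[item], rating)
        for user, item, rating in rows
    ]
    return indexed, user_index, item_index


rows = [
    ("alice", "book", 5.0),
    ("bob", "film", 3.0),
    ("alice", "film", 4.0),
]
indexed, user_index, item_index = index_ratings(rows)
print(indexed)
print(user_index, item_index)
```

Keeping the lookup tables alongside the indexed rows mirrors the estimator/model split: fitting learns the mapping, and the fitted model applies it.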

getItemInputCol()[source]
Returns

Item Input Col

Return type

itemInputCol

getItemOutputCol()[source]
Returns

Item Output Col

Return type

itemOutputCol

static getJavaPackage()[source]

Returns package name String.

getRatingCol()[source]
Returns

Rating Col

Return type

ratingCol

getUserInputCol()[source]
Returns

User Input Col

Return type

userInputCol

getUserOutputCol()[source]
Returns

User Output Col

Return type

userOutputCol

itemInputCol = Param(parent='undefined', name='itemInputCol', doc='Item Input Col')
itemOutputCol = Param(parent='undefined', name='itemOutputCol', doc='Item Output Col')
ratingCol = Param(parent='undefined', name='ratingCol', doc='Rating Col')
classmethod read()[source]

Returns an MLReader instance for this class.

setItemInputCol(value)[source]
Parameters

itemInputCol – Item Input Col

setItemOutputCol(value)[source]
Parameters

itemOutputCol – Item Output Col

setParams(itemInputCol=None, itemOutputCol=None, ratingCol=None, userInputCol=None, userOutputCol=None)[source]

Set the (keyword only) parameters

setRatingCol(value)[source]
Parameters

ratingCol – Rating Col

setUserInputCol(value)[source]
Parameters

userInputCol – User Input Col

setUserOutputCol(value)[source]
Parameters

userOutputCol – User Output Col

userInputCol = Param(parent='undefined', name='userInputCol', doc='User Input Col')
userOutputCol = Param(parent='undefined', name='userOutputCol', doc='User Output Col')

synapse.ml.recommendation.RecommendationIndexerModel module

class synapse.ml.recommendation.RecommendationIndexerModel.RecommendationIndexerModel(java_obj=None, itemIndexModel=None, itemInputCol=None, itemOutputCol=None, ratingCol=None, userIndexModel=None, userInputCol=None, userOutputCol=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaModel

Parameters
  • itemIndexModel (object) – fitted index model for items

  • itemInputCol (str) – name of the item input column

  • itemOutputCol (str) – name of the item output column

  • ratingCol (str) – name of the rating column

  • userIndexModel (object) – fitted index model for users

  • userInputCol (str) – name of the user input column

  • userOutputCol (str) – name of the user output column

getItemIndexModel()[source]
Returns

itemIndexModel

Return type

itemIndexModel

getItemInputCol()[source]
Returns

Item Input Col

Return type

itemInputCol

getItemOutputCol()[source]
Returns

Item Output Col

Return type

itemOutputCol

static getJavaPackage()[source]

Returns package name String.

getRatingCol()[source]
Returns

Rating Col

Return type

ratingCol

getUserIndexModel()[source]
Returns

userIndexModel

Return type

userIndexModel

getUserInputCol()[source]
Returns

User Input Col

Return type

userInputCol

getUserOutputCol()[source]
Returns

User Output Col

Return type

userOutputCol

itemIndexModel = Param(parent='undefined', name='itemIndexModel', doc='itemIndexModel')
itemInputCol = Param(parent='undefined', name='itemInputCol', doc='Item Input Col')
itemOutputCol = Param(parent='undefined', name='itemOutputCol', doc='Item Output Col')
ratingCol = Param(parent='undefined', name='ratingCol', doc='Rating Col')
classmethod read()[source]

Returns an MLReader instance for this class.

setItemIndexModel(value)[source]
Parameters

itemIndexModel – itemIndexModel

setItemInputCol(value)[source]
Parameters

itemInputCol – Item Input Col

setItemOutputCol(value)[source]
Parameters

itemOutputCol – Item Output Col

setParams(itemIndexModel=None, itemInputCol=None, itemOutputCol=None, ratingCol=None, userIndexModel=None, userInputCol=None, userOutputCol=None)[source]

Set the (keyword only) parameters

setRatingCol(value)[source]
Parameters

ratingCol – Rating Col

setUserIndexModel(value)[source]
Parameters

userIndexModel – userIndexModel

setUserInputCol(value)[source]
Parameters

userInputCol – User Input Col

setUserOutputCol(value)[source]
Parameters

userOutputCol – User Output Col

userIndexModel = Param(parent='undefined', name='userIndexModel', doc='userIndexModel')
userInputCol = Param(parent='undefined', name='userInputCol', doc='User Input Col')
userOutputCol = Param(parent='undefined', name='userOutputCol', doc='User Output Col')

synapse.ml.recommendation.SAR module

class synapse.ml.recommendation.SAR.SAR(java_obj=None, activityTimeFormat="yyyy/MM/dd'T'h:mm:ss", alpha=1.0, blockSize=4096, checkpointInterval=10, coldStartStrategy='nan', finalStorageLevel='MEMORY_AND_DISK', implicitPrefs=False, intermediateStorageLevel='MEMORY_AND_DISK', itemCol='item', maxIter=10, nonnegative=False, numItemBlocks=10, numUserBlocks=10, predictionCol='prediction', rank=10, ratingCol='rating', regParam=0.1, seed=356704333, similarityFunction='jaccard', startTime=None, startTimeFormat='EEE MMM dd HH:mm:ss Z yyyy', supportThreshold=4, timeCol='time', timeDecayCoeff=30, userCol='user')[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaEstimator

Parameters
  • activityTimeFormat (str) – Time format for events, default: yyyy/MM/dd'T'h:mm:ss

  • alpha (float) – alpha for implicit preference

  • blockSize (int) – block size for stacking input data in matrices. Data is stacked within partitions. If block size is more than remaining data in a partition then it is adjusted to the size of this data.

  • checkpointInterval (int) – set checkpoint interval (>= 1) or disable checkpoint (-1). E.g. 10 means that the cache will get checkpointed every 10 iterations. Note: this setting will be ignored if the checkpoint directory is not set in the SparkContext

  • coldStartStrategy (str) – strategy for dealing with unknown or new users/items at prediction time. This may be useful in cross-validation or production scenarios, for handling user/item ids the model has not seen in the training data. Supported values: nan,drop.

  • finalStorageLevel (str) – StorageLevel for ALS model factors.

  • implicitPrefs (bool) – whether to use implicit preference

  • intermediateStorageLevel (str) – StorageLevel for intermediate datasets. Cannot be ‘NONE’.

  • itemCol (str) – column name for item ids. Ids must be within the integer value range.

  • maxIter (int) – maximum number of iterations (>= 0)

  • nonnegative (bool) – whether to use nonnegative constraint for least squares

  • numItemBlocks (int) – number of item blocks

  • numUserBlocks (int) – number of user blocks

  • predictionCol (str) – prediction column name

  • rank (int) – rank of the factorization

  • ratingCol (str) – column name for ratings

  • regParam (float) – regularization parameter (>= 0)

  • seed (long) – random seed

  • similarityFunction (str) – Defines the similarity function to be used by the model. Lift favors serendipity, Co-occurrence favors predictability, and Jaccard is a nice compromise between the two.

  • startTime (str) – Custom "now" time to use as the reference point when working with historical data

  • startTimeFormat (str) – Format for start time

  • supportThreshold (int) – Minimum number of ratings per item

  • timeCol (str) – Time of activity

  • timeDecayCoeff (int) – Scales the time decay coefficient to a different half-life duration

  • userCol (str) – column name for user ids. Ids must be within the integer value range.
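The similarityFunction description names three item-to-item similarity measures. The sketch below computes them from item co-occurrence counts in plain Python, using the common definitions: co-occurrence is the raw count c_ab, lift is c_ab / (c_aa * c_bb), and Jaccard is c_ab / (c_aa + c_bb - c_ab). The function names and the "lift" and "cooccurrence" option strings are assumptions for illustration (only 'jaccard' appears in the signature above), and the library's implementation may differ.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_counts(user_items):
    """Count, for each item pair, how many users interacted with both.

    `user_items` maps user -> list of items (hypothetical layout).
    The diagonal entry (a, a) is the number of users who touched item a.
    """
    counts = defaultdict(int)
    for items in user_items.values():
        unique = sorted(set(items))
        for item in unique:
            counts[(item, item)] += 1
        for a, b in combinations(unique, 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return dict(counts)


def similarity(counts, a, b, function="jaccard"):
    """Item-to-item similarity from co-occurrence counts."""
    c_ab = counts.get((a, b), 0)
    c_aa = counts.get((a, a), 0)
    c_bb = counts.get((b, b), 0)
    if function == "cooccurrence":
        return float(c_ab)
    if function == "lift":
        denominator = c_aa * c_bb
    elif function == "jaccard":
        denominator = c_aa + c_bb - c_ab
    else:
        raise ValueError("unknown similarity function: " + function)
    return c_ab / denominator if denominator else 0.0


user_items = {"u1": ["x", "y"], "u2": ["x", "y", "z"], "u3": ["x"]}
counts = cooccurrence_counts(user_items)
for name in ("cooccurrence", "lift", "jaccard"):
    print(name, similarity(counts, "x", "y", name))
```

Raw co-occurrence rewards popular items, lift divides popularity out, and Jaccard sits between the two, which is consistent with the trade-off the parameter description mentions.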

activityTimeFormat = Param(parent='undefined', name='activityTimeFormat', doc="Time format for events, default: yyyy/MM/dd'T'h:mm:ss")
alpha = Param(parent='undefined', name='alpha', doc='alpha for implicit preference')
blockSize = Param(parent='undefined', name='blockSize', doc='block size for stacking input data in matrices. Data is stacked within partitions. If block size is more than remaining data in a partition then it is adjusted to the size of this data.')
checkpointInterval = Param(parent='undefined', name='checkpointInterval', doc='set checkpoint interval (>= 1) or disable checkpoint (-1). E.g. 10 means that the cache will get checkpointed every 10 iterations. Note: this setting will be ignored if the checkpoint directory is not set in the SparkContext')
coldStartStrategy = Param(parent='undefined', name='coldStartStrategy', doc='strategy for dealing with unknown or new users/items at prediction time. This may be useful in cross-validation or production scenarios, for handling user/item ids the model has not seen in the training data. Supported values: nan,drop.')
finalStorageLevel = Param(parent='undefined', name='finalStorageLevel', doc='StorageLevel for ALS model factors.')
getActivityTimeFormat()[source]
Returns

Time format for events, default: yyyy/MM/dd'T'h:mm:ss

Return type

activityTimeFormat

getAlpha()[source]
Returns

alpha for implicit preference

Return type

alpha

getBlockSize()[source]
Returns

block size for stacking input data in matrices. Data is stacked within partitions. If block size is more than remaining data in a partition then it is adjusted to the size of this data.

Return type

blockSize

getCheckpointInterval()[source]
Returns

set checkpoint interval (>= 1) or disable checkpoint (-1). E.g. 10 means that the cache will get checkpointed every 10 iterations. Note: this setting will be ignored if the checkpoint directory is not set in the SparkContext

Return type

checkpointInterval

getColdStartStrategy()[source]
Returns

strategy for dealing with unknown or new users/items at prediction time. This may be useful in cross-validation or production scenarios, for handling user/item ids the model has not seen in the training data. Supported values: nan,drop.

Return type

coldStartStrategy
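The two supported values can be illustrated with a short plain-Python sketch: with "nan", predictions for unseen users or items stay in the output as NaN, and with "drop", those rows are removed. The function name apply_cold_start and the (user, item, prediction) tuple layout are hypothetical; the sketch only mirrors the behaviour described for this parameter.

```python
import math


def apply_cold_start(predictions, strategy="nan"):
    """Handle NaN predictions according to the chosen strategy.

    `predictions` is a list of (user, item, prediction) tuples, where the
    prediction is NaN when the user or item was not seen during training.
    """
    if strategy == "nan":
        # Keep every row, including those with NaN predictions.
        return list(predictions)
    if strategy == "drop":
        # Remove rows whose prediction is NaN.
        return [row for row in predictions if not math.isnan(row[2])]
    raise ValueError("supported values are 'nan' and 'drop'")


predictions = [
    ("u1", "a", 4.2),
    ("new_user", "a", float("nan")),
    ("u2", "b", 3.1),
]
print(len(apply_cold_start(predictions, "nan")))
print(apply_cold_start(predictions, "drop"))
```

Dropping NaN rows is usually what an evaluation step wants, since a single NaN prediction would otherwise turn an averaged metric into NaN.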

getFinalStorageLevel()[source]
Returns

StorageLevel for ALS model factors.

Return type

finalStorageLevel

getImplicitPrefs()[source]
Returns

whether to use implicit preference

Return type

implicitPrefs

getIntermediateStorageLevel()[source]
Returns

StorageLevel for intermediate datasets. Cannot be ‘NONE’.

Return type

intermediateStorageLevel

getItemCol()[source]
Returns

column name for item ids. Ids must be within the integer value range.

Return type

itemCol

static getJavaPackage()[source]

Returns package name String.

getMaxIter()[source]
Returns

maximum number of iterations (>= 0)

Return type

maxIter

getNonnegative()[source]
Returns

whether to use nonnegative constraint for least squares

Return type

nonnegative

getNumItemBlocks()[source]
Returns

number of item blocks

Return type

numItemBlocks

getNumUserBlocks()[source]
Returns

number of user blocks

Return type

numUserBlocks

getPredictionCol()[source]
Returns

prediction column name

Return type

predictionCol

getRank()[source]
Returns

rank of the factorization

Return type

rank

getRatingCol()[source]
Returns

column name for ratings

Return type

ratingCol

getRegParam()[source]
Returns

regularization parameter (>= 0)

Return type

regParam

getSeed()[source]
Returns

random seed

Return type

seed

getSimilarityFunction()[source]
Returns

Defines the similarity function to be used by the model. Lift favors serendipity, Co-occurrence favors predictability, and Jaccard is a nice compromise between the two.

Return type

similarityFunction

getStartTime()[source]
Returns

Custom "now" time to use as the reference point when working with historical data

Return type

startTime

getStartTimeFormat()[source]
Returns

Format for start time

Return type

startTimeFormat

getSupportThreshold()[source]
Returns

Minimum number of ratings per item

Return type

supportThreshold

getTimeCol()[source]
Returns

Time of activity

Return type

timeCol

getTimeDecayCoeff()[source]
Returns

Scales the time decay coefficient to a different half-life duration

Return type

timeDecayCoeff
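A half-life based time decay can be sketched as follows: an event's weight halves every half-life period, so weight = 0.5 ** (elapsed / half_life). The sketch assumes the coefficient is a half-life measured in days, which is an assumption not stated in this reference, and the function names are hypothetical.

```python
def time_decay_weight(elapsed_days, half_life_days=30):
    """Weight of an event that happened elapsed_days before the reference time."""
    return 0.5 ** (elapsed_days / half_life_days)


def decayed_affinity(events, now_day, half_life_days=30):
    """Sum of ratings, each discounted by how long ago it happened.

    `events` is a list of (rating, day) pairs, with `day` on the same
    scale as `now_day` (hypothetical layout).
    """
    return sum(
        rating * time_decay_weight(now_day - day, half_life_days)
        for rating, day in events
    )


# An event from 30 days ago counts half as much as one from today.
print(time_decay_weight(0), time_decay_weight(30), time_decay_weight(60))
print(decayed_affinity([(4.0, 70), (2.0, 100)], now_day=100))
```

A larger half-life makes old activity fade more slowly, and the reference time (now_day here) plays the role that a custom start time would play for historical data.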

getUserCol()[source]
Returns

column name for user ids. Ids must be within the integer value range.

Return type

userCol

implicitPrefs = Param(parent='undefined', name='implicitPrefs', doc='whether to use implicit preference')
intermediateStorageLevel = Param(parent='undefined', name='intermediateStorageLevel', doc="StorageLevel for intermediate datasets. Cannot be 'NONE'.")
itemCol = Param(parent='undefined', name='itemCol', doc='column name for item ids. Ids must be within the integer value range.')
maxIter = Param(parent='undefined', name='maxIter', doc='maximum number of iterations (>= 0)')
nonnegative = Param(parent='undefined', name='nonnegative', doc='whether to use nonnegative constraint for least squares')
numItemBlocks = Param(parent='undefined', name='numItemBlocks', doc='number of item blocks')
numUserBlocks = Param(parent='undefined', name='numUserBlocks', doc='number of user blocks')
predictionCol = Param(parent='undefined', name='predictionCol', doc='prediction column name')
rank = Param(parent='undefined', name='rank', doc='rank of the factorization')
ratingCol = Param(parent='undefined', name='ratingCol', doc='column name for ratings')
classmethod read()[source]

Returns an MLReader instance for this class.

regParam = Param(parent='undefined', name='regParam', doc='regularization parameter (>= 0)')
seed = Param(parent='undefined', name='seed', doc='random seed')
setActivityTimeFormat(value)[source]
Parameters

activityTimeFormat – Time format for events, default: yyyy/MM/dd'T'h:mm:ss

setAlpha(value)[source]
Parameters

alpha – alpha for implicit preference

setBlockSize(value)[source]
Parameters

blockSize – block size for stacking input data in matrices. Data is stacked within partitions. If block size is more than remaining data in a partition then it is adjusted to the size of this data.

setCheckpointInterval(value)[source]
Parameters

checkpointInterval – set checkpoint interval (>= 1) or disable checkpoint (-1). E.g. 10 means that the cache will get checkpointed every 10 iterations. Note: this setting will be ignored if the checkpoint directory is not set in the SparkContext

setColdStartStrategy(value)[source]
Parameters

coldStartStrategy – strategy for dealing with unknown or new users/items at prediction time. This may be useful in cross-validation or production scenarios, for handling user/item ids the model has not seen in the training data. Supported values: nan,drop.

setFinalStorageLevel(value)[source]
Parameters

finalStorageLevel – StorageLevel for ALS model factors.

setImplicitPrefs(value)[source]
Parameters

implicitPrefs – whether to use implicit preference

setIntermediateStorageLevel(value)[source]
Parameters

intermediateStorageLevel – StorageLevel for intermediate datasets. Cannot be ‘NONE’.

setItemCol(value)[source]
Parameters

itemCol – column name for item ids. Ids must be within the integer value range.

setMaxIter(value)[source]
Parameters

maxIter – maximum number of iterations (>= 0)

setNonnegative(value)[source]
Parameters

nonnegative – whether to use nonnegative constraint for least squares

setNumItemBlocks(value)[source]
Parameters

numItemBlocks – number of item blocks

setNumUserBlocks(value)[source]
Parameters

numUserBlocks – number of user blocks

setParams(activityTimeFormat="yyyy/MM/dd'T'h:mm:ss", alpha=1.0, blockSize=4096, checkpointInterval=10, coldStartStrategy='nan', finalStorageLevel='MEMORY_AND_DISK', implicitPrefs=False, intermediateStorageLevel='MEMORY_AND_DISK', itemCol='item', maxIter=10, nonnegative=False, numItemBlocks=10, numUserBlocks=10, predictionCol='prediction', rank=10, ratingCol='rating', regParam=0.1, seed=356704333, similarityFunction='jaccard', startTime=None, startTimeFormat='EEE MMM dd HH:mm:ss Z yyyy', supportThreshold=4, timeCol='time', timeDecayCoeff=30, userCol='user')[source]

Set the (keyword only) parameters

setPredictionCol(value)[source]
Parameters

predictionCol – prediction column name

setRank(value)[source]
Parameters

rank – rank of the factorization

setRatingCol(value)[source]
Parameters

ratingCol – column name for ratings

setRegParam(value)[source]
Parameters

regParam – regularization parameter (>= 0)

setSeed(value)[source]
Parameters

seed – random seed

setSimilarityFunction(value)[source]
Parameters

similarityFunction – Defines the similarity function to be used by the model. Lift favors serendipity, Co-occurrence favors predictability, and Jaccard is a nice compromise between the two.

setStartTime(value)[source]
Parameters

startTime – Custom "now" time to use as the reference point when working with historical data

setStartTimeFormat(value)[source]
Parameters

startTimeFormat – Format for start time

setSupportThreshold(value)[source]
Parameters

supportThreshold – Minimum number of ratings per item

setTimeCol(value)[source]
Parameters

timeCol – Time of activity

setTimeDecayCoeff(value)[source]
Parameters

timeDecayCoeff – Scales the time decay coefficient to a different half-life duration

setUserCol(value)[source]
Parameters

userCol – column name for user ids. Ids must be within the integer value range.

similarityFunction = Param(parent='undefined', name='similarityFunction', doc='Defines the similarity function to be used by the model. Lift favors serendipity, Co-occurrence favors predictability, and Jaccard is a nice compromise between the two.')
startTime = Param(parent='undefined', name='startTime', doc='Set time custom now time if using historical data')
startTimeFormat = Param(parent='undefined', name='startTimeFormat', doc='Format for start time')
supportThreshold = Param(parent='undefined', name='supportThreshold', doc='Minimum number of ratings per item')
timeCol = Param(parent='undefined', name='timeCol', doc='Time of activity')
timeDecayCoeff = Param(parent='undefined', name='timeDecayCoeff', doc='Use to scale time decay coeff to different half life dur')
userCol = Param(parent='undefined', name='userCol', doc='column name for user ids. Ids must be within the integer value range.')

synapse.ml.recommendation.SARModel module

class synapse.ml.recommendation.SARModel.SARModel(java_obj=None, activityTimeFormat="yyyy/MM/dd'T'h:mm:ss", alpha=1.0, blockSize=4096, checkpointInterval=10, coldStartStrategy='nan', finalStorageLevel='MEMORY_AND_DISK', implicitPrefs=False, intermediateStorageLevel='MEMORY_AND_DISK', itemCol='item', itemDataFrame=None, maxIter=10, nonnegative=False, numItemBlocks=10, numUserBlocks=10, predictionCol='prediction', rank=10, ratingCol='rating', regParam=0.1, seed=-1453370660, similarityFunction='jaccard', startTime=None, startTimeFormat='EEE MMM dd HH:mm:ss Z yyyy', supportThreshold=4, timeCol='time', timeDecayCoeff=30, userCol='user', userDataFrame=None)[source]

Bases: synapse.ml.recommendation._SARModel._SARModel

recommendForAllUsers(numItems)[source]

Module contents

SynapseML is an ecosystem of tools that extends the distributed computing framework Apache Spark in several new directions. SynapseML adds many deep learning and data science tools to the Spark ecosystem, including seamless integration of Spark Machine Learning pipelines with Microsoft Cognitive Toolkit (CNTK), LightGBM, and OpenCV. These tools enable powerful and highly scalable predictive and analytical models for a variety of data sources.

SynapseML also brings new networking capabilities to the Spark ecosystem. With the HTTP on Spark project, users can embed any web service into their SparkML models. In this vein, SynapseML provides easy-to-use SparkML transformers for a wide variety of Microsoft Cognitive Services. For production-grade deployment, the Spark Serving project enables high-throughput web services with sub-millisecond latency, backed by your Spark cluster.

SynapseML requires Scala 2.12, Spark 3.0+, and Python 3.6+.
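The Python and Spark minimums stated above can be checked before importing the package. The following is a minimal sketch using only the standard library; the helper name meets_minimum is hypothetical, and it assumes version strings begin with numeric major.minor components (for example "3.1.2").

```python
import sys

MIN_PYTHON = (3, 6)  # from the stated requirement "Python 3.6+"
MIN_SPARK = (3, 0)   # from the stated requirement "Spark 3.0+"


def meets_minimum(version, minimum):
    """Compare a dotted version string against a (major, minor) tuple.

    Only the leading components are compared, and they are compared as
    integers, so "3.10" is correctly treated as newer than "3.6".
    """
    parts = tuple(int(part) for part in version.split(".")[: len(minimum)])
    return parts >= minimum


python_version = "{}.{}".format(*sys.version_info[:2])
if meets_minimum(python_version, MIN_PYTHON):
    print("Python", python_version, "meets the stated minimum")
else:
    print("Python", python_version, "is older than the stated minimum")
```

The same helper can be applied to a Spark version string, for example meets_minimum(spark.version, MIN_SPARK) when a SparkSession named spark is available.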