synapse.ml.recommendation package
Submodules
synapse.ml.recommendation.RankingAdapter module
- class synapse.ml.recommendation.RankingAdapter.RankingAdapter(java_obj=None, itemCol=None, k=10, labelCol='label', minRatingsPerItem=1, minRatingsPerUser=1, mode='allUsers', ratingCol=None, recommender=None, userCol=None)[source]
Bases: pyspark.ml.util.MLReadable[pyspark.ml.util.RL]
- Parameters
- itemCol = Param(parent='undefined', name='itemCol', doc='Column of items')
- k = Param(parent='undefined', name='k', doc='number of items')
- labelCol = Param(parent='undefined', name='labelCol', doc='The name of the label column')
- minRatingsPerItem = Param(parent='undefined', name='minRatingsPerItem', doc='min ratings for items > 0')
- minRatingsPerUser = Param(parent='undefined', name='minRatingsPerUser', doc='min ratings for users > 0')
- mode = Param(parent='undefined', name='mode', doc='recommendation mode')
- ratingCol = Param(parent='undefined', name='ratingCol', doc='Column of ratings')
- recommender = Param(parent='undefined', name='recommender', doc='estimator for selection')
- setParams(itemCol=None, k=10, labelCol='label', minRatingsPerItem=1, minRatingsPerUser=1, mode='allUsers', ratingCol=None, recommender=None, userCol=None)[source]
Set the (keyword only) parameters
- userCol = Param(parent='undefined', name='userCol', doc='Column of users')
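As a hedged illustration (the toy DataFrame, the column names, and the package-level import path are assumptions, not part of the listing above), a RankingAdapter might wrap a standard Spark ALS estimator like this:

```python
from pyspark.sql import SparkSession
from pyspark.ml.recommendation import ALS
from synapse.ml.recommendation import RankingAdapter

spark = SparkSession.builder.getOrCreate()

# Toy interaction data: integer user/item ids plus an explicit rating.
ratings = spark.createDataFrame(
    [(0, 0, 4.0), (0, 1, 1.0), (1, 0, 5.0), (1, 2, 3.0)],
    ["customerID", "itemID", "rating"],
)

als = ALS(userCol="customerID", itemCol="itemID", ratingCol="rating")

# Wrap the recommender so its output can be scored as ranked item lists;
# mode='allUsers' and k=10 are the documented defaults.
adapter = RankingAdapter(
    mode="allUsers",
    k=10,
    recommender=als,
    userCol="customerID",
    itemCol="itemID",
    ratingCol="rating",
)
```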
synapse.ml.recommendation.RankingAdapterModel module
- class synapse.ml.recommendation.RankingAdapterModel.RankingAdapterModel(java_obj=None, itemCol=None, k=10, labelCol='label', minRatingsPerItem=1, minRatingsPerUser=1, mode='allUsers', ratingCol=None, recommender=None, recommenderModel=None, userCol=None)[source]
Bases: pyspark.ml.util.MLReadable[pyspark.ml.util.RL]
- Parameters
- itemCol = Param(parent='undefined', name='itemCol', doc='Column of items')
- k = Param(parent='undefined', name='k', doc='number of items')
- labelCol = Param(parent='undefined', name='labelCol', doc='The name of the label column')
- minRatingsPerItem = Param(parent='undefined', name='minRatingsPerItem', doc='min ratings for items > 0')
- minRatingsPerUser = Param(parent='undefined', name='minRatingsPerUser', doc='min ratings for users > 0')
- mode = Param(parent='undefined', name='mode', doc='recommendation mode')
- ratingCol = Param(parent='undefined', name='ratingCol', doc='Column of ratings')
- recommender = Param(parent='undefined', name='recommender', doc='estimator for selection')
- recommenderModel = Param(parent='undefined', name='recommenderModel', doc='recommenderModel')
- setParams(itemCol=None, k=10, labelCol='label', minRatingsPerItem=1, minRatingsPerUser=1, mode='allUsers', ratingCol=None, recommender=None, recommenderModel=None, userCol=None)[source]
Set the (keyword only) parameters
- userCol = Param(parent='undefined', name='userCol', doc='Column of users')
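Fitting the adapter yields a RankingAdapterModel. A minimal continuation of the sketch above; the exact output column names are an assumption inferred from the labelCol default here and the RankingEvaluator's predictionCol default:

```python
# fit() trains the wrapped recommender and returns a RankingAdapterModel.
ranking_model = adapter.fit(ratings)

# transform() emits per-user ranked 'prediction' item lists alongside
# ground-truth 'label' lists, the shape that RankingEvaluator consumes.
ranked = ranking_model.transform(ratings)
ranked.show(truncate=False)
```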
synapse.ml.recommendation.RankingEvaluator module
- class synapse.ml.recommendation.RankingEvaluator.RankingEvaluator(java_obj=None, itemCol=None, k=10, labelCol='label', metricName='ndcgAt', nItems=-1, predictionCol='prediction', ratingCol=None, userCol=None)[source]
Bases: pyspark.ml.util.MLReadable[pyspark.ml.util.RL]
- Parameters
- getMetricName()[source]
- Returns
metric name in evaluation (ndcgAt|map|precisionAtk|recallAtK|diversityAtK|maxDiversity|mrr|fcp)
- Return type
metricName
- itemCol = Param(parent='undefined', name='itemCol', doc='Column of items')
- k = Param(parent='undefined', name='k', doc='number of items')
- labelCol = Param(parent='undefined', name='labelCol', doc='label column name')
- metricName = Param(parent='undefined', name='metricName', doc='metric name in evaluation (ndcgAt|map|precisionAtk|recallAtK|diversityAtK|maxDiversity|mrr|fcp)')
- nItems = Param(parent='undefined', name='nItems', doc='number of items')
- predictionCol = Param(parent='undefined', name='predictionCol', doc='prediction column name')
- ratingCol = Param(parent='undefined', name='ratingCol', doc='Column of ratings')
- setMetricName(value)[source]
- Parameters
metricName – metric name in evaluation (ndcgAt|map|precisionAtk|recallAtK|diversityAtK|maxDiversity|mrr|fcp)
- setParams(itemCol=None, k=10, labelCol='label', metricName='ndcgAt', nItems=-1, predictionCol='prediction', ratingCol=None, userCol=None)[source]
Set the (keyword only) parameters
- userCol = Param(parent='undefined', name='userCol', doc='Column of users')
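A short sketch of scoring the adapter output with this evaluator, using two of the metric names listed above; it assumes the `ranked` DataFrame from the RankingAdapterModel sketch and that the evaluator follows the standard Spark Evaluator.evaluate contract:

```python
from synapse.ml.recommendation import RankingEvaluator

# k bounds the ranking cutoff; nItems=-1 keeps the documented default.
evaluator = RankingEvaluator(k=10, nItems=-1)

ndcg = evaluator.setMetricName("ndcgAt").evaluate(ranked)
prec = evaluator.setMetricName("precisionAtk").evaluate(ranked)
print(f"ndcg@10={ndcg:.3f}, precision@10={prec:.3f}")
```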
synapse.ml.recommendation.RankingTrainValidationSplit module
synapse.ml.recommendation.RankingTrainValidationSplitModel module
synapse.ml.recommendation.RecommendationIndexer module
- class synapse.ml.recommendation.RecommendationIndexer.RecommendationIndexer(java_obj=None, itemInputCol=None, itemOutputCol=None, ratingCol=None, userInputCol=None, userOutputCol=None)[source]
Bases: pyspark.ml.util.MLReadable[pyspark.ml.util.RL]
- Parameters
- itemInputCol = Param(parent='undefined', name='itemInputCol', doc='Item Input Col')
- itemOutputCol = Param(parent='undefined', name='itemOutputCol', doc='Item Output Col')
- ratingCol = Param(parent='undefined', name='ratingCol', doc='Rating Col')
- setParams(itemInputCol=None, itemOutputCol=None, ratingCol=None, userInputCol=None, userOutputCol=None)[source]
Set the (keyword only) parameters
- userInputCol = Param(parent='undefined', name='userInputCol', doc='User Input Col')
- userOutputCol = Param(parent='undefined', name='userOutputCol', doc='User Output Col')
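A minimal sketch of indexing raw string ids into the integer ids that recommenders such as ALS and SAR require; the `spark` session is assumed from the earlier sketch, and the column names are illustrative:

```python
from synapse.ml.recommendation import RecommendationIndexer

# Raw interaction logs often carry string ids; recommenders need integers.
raw = spark.createDataFrame(
    [("alice", "apples", 4.0), ("bob", "pears", 3.0), ("alice", "pears", 5.0)],
    ["customer", "product", "rating"],
)

indexer = RecommendationIndexer(
    userInputCol="customer", userOutputCol="customerID",
    itemInputCol="product", itemOutputCol="itemID",
    ratingCol="rating",
)
```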
synapse.ml.recommendation.RecommendationIndexerModel module
- class synapse.ml.recommendation.RecommendationIndexerModel.RecommendationIndexerModel(java_obj=None, itemIndexModel=None, itemInputCol=None, itemOutputCol=None, ratingCol=None, userIndexModel=None, userInputCol=None, userOutputCol=None)[source]
Bases: pyspark.ml.util.MLReadable[pyspark.ml.util.RL]
- Parameters
- itemIndexModel = Param(parent='undefined', name='itemIndexModel', doc='itemIndexModel')
- itemInputCol = Param(parent='undefined', name='itemInputCol', doc='Item Input Col')
- itemOutputCol = Param(parent='undefined', name='itemOutputCol', doc='Item Output Col')
- ratingCol = Param(parent='undefined', name='ratingCol', doc='Rating Col')
- setParams(itemIndexModel=None, itemInputCol=None, itemOutputCol=None, ratingCol=None, userIndexModel=None, userInputCol=None, userOutputCol=None)[source]
Set the (keyword only) parameters
- userIndexModel = Param(parent='undefined', name='userIndexModel', doc='userIndexModel')
- userInputCol = Param(parent='undefined', name='userInputCol', doc='User Input Col')
- userOutputCol = Param(parent='undefined', name='userOutputCol', doc='User Output Col')
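Continuing that sketch, fitting the indexer produces a RecommendationIndexerModel that holds the user and item index models:

```python
# fit() learns the string-to-integer mappings; transform() applies them.
indexer_model = indexer.fit(raw)
indexed = indexer_model.transform(raw)  # adds customerID and itemID columns
```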
synapse.ml.recommendation.SAR module
- class synapse.ml.recommendation.SAR.SAR(java_obj=None, activityTimeFormat="yyyy/MM/dd'T'h:mm:ss", alpha=1.0, blockSize=4096, checkpointInterval=10, coldStartStrategy='nan', finalStorageLevel='MEMORY_AND_DISK', implicitPrefs=False, intermediateStorageLevel='MEMORY_AND_DISK', itemCol='item', maxIter=10, nonnegative=False, numItemBlocks=10, numUserBlocks=10, predictionCol='prediction', rank=10, ratingCol='rating', regParam=0.1, seed=356704333, similarityFunction='jaccard', startTime=None, startTimeFormat='EEE MMM dd HH:mm:ss Z yyyy', supportThreshold=4, timeCol='time', timeDecayCoeff=30, userCol='user')[source]
Bases: pyspark.ml.util.MLReadable[pyspark.ml.util.RL]
- Parameters
activityTimeFormat (str) – Time format for events, default: yyyy/MM/dd'T'h:mm:ss
blockSize (int) – block size for stacking input data in matrices. Data is stacked within partitions. If block size is more than remaining data in a partition then it is adjusted to the size of this data.
checkpointInterval (int) – set checkpoint interval (>= 1) or disable checkpoint (-1). E.g. 10 means that the cache will get checkpointed every 10 iterations. Note: this setting will be ignored if the checkpoint directory is not set in the SparkContext
coldStartStrategy (str) – strategy for dealing with unknown or new users/items at prediction time. This may be useful in cross-validation or production scenarios, for handling user/item ids the model has not seen in the training data. Supported values: nan,drop.
finalStorageLevel (str) – StorageLevel for ALS model factors.
intermediateStorageLevel (str) – StorageLevel for intermediate datasets. Cannot be 'NONE'.
itemCol (str) – column name for item ids. Ids must be within the integer value range.
nonnegative (bool) – whether to use nonnegative constraint for least squares
seed (long) – random seed
similarityFunction (str) – Defines the similarity function to be used by the model. Lift favors serendipity, Co-occurrence favors predictability, and Jaccard is a nice compromise between the two.
startTime (str) – Set a custom 'now' time if using historical data
supportThreshold (int) – Minimum number of ratings per item
timeDecayCoeff (int) – Used to scale the time decay coefficient to a different half-life duration
userCol (str) – column name for user ids. Ids must be within the integer value range.
- activityTimeFormat = Param(parent='undefined', name='activityTimeFormat', doc="Time format for events, default: yyyy/MM/dd'T'h:mm:ss")
- alpha = Param(parent='undefined', name='alpha', doc='alpha for implicit preference')
- blockSize = Param(parent='undefined', name='blockSize', doc='block size for stacking input data in matrices. Data is stacked within partitions. If block size is more than remaining data in a partition then it is adjusted to the size of this data.')
- checkpointInterval = Param(parent='undefined', name='checkpointInterval', doc='set checkpoint interval (>= 1) or disable checkpoint (-1). E.g. 10 means that the cache will get checkpointed every 10 iterations. Note: this setting will be ignored if the checkpoint directory is not set in the SparkContext')
- coldStartStrategy = Param(parent='undefined', name='coldStartStrategy', doc='strategy for dealing with unknown or new users/items at prediction time. This may be useful in cross-validation or production scenarios, for handling user/item ids the model has not seen in the training data. Supported values: nan,drop.')
- finalStorageLevel = Param(parent='undefined', name='finalStorageLevel', doc='StorageLevel for ALS model factors.')
- getActivityTimeFormat()[source]
- Returns
Time format for events, default: yyyy/MM/dd'T'h:mm:ss
- Return type
activityTimeFormat
- getBlockSize()[source]
- Returns
block size for stacking input data in matrices. Data is stacked within partitions. If block size is more than remaining data in a partition then it is adjusted to the size of this data.
- Return type
blockSize
- getCheckpointInterval()[source]
- Returns
set checkpoint interval (>= 1) or disable checkpoint (-1). E.g. 10 means that the cache will get checkpointed every 10 iterations. Note: this setting will be ignored if the checkpoint directory is not set in the SparkContext
- Return type
checkpointInterval
- getColdStartStrategy()[source]
- Returns
strategy for dealing with unknown or new users/items at prediction time. This may be useful in cross-validation or production scenarios, for handling user/item ids the model has not seen in the training data. Supported values: nan,drop.
- Return type
coldStartStrategy
- getFinalStorageLevel()[source]
- Returns
StorageLevel for ALS model factors.
- Return type
finalStorageLevel
- getIntermediateStorageLevel()[source]
- Returns
StorageLevel for intermediate datasets. Cannot be 'NONE'.
- Return type
intermediateStorageLevel
- getItemCol()[source]
- Returns
column name for item ids. Ids must be within the integer value range.
- Return type
itemCol
- getNonnegative()[source]
- Returns
whether to use nonnegative constraint for least squares
- Return type
nonnegative
- getSimilarityFunction()[source]
- Returns
Defines the similarity function to be used by the model. Lift favors serendipity, Co-occurrence favors predictability, and Jaccard is a nice compromise between the two.
- Return type
similarityFunction
- getStartTime()[source]
- Returns
Set a custom 'now' time if using historical data
- Return type
startTime
- getSupportThreshold()[source]
- Returns
Minimum number of ratings per item
- Return type
supportThreshold
- getTimeDecayCoeff()[source]
- Returns
Used to scale the time decay coefficient to a different half-life duration
- Return type
timeDecayCoeff
- getUserCol()[source]
- Returns
column name for user ids. Ids must be within the integer value range.
- Return type
userCol
- implicitPrefs = Param(parent='undefined', name='implicitPrefs', doc='whether to use implicit preference')
- intermediateStorageLevel = Param(parent='undefined', name='intermediateStorageLevel', doc="StorageLevel for intermediate datasets. Cannot be 'NONE'.")
- itemCol = Param(parent='undefined', name='itemCol', doc='column name for item ids. Ids must be within the integer value range.')
- maxIter = Param(parent='undefined', name='maxIter', doc='maximum number of iterations (>= 0)')
- nonnegative = Param(parent='undefined', name='nonnegative', doc='whether to use nonnegative constraint for least squares')
- numItemBlocks = Param(parent='undefined', name='numItemBlocks', doc='number of item blocks')
- numUserBlocks = Param(parent='undefined', name='numUserBlocks', doc='number of user blocks')
- predictionCol = Param(parent='undefined', name='predictionCol', doc='prediction column name')
- rank = Param(parent='undefined', name='rank', doc='rank of the factorization')
- ratingCol = Param(parent='undefined', name='ratingCol', doc='column name for ratings')
- regParam = Param(parent='undefined', name='regParam', doc='regularization parameter (>= 0)')
- seed = Param(parent='undefined', name='seed', doc='random seed')
- setActivityTimeFormat(value)[source]
- Parameters
activityTimeFormat – Time format for events, default: yyyy/MM/dd'T'h:mm:ss
- setBlockSize(value)[source]
- Parameters
blockSize – block size for stacking input data in matrices. Data is stacked within partitions. If block size is more than remaining data in a partition then it is adjusted to the size of this data.
- setCheckpointInterval(value)[source]
- Parameters
checkpointInterval – set checkpoint interval (>= 1) or disable checkpoint (-1). E.g. 10 means that the cache will get checkpointed every 10 iterations. Note: this setting will be ignored if the checkpoint directory is not set in the SparkContext
- setColdStartStrategy(value)[source]
- Parameters
coldStartStrategy – strategy for dealing with unknown or new users/items at prediction time. This may be useful in cross-validation or production scenarios, for handling user/item ids the model has not seen in the training data. Supported values: nan,drop.
- setFinalStorageLevel(value)[source]
- Parameters
finalStorageLevel – StorageLevel for ALS model factors.
- setIntermediateStorageLevel(value)[source]
- Parameters
intermediateStorageLevel – StorageLevel for intermediate datasets. Cannot be 'NONE'.
- setItemCol(value)[source]
- Parameters
itemCol – column name for item ids. Ids must be within the integer value range.
- setNonnegative(value)[source]
- Parameters
nonnegative – whether to use nonnegative constraint for least squares
- setParams(activityTimeFormat="yyyy/MM/dd'T'h:mm:ss", alpha=1.0, blockSize=4096, checkpointInterval=10, coldStartStrategy='nan', finalStorageLevel='MEMORY_AND_DISK', implicitPrefs=False, intermediateStorageLevel='MEMORY_AND_DISK', itemCol='item', maxIter=10, nonnegative=False, numItemBlocks=10, numUserBlocks=10, predictionCol='prediction', rank=10, ratingCol='rating', regParam=0.1, seed=356704333, similarityFunction='jaccard', startTime=None, startTimeFormat='EEE MMM dd HH:mm:ss Z yyyy', supportThreshold=4, timeCol='time', timeDecayCoeff=30, userCol='user')[source]
Set the (keyword only) parameters
- setSimilarityFunction(value)[source]
- Parameters
similarityFunction – Defines the similarity function to be used by the model. Lift favors serendipity, Co-occurrence favors predictability, and Jaccard is a nice compromise between the two.
- setStartTime(value)[source]
- Parameters
startTime – Set a custom 'now' time if using historical data
- setSupportThreshold(value)[source]
- Parameters
supportThreshold – Minimum number of ratings per item
- setTimeDecayCoeff(value)[source]
- Parameters
timeDecayCoeff – Used to scale the time decay coefficient to a different half-life duration
- setUserCol(value)[source]
- Parameters
userCol – column name for user ids. Ids must be within the integer value range.
- similarityFunction = Param(parent='undefined', name='similarityFunction', doc='Defines the similarity function to be used by the model. Lift favors serendipity, Co-occurrence favors predictability, and Jaccard is a nice compromise between the two.')
- startTime = Param(parent='undefined', name='startTime', doc='Set time custom now time if using historical data')
- startTimeFormat = Param(parent='undefined', name='startTimeFormat', doc='Format for start time')
- supportThreshold = Param(parent='undefined', name='supportThreshold', doc='Minimum number of ratings per item')
- timeCol = Param(parent='undefined', name='timeCol', doc='Time of activity')
- timeDecayCoeff = Param(parent='undefined', name='timeDecayCoeff', doc='Use to scale time decay coeff to different half life dur')
- userCol = Param(parent='undefined', name='userCol', doc='column name for user ids. Ids must be within the integer value range.')
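A hedged sketch of training SAR on the indexed interactions from the RecommendationIndexer sketch above; the parameter names and defaults come from the signature at the top of this section, and the data is purely illustrative:

```python
from synapse.ml.recommendation import SAR

sar = SAR(
    userCol="customerID",
    itemCol="itemID",
    ratingCol="rating",
    similarityFunction="jaccard",  # the documented default; see the doc above
    supportThreshold=4,            # documented default minimum ratings per item
)
# Note: if time decay is used, the input frame needs the timeCol ('time' by
# default) populated; the toy frame here omits it.
sar_model = sar.fit(indexed)
```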
synapse.ml.recommendation.SARModel module
- class synapse.ml.recommendation.SARModel.SARModel(java_obj=None, activityTimeFormat="yyyy/MM/dd'T'h:mm:ss", alpha=1.0, blockSize=4096, checkpointInterval=10, coldStartStrategy='nan', finalStorageLevel='MEMORY_AND_DISK', implicitPrefs=False, intermediateStorageLevel='MEMORY_AND_DISK', itemCol='item', itemDataFrame=None, maxIter=10, nonnegative=False, numItemBlocks=10, numUserBlocks=10, predictionCol='prediction', rank=10, ratingCol='rating', regParam=0.1, seed=-1453370660, similarityFunction='jaccard', startTime=None, startTimeFormat='EEE MMM dd HH:mm:ss Z yyyy', supportThreshold=4, timeCol='time', timeDecayCoeff=30, userCol='user', userDataFrame=None)[source]
Bases: pyspark.ml.util.MLReadable[pyspark.ml.util.RL]
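As with any fitted Spark ML model, a SARModel can score data via transform(); a brief continuation of the SAR sketch above, where 'prediction' is the documented predictionCol default:

```python
# Score user-item pairs from the indexed frame; the estimator's column
# settings carry over to the fitted model.
scored = sar_model.transform(indexed)
scored.select("customerID", "itemID", "prediction").show()
```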
Module contents
SynapseML is an ecosystem of tools aimed at expanding the distributed computing framework Apache Spark in several new directions. SynapseML adds many deep learning and data science tools to the Spark ecosystem, including seamless integration of Spark Machine Learning pipelines with the Microsoft Cognitive Toolkit (CNTK), LightGBM, and OpenCV. These tools enable powerful and highly scalable predictive and analytical models for a variety of data sources.
SynapseML also brings new networking capabilities to the Spark ecosystem. With the HTTP on Spark project, users can embed any web service into their SparkML models. In this vein, SynapseML provides easy-to-use SparkML transformers for a wide variety of Microsoft Cognitive Services. For production-grade deployment, the Spark Serving project enables high-throughput, sub-millisecond-latency web services, backed by your Spark cluster.
SynapseML requires Scala 2.12, Spark 3.0+, and Python 3.6+.
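To experiment with this package, the SynapseML jars must be on the Spark classpath. A hedged setup sketch follows; the Maven coordinate and version string are assumptions, so check the SynapseML releases for the artifact matching your Spark and Scala versions:

```python
from pyspark.sql import SparkSession

# Hypothetical coordinates: verify the version against the SynapseML releases.
spark = (
    SparkSession.builder
    .appName("synapseml-recommendation")
    .config("spark.jars.packages", "com.microsoft.azure:synapseml_2.12:0.9.5")
    .getOrCreate()
)
```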