synapse.ml.recommendation package
Submodules
synapse.ml.recommendation.RankingAdapter module
- class synapse.ml.recommendation.RankingAdapter.RankingAdapter(java_obj=None, itemCol=None, k=10, labelCol='label', minRatingsPerItem=1, minRatingsPerUser=1, mode='allUsers', ratingCol=None, recommender=None, userCol=None)[source]
Bases: ComplexParamsMixin, JavaMLReadable, JavaMLWritable, JavaEstimator
- itemCol = Param(parent='undefined', name='itemCol', doc='Column of items')
- k = Param(parent='undefined', name='k', doc='number of items')
- labelCol = Param(parent='undefined', name='labelCol', doc='The name of the label column')
- minRatingsPerItem = Param(parent='undefined', name='minRatingsPerItem', doc='min ratings for items > 0')
- minRatingsPerUser = Param(parent='undefined', name='minRatingsPerUser', doc='min ratings for users > 0')
- mode = Param(parent='undefined', name='mode', doc='recommendation mode')
- ratingCol = Param(parent='undefined', name='ratingCol', doc='Column of ratings')
- recommender = Param(parent='undefined', name='recommender', doc='estimator for selection')
- setParams(itemCol=None, k=10, labelCol='label', minRatingsPerItem=1, minRatingsPerUser=1, mode='allUsers', ratingCol=None, recommender=None, userCol=None)[source]
Set the (keyword only) parameters
- userCol = Param(parent='undefined', name='userCol', doc='Column of users')
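The k and labelCol parameters suggest that the adapter pairs each user's recommendations with a list of ground-truth items, so that a ranking metric can compare the two. As a rough plain-Python illustration only (this is not SynapseML's implementation; the function name and the descending-rating ordering are assumptions), per-user label lists of at most k items could be built like this:

```python
from collections import defaultdict


def top_k_labels(ratings, k=10):
    """Group (user, item, rating) triples into per-user item lists.

    Hypothetical helper: items are ordered by rating, highest first,
    and each list is truncated to at most k entries.
    """
    by_user = defaultdict(list)
    for user, item, rating in ratings:
        by_user[user].append((rating, item))
    return {
        user: [item for _, item in sorted(pairs, key=lambda p: -p[0])[:k]]
        for user, pairs in by_user.items()
    }


ratings = [
    ("u1", "a", 3.0),
    ("u1", "b", 5.0),
    ("u1", "c", 4.0),
    ("u2", "a", 1.0),
]
labels = top_k_labels(ratings, k=2)
```

Here labels maps each user to their two highest-rated items, which is the kind of list a ranking evaluator would treat as the label column.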
synapse.ml.recommendation.RankingAdapterModel module
- class synapse.ml.recommendation.RankingAdapterModel.RankingAdapterModel(java_obj=None, itemCol=None, k=10, labelCol='label', minRatingsPerItem=1, minRatingsPerUser=1, mode='allUsers', ratingCol=None, recommender=None, recommenderModel=None, userCol=None)[source]
Bases: ComplexParamsMixin, JavaMLReadable, JavaMLWritable, JavaModel
- itemCol = Param(parent='undefined', name='itemCol', doc='Column of items')
- k = Param(parent='undefined', name='k', doc='number of items')
- labelCol = Param(parent='undefined', name='labelCol', doc='The name of the label column')
- minRatingsPerItem = Param(parent='undefined', name='minRatingsPerItem', doc='min ratings for items > 0')
- minRatingsPerUser = Param(parent='undefined', name='minRatingsPerUser', doc='min ratings for users > 0')
- mode = Param(parent='undefined', name='mode', doc='recommendation mode')
- ratingCol = Param(parent='undefined', name='ratingCol', doc='Column of ratings')
- recommender = Param(parent='undefined', name='recommender', doc='estimator for selection')
- recommenderModel = Param(parent='undefined', name='recommenderModel', doc='recommenderModel')
- setParams(itemCol=None, k=10, labelCol='label', minRatingsPerItem=1, minRatingsPerUser=1, mode='allUsers', ratingCol=None, recommender=None, recommenderModel=None, userCol=None)[source]
Set the (keyword only) parameters
- userCol = Param(parent='undefined', name='userCol', doc='Column of users')
synapse.ml.recommendation.RankingEvaluator module
- class synapse.ml.recommendation.RankingEvaluator.RankingEvaluator(java_obj=None, itemCol=None, k=10, labelCol='label', metricName='ndcgAt', nItems=-1, predictionCol='prediction', ratingCol=None, userCol=None)[source]
Bases: ComplexParamsMixin, JavaMLReadable, JavaMLWritable, JavaEvaluator
- getMetricName()[source]
- Returns:
metric name in evaluation (ndcgAt|map|precisionAtk|recallAtK|diversityAtK|maxDiversity|mrr|fcp)
- Return type:
metricName
- itemCol = Param(parent='undefined', name='itemCol', doc='Column of items')
- k = Param(parent='undefined', name='k', doc='number of items')
- labelCol = Param(parent='undefined', name='labelCol', doc='label column name')
- metricName = Param(parent='undefined', name='metricName', doc='metric name in evaluation (ndcgAt|map|precisionAtk|recallAtK|diversityAtK|maxDiversity|mrr|fcp)')
- nItems = Param(parent='undefined', name='nItems', doc='number of items')
- predictionCol = Param(parent='undefined', name='predictionCol', doc='prediction column name')
- ratingCol = Param(parent='undefined', name='ratingCol', doc='Column of ratings')
- setMetricName(value)[source]
- Parameters:
metricName – metric name in evaluation (ndcgAt|map|precisionAtk|recallAtK|diversityAtK|maxDiversity|mrr|fcp)
- setParams(itemCol=None, k=10, labelCol='label', metricName='ndcgAt', nItems=-1, predictionCol='prediction', ratingCol=None, userCol=None)[source]
Set the (keyword only) parameters
- userCol = Param(parent='undefined', name='userCol', doc='Column of users')
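The default metricName is 'ndcgAt', evaluated over the first k recommended items. The sketch below shows one common binary-relevance definition of NDCG at k in plain Python. It illustrates the metric only; the function name is hypothetical, and the formula used inside SynapseML may differ in details such as graded relevance or tie handling.

```python
import math


def ndcg_at_k(predicted, relevant, k=10):
    """Binary-relevance NDCG over the first k predicted items.

    predicted: ranked list of recommended item ids (best first)
    relevant:  collection of ground-truth item ids
    """
    relevant = set(relevant)
    # Discounted gain: a hit at 0-based rank i contributes 1 / log2(i + 2).
    dcg = sum(
        1.0 / math.log2(i + 2)
        for i, item in enumerate(predicted[:k])
        if item in relevant
    )
    # Ideal ordering places every relevant item at the top of the list.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


perfect = ndcg_at_k(["a", "b", "c"], ["a", "b", "c"], k=3)
late_hit = ndcg_at_k(["x", "a"], ["a"], k=2)
```

A perfect ranking scores 1.0, while a single relevant item found at the second position is discounted by 1 / log2(3).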
synapse.ml.recommendation.RankingTrainValidationSplit module
synapse.ml.recommendation.RankingTrainValidationSplitModel module
synapse.ml.recommendation.RecommendationIndexer module
- class synapse.ml.recommendation.RecommendationIndexer.RecommendationIndexer(java_obj=None, itemInputCol=None, itemOutputCol=None, ratingCol=None, userInputCol=None, userOutputCol=None)[source]
Bases: ComplexParamsMixin, JavaMLReadable, JavaMLWritable, JavaEstimator
- itemInputCol = Param(parent='undefined', name='itemInputCol', doc='Item Input Col')
- itemOutputCol = Param(parent='undefined', name='itemOutputCol', doc='Item Output Col')
- ratingCol = Param(parent='undefined', name='ratingCol', doc='Rating Col')
- setParams(itemInputCol=None, itemOutputCol=None, ratingCol=None, userInputCol=None, userOutputCol=None)[source]
Set the (keyword only) parameters
- userInputCol = Param(parent='undefined', name='userInputCol', doc='User Input Col')
- userOutputCol = Param(parent='undefined', name='userOutputCol', doc='User Output Col')
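The input and output column pairs suggest that the indexer maps raw user and item identifiers to integer indices, which matters because the recommenders on this page require ids within the integer range. The following plain-Python sketch shows the idea under simplifying assumptions: the helper names are hypothetical, and indices are assigned in first-seen order, which is not necessarily the ordering SynapseML uses.

```python
def fit_index(values):
    """Assign a contiguous integer index to each distinct value, in first-seen order."""
    index = {}
    for value in values:
        index.setdefault(value, len(index))
    return index


def transform(rows, user_index, item_index):
    """Replace raw user and item ids in (user, item, rating) triples with their indices."""
    return [(user_index[u], item_index[i], r) for u, i, r in rows]


rows = [("alice", "book-7", 4.0), ("bob", "book-7", 3.0), ("alice", "book-2", 5.0)]
user_index = fit_index(u for u, _, _ in rows)
item_index = fit_index(i for _, i, _ in rows)
indexed = transform(rows, user_index, item_index)
```

Fitting and transforming are kept separate here to mirror the estimator and model split between RecommendationIndexer and RecommendationIndexerModel.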
synapse.ml.recommendation.RecommendationIndexerModel module
- class synapse.ml.recommendation.RecommendationIndexerModel.RecommendationIndexerModel(java_obj=None, itemIndexModel=None, itemInputCol=None, itemOutputCol=None, ratingCol=None, userIndexModel=None, userInputCol=None, userOutputCol=None)[source]
Bases: ComplexParamsMixin, JavaMLReadable, JavaMLWritable, JavaModel
- itemIndexModel = Param(parent='undefined', name='itemIndexModel', doc='itemIndexModel')
- itemInputCol = Param(parent='undefined', name='itemInputCol', doc='Item Input Col')
- itemOutputCol = Param(parent='undefined', name='itemOutputCol', doc='Item Output Col')
- ratingCol = Param(parent='undefined', name='ratingCol', doc='Rating Col')
- setParams(itemIndexModel=None, itemInputCol=None, itemOutputCol=None, ratingCol=None, userIndexModel=None, userInputCol=None, userOutputCol=None)[source]
Set the (keyword only) parameters
- userIndexModel = Param(parent='undefined', name='userIndexModel', doc='userIndexModel')
- userInputCol = Param(parent='undefined', name='userInputCol', doc='User Input Col')
- userOutputCol = Param(parent='undefined', name='userOutputCol', doc='User Output Col')
synapse.ml.recommendation.SAR module
- class synapse.ml.recommendation.SAR.SAR(java_obj=None, activityTimeFormat="yyyy/MM/dd'T'h:mm:ss", alpha=1.0, blockSize=4096, checkpointInterval=10, coldStartStrategy='nan', finalStorageLevel='MEMORY_AND_DISK', implicitPrefs=False, intermediateStorageLevel='MEMORY_AND_DISK', itemCol='item', maxIter=10, nonnegative=False, numItemBlocks=10, numUserBlocks=10, predictionCol='prediction', rank=10, ratingCol='rating', regParam=0.1, seed=356704333, similarityFunction='jaccard', startTime=None, startTimeFormat='EEE MMM dd HH:mm:ss Z yyyy', supportThreshold=4, timeCol='time', timeDecayCoeff=30, userCol='user')[source]
Bases: ComplexParamsMixin, JavaMLReadable, JavaMLWritable, JavaEstimator
- Parameters:
activityTimeFormat (str) – Time format for events, default: yyyy/MM/dd'T'h:mm:ss
blockSize (int) – block size for stacking input data in matrices. Data is stacked within partitions. If the block size is larger than the data remaining in a partition, it is adjusted to the size of that data.
checkpointInterval (int) – set checkpoint interval (>= 1) or disable checkpoint (-1). E.g. 10 means that the cache will get checkpointed every 10 iterations. Note: this setting will be ignored if the checkpoint directory is not set in the SparkContext
coldStartStrategy (str) – strategy for dealing with unknown or new users/items at prediction time. This may be useful in cross-validation or production scenarios, for handling user/item ids the model has not seen in the training data. Supported values: nan, drop.
finalStorageLevel (str) – StorageLevel for ALS model factors.
intermediateStorageLevel (str) – StorageLevel for intermediate datasets. Cannot be 'NONE'.
itemCol (str) – column name for item ids. Ids must be within the integer value range.
nonnegative (bool) – whether to use nonnegative constraint for least squares
seed (long) – random seed
similarityFunction (str) – Defines the similarity function to be used by the model. Lift favors serendipity, Co-occurrence favors predictability, and Jaccard is a nice compromise between the two.
startTime (str) – Custom "now" time to use when working with historical data
supportThreshold (int) – Minimum number of ratings per item
timeDecayCoeff (int) – Used to scale the time decay coefficient to a different half-life duration
userCol (str) – column name for user ids. Ids must be within the integer value range.
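To make the three similarityFunction options concrete, here is a small plain-Python sketch using the usual textbook definitions of co-occurrence, lift, and Jaccard over the sets of users who interacted with each item. It is an illustration rather than SynapseML's code: the function name is hypothetical, and apart from 'jaccard' (the documented default) the option strings are assumptions.

```python
def item_similarity(users_i, users_j, function="jaccard"):
    """Similarity of two items from the sets of users who interacted with each."""
    users_i, users_j = set(users_i), set(users_j)
    both = len(users_i & users_j)  # co-occurrence count
    if function == "cooccurrence":
        return float(both)
    if function == "lift":
        # Normalizes by both popularities, so popular items are damped strongly.
        denom = len(users_i) * len(users_j)
    elif function == "jaccard":
        # Normalizes by the number of users who touched either item.
        denom = len(users_i | users_j)
    else:
        raise ValueError(f"unknown similarity function: {function}")
    return both / denom if denom else 0.0


item_a = {"u1", "u2", "u3", "u4"}
item_b = {"u3", "u4", "u5"}
scores = {
    name: item_similarity(item_a, item_b, name)
    for name in ("cooccurrence", "lift", "jaccard")
}
```

With two shared users out of five distinct users, the raw co-occurrence is 2, Jaccard is 2/5, and lift is 2/12, which shows how lift shrinks scores for frequently seen items.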
- activityTimeFormat = Param(parent='undefined', name='activityTimeFormat', doc="Time format for events, default: yyyy/MM/dd'T'h:mm:ss")
- alpha = Param(parent='undefined', name='alpha', doc='alpha for implicit preference')
- blockSize = Param(parent='undefined', name='blockSize', doc='block size for stacking input data in matrices. Data is stacked within partitions. If block size is more than remaining data in a partition then it is adjusted to the size of this data.')
- checkpointInterval = Param(parent='undefined', name='checkpointInterval', doc='set checkpoint interval (>= 1) or disable checkpoint (-1). E.g. 10 means that the cache will get checkpointed every 10 iterations. Note: this setting will be ignored if the checkpoint directory is not set in the SparkContext')
- coldStartStrategy = Param(parent='undefined', name='coldStartStrategy', doc='strategy for dealing with unknown or new users/items at prediction time. This may be useful in cross-validation or production scenarios, for handling user/item ids the model has not seen in the training data. Supported values: nan,drop.')
- finalStorageLevel = Param(parent='undefined', name='finalStorageLevel', doc='StorageLevel for ALS model factors.')
- getActivityTimeFormat()[source]
- Returns:
Time format for events, default: yyyy/MM/dd'T'h:mm:ss
- Return type:
activityTimeFormat
- getBlockSize()[source]
- Returns:
block size for stacking input data in matrices. Data is stacked within partitions. If block size is more than remaining data in a partition then it is adjusted to the size of this data.
- Return type:
blockSize
- getCheckpointInterval()[source]
- Returns:
set checkpoint interval (>= 1) or disable checkpoint (-1). E.g. 10 means that the cache will get checkpointed every 10 iterations. Note: this setting will be ignored if the checkpoint directory is not set in the SparkContext
- Return type:
checkpointInterval
- getColdStartStrategy()[source]
- Returns:
strategy for dealing with unknown or new users/items at prediction time. This may be useful in cross-validation or production scenarios, for handling user/item ids the model has not seen in the training data. Supported values: nan,drop.
- Return type:
coldStartStrategy
- getFinalStorageLevel()[source]
- Returns:
StorageLevel for ALS model factors.
- Return type:
finalStorageLevel
- getIntermediateStorageLevel()[source]
- Returns:
StorageLevel for intermediate datasets. Cannot be 'NONE'.
- Return type:
intermediateStorageLevel
- getItemCol()[source]
- Returns:
column name for item ids. Ids must be within the integer value range.
- Return type:
itemCol
- getNonnegative()[source]
- Returns:
whether to use nonnegative constraint for least squares
- Return type:
nonnegative
- getSimilarityFunction()[source]
- Returns:
Defines the similarity function to be used by the model. Lift favors serendipity, Co-occurrence favors predictability, and Jaccard is a nice compromise between the two.
- Return type:
similarityFunction
- getStartTime()[source]
- Returns:
Custom "now" time to use when working with historical data
- Return type:
startTime
- getSupportThreshold()[source]
- Returns:
Minimum number of ratings per item
- Return type:
supportThreshold
- getTimeDecayCoeff()[source]
- Returns:
Used to scale the time decay coefficient to a different half-life duration
- Return type:
timeDecayCoeff
- getUserCol()[source]
- Returns:
column name for user ids. Ids must be within the integer value range.
- Return type:
userCol
- implicitPrefs = Param(parent='undefined', name='implicitPrefs', doc='whether to use implicit preference')
- intermediateStorageLevel = Param(parent='undefined', name='intermediateStorageLevel', doc="StorageLevel for intermediate datasets. Cannot be 'NONE'.")
- itemCol = Param(parent='undefined', name='itemCol', doc='column name for item ids. Ids must be within the integer value range.')
- maxIter = Param(parent='undefined', name='maxIter', doc='maximum number of iterations (>= 0)')
- nonnegative = Param(parent='undefined', name='nonnegative', doc='whether to use nonnegative constraint for least squares')
- numItemBlocks = Param(parent='undefined', name='numItemBlocks', doc='number of item blocks')
- numUserBlocks = Param(parent='undefined', name='numUserBlocks', doc='number of user blocks')
- predictionCol = Param(parent='undefined', name='predictionCol', doc='prediction column name')
- rank = Param(parent='undefined', name='rank', doc='rank of the factorization')
- ratingCol = Param(parent='undefined', name='ratingCol', doc='column name for ratings')
- regParam = Param(parent='undefined', name='regParam', doc='regularization parameter (>= 0)')
- seed = Param(parent='undefined', name='seed', doc='random seed')
- setActivityTimeFormat(value)[source]
- Parameters:
activityTimeFormat – Time format for events, default: yyyy/MM/dd'T'h:mm:ss
- setBlockSize(value)[source]
- Parameters:
blockSize – block size for stacking input data in matrices. Data is stacked within partitions. If block size is more than remaining data in a partition then it is adjusted to the size of this data.
- setCheckpointInterval(value)[source]
- Parameters:
checkpointInterval – set checkpoint interval (>= 1) or disable checkpoint (-1). E.g. 10 means that the cache will get checkpointed every 10 iterations. Note: this setting will be ignored if the checkpoint directory is not set in the SparkContext
- setColdStartStrategy(value)[source]
- Parameters:
coldStartStrategy – strategy for dealing with unknown or new users/items at prediction time. This may be useful in cross-validation or production scenarios, for handling user/item ids the model has not seen in the training data. Supported values: nan, drop.
- setFinalStorageLevel(value)[source]
- Parameters:
finalStorageLevel – StorageLevel for ALS model factors.
- setIntermediateStorageLevel(value)[source]
- Parameters:
intermediateStorageLevel – StorageLevel for intermediate datasets. Cannot be 'NONE'.
- setItemCol(value)[source]
- Parameters:
itemCol – column name for item ids. Ids must be within the integer value range.
- setNonnegative(value)[source]
- Parameters:
nonnegative – whether to use nonnegative constraint for least squares
- setParams(activityTimeFormat="yyyy/MM/dd'T'h:mm:ss", alpha=1.0, blockSize=4096, checkpointInterval=10, coldStartStrategy='nan', finalStorageLevel='MEMORY_AND_DISK', implicitPrefs=False, intermediateStorageLevel='MEMORY_AND_DISK', itemCol='item', maxIter=10, nonnegative=False, numItemBlocks=10, numUserBlocks=10, predictionCol='prediction', rank=10, ratingCol='rating', regParam=0.1, seed=356704333, similarityFunction='jaccard', startTime=None, startTimeFormat='EEE MMM dd HH:mm:ss Z yyyy', supportThreshold=4, timeCol='time', timeDecayCoeff=30, userCol='user')[source]
Set the (keyword only) parameters
- setSimilarityFunction(value)[source]
- Parameters:
similarityFunction – Defines the similarity function to be used by the model. Lift favors serendipity, Co-occurrence favors predictability, and Jaccard is a nice compromise between the two.
- setStartTime(value)[source]
- Parameters:
startTime – Custom "now" time to use when working with historical data
- setSupportThreshold(value)[source]
- Parameters:
supportThreshold – Minimum number of ratings per item
- setTimeDecayCoeff(value)[source]
- Parameters:
timeDecayCoeff – Used to scale the time decay coefficient to a different half-life duration
- setUserCol(value)[source]
- Parameters:
userCol – column name for user ids. Ids must be within the integer value range.
- similarityFunction = Param(parent='undefined', name='similarityFunction', doc='Defines the similarity function to be used by the model. Lift favors serendipity, Co-occurrence favors predictability, and Jaccard is a nice compromise between the two.')
- startTime = Param(parent='undefined', name='startTime', doc='Set time custom now time if using historical data')
- startTimeFormat = Param(parent='undefined', name='startTimeFormat', doc='Format for start time')
- supportThreshold = Param(parent='undefined', name='supportThreshold', doc='Minimum number of ratings per item')
- timeCol = Param(parent='undefined', name='timeCol', doc='Time of activity')
- timeDecayCoeff = Param(parent='undefined', name='timeDecayCoeff', doc='Use to scale time decay coeff to different half life dur')
- userCol = Param(parent='undefined', name='userCol', doc='column name for user ids. Ids must be within the integer value range.')
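The timeDecayCoeff and startTime parameters describe a half-life style time decay measured against a reference "now" time. A minimal sketch of that idea in plain Python follows. It assumes, without confirmation from this page, that the coefficient is a half-life expressed in days and that a user's affinity for an item is the sum of decayed ratings; the function and argument names are hypothetical.

```python
from datetime import datetime, timedelta


def decayed_affinity(events, now, half_life_days=30):
    """Sum ratings, halving each one's weight for every half_life_days of age.

    events: iterable of (rating, timestamp) pairs for one user-item pair
    now:    reference time (the role a custom start time would play)
    """
    total = 0.0
    for rating, timestamp in events:
        age_days = (now - timestamp).total_seconds() / 86400.0
        total += rating * 0.5 ** (age_days / half_life_days)
    return total


now = datetime(2024, 1, 31)
events = [(1.0, now), (1.0, now - timedelta(days=30))]
affinity = decayed_affinity(events, now, half_life_days=30)
```

An event recorded at the reference time keeps its full weight, while an event exactly one half-life old counts for half, giving a combined affinity of 1.5 in this example.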
synapse.ml.recommendation.SARModel module
- class synapse.ml.recommendation.SARModel.SARModel(java_obj=None, activityTimeFormat="yyyy/MM/dd'T'h:mm:ss", alpha=1.0, blockSize=4096, checkpointInterval=10, coldStartStrategy='nan', finalStorageLevel='MEMORY_AND_DISK', implicitPrefs=False, intermediateStorageLevel='MEMORY_AND_DISK', itemCol='item', itemDataFrame=None, maxIter=10, nonnegative=False, numItemBlocks=10, numUserBlocks=10, predictionCol='prediction', rank=10, ratingCol='rating', regParam=0.1, seed=-1453370660, similarityFunction='jaccard', startTime=None, startTimeFormat='EEE MMM dd HH:mm:ss Z yyyy', supportThreshold=4, timeCol='time', timeDecayCoeff=30, userCol='user', userDataFrame=None)[source]
Bases: _SARModel
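The coldStartStrategy parameter accepts nan or drop and governs what happens at prediction time to users or items the model has not seen. The plain-Python sketch below illustrates the difference between the two values on a list of already scored rows. The function name is hypothetical, and the sketch shows only the documented semantics, not how the model applies them internally.

```python
import math


def apply_cold_start(predictions, strategy="nan"):
    """Filter (user, item, score) rows according to a cold-start strategy.

    "nan"  keeps every row, including those whose score is NaN
    "drop" removes rows whose score is NaN
    """
    if strategy == "nan":
        return list(predictions)
    if strategy == "drop":
        return [row for row in predictions if not math.isnan(row[2])]
    raise ValueError(f"unsupported coldStartStrategy: {strategy}")


scored = [("u1", "a", 0.7), ("u9", "a", float("nan"))]
kept_all = apply_cold_start(scored, "nan")
kept_known = apply_cold_start(scored, "drop")
```

Dropping NaN rows is usually the safer choice when the predictions feed an evaluation metric, because a single NaN score can make an aggregate metric NaN.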
Module contents
SynapseML is an ecosystem of tools that extends the distributed computing framework Apache Spark in several new directions. SynapseML adds many deep learning and data science tools to the Spark ecosystem, including seamless integration of Spark Machine Learning pipelines with Microsoft Cognitive Toolkit (CNTK), LightGBM, and OpenCV. These tools enable powerful and highly scalable predictive and analytical models for a variety of data sources.
SynapseML also brings new networking capabilities to the Spark ecosystem. With the HTTP on Spark project, users can embed any web service into their SparkML models. In this vein, SynapseML provides easy-to-use SparkML transformers for a wide variety of Microsoft Cognitive Services. For production-grade deployment, the Spark Serving project enables high-throughput, sub-millisecond-latency web services backed by your Spark cluster.
SynapseML requires Scala 2.12, Spark 3.0+, and Python 3.6+.