synapse.ml.stages package
Submodules
synapse.ml.stages.Cacher module
- class synapse.ml.stages.Cacher.Cacher(java_obj=None, disable=False)[source]
Bases:
ComplexParamsMixin
,JavaMLReadable
,JavaMLWritable
,JavaTransformer
- Parameters:
disable¶ (bool) – Whether or disable caching (so that you can turn it off during evaluation)
- disable = Param(parent='undefined', name='disable', doc='Whether or disable caching (so that you can turn it off during evaluation)')
- getDisable()[source]
- Returns:
Whether or disable caching (so that you can turn it off during evaluation)
- Return type:
disable
synapse.ml.stages.ClassBalancer module
- class synapse.ml.stages.ClassBalancer.ClassBalancer(java_obj=None, broadcastJoin=True, inputCol=None, outputCol='weight')[source]
Bases:
ComplexParamsMixin
,JavaMLReadable
,JavaMLWritable
,JavaEstimator
- Parameters:
- broadcastJoin = Param(parent='undefined', name='broadcastJoin', doc='Whether to broadcast the class to weight mapping to the worker')
- getBroadcastJoin()[source]
- Returns:
Whether to broadcast the class to weight mapping to the worker
- Return type:
broadcastJoin
- inputCol = Param(parent='undefined', name='inputCol', doc='The name of the input column')
- outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
synapse.ml.stages.ClassBalancerModel module
- class synapse.ml.stages.ClassBalancerModel.ClassBalancerModel(java_obj=None, broadcastJoin=None, inputCol=None, outputCol=None, weights=None)[source]
Bases:
ComplexParamsMixin
,JavaMLReadable
,JavaMLWritable
,JavaModel
- Parameters:
- broadcastJoin = Param(parent='undefined', name='broadcastJoin', doc='whether to broadcast join')
- inputCol = Param(parent='undefined', name='inputCol', doc='The name of the input column')
- outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
- setParams(broadcastJoin=None, inputCol=None, outputCol=None, weights=None)[source]
Set the (keyword only) parameters
- weights = Param(parent='undefined', name='weights', doc='the dataframe of weights')
synapse.ml.stages.DropColumns module
- class synapse.ml.stages.DropColumns.DropColumns(java_obj=None, cols=None)[source]
Bases:
ComplexParamsMixin
,JavaMLReadable
,JavaMLWritable
,JavaTransformer
- cols = Param(parent='undefined', name='cols', doc='Comma separated list of column names')
synapse.ml.stages.DynamicMiniBatchTransformer module
- class synapse.ml.stages.DynamicMiniBatchTransformer.DynamicMiniBatchTransformer(java_obj=None, maxBatchSize=2147483647)[source]
Bases:
ComplexParamsMixin
,JavaMLReadable
,JavaMLWritable
,JavaTransformer
- maxBatchSize = Param(parent='undefined', name='maxBatchSize', doc='The max size of the buffer')
synapse.ml.stages.EnsembleByKey module
- class synapse.ml.stages.EnsembleByKey.EnsembleByKey(java_obj=None, colNames=None, collapseGroup=True, cols=None, keys=None, strategy='mean', vectorDims=None)[source]
Bases:
ComplexParamsMixin
,JavaMLReadable
,JavaMLWritable
,JavaTransformer
- Parameters:
- colNames = Param(parent='undefined', name='colNames', doc='Names of the result of each col')
- collapseGroup = Param(parent='undefined', name='collapseGroup', doc='Whether to collapse all items in group to one entry')
- cols = Param(parent='undefined', name='cols', doc='Cols to ensemble')
- getCollapseGroup()[source]
- Returns:
Whether to collapse all items in group to one entry
- Return type:
collapseGroup
- getVectorDims()[source]
- Returns:
the dimensions of any vector columns, used to avoid materialization
- Return type:
vectorDims
- keys = Param(parent='undefined', name='keys', doc='Keys to group by')
- setCollapseGroup(value)[source]
- Parameters:
collapseGroup¶ – Whether to collapse all items in group to one entry
- setParams(colNames=None, collapseGroup=True, cols=None, keys=None, strategy='mean', vectorDims=None)[source]
Set the (keyword only) parameters
- setVectorDims(value)[source]
- Parameters:
vectorDims¶ – the dimensions of any vector columns, used to avoid materialization
- strategy = Param(parent='undefined', name='strategy', doc='How to ensemble the scores, ex: mean')
- vectorDims = Param(parent='undefined', name='vectorDims', doc='the dimensions of any vector columns, used to avoid materialization')
synapse.ml.stages.Explode module
- class synapse.ml.stages.Explode.Explode(java_obj=None, inputCol=None, outputCol='Explode_000e59cc3eda_output')[source]
Bases:
ComplexParamsMixin
,JavaMLReadable
,JavaMLWritable
,JavaTransformer
- Parameters:
- inputCol = Param(parent='undefined', name='inputCol', doc='The name of the input column')
- outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
synapse.ml.stages.FixedMiniBatchTransformer module
- class synapse.ml.stages.FixedMiniBatchTransformer.FixedMiniBatchTransformer(java_obj=None, batchSize=None, buffered=False, maxBufferSize=2147483647)[source]
Bases:
ComplexParamsMixin
,JavaMLReadable
,JavaMLWritable
,JavaTransformer
- Parameters:
- batchSize = Param(parent='undefined', name='batchSize', doc='The max size of the buffer')
- buffered = Param(parent='undefined', name='buffered', doc='Whether or not to buffer batches in memory')
- maxBufferSize = Param(parent='undefined', name='maxBufferSize', doc='The max size of the buffer')
synapse.ml.stages.FlattenBatch module
- class synapse.ml.stages.FlattenBatch.FlattenBatch(java_obj=None)[source]
Bases:
ComplexParamsMixin
,JavaMLReadable
,JavaMLWritable
,JavaTransformer
Args:
synapse.ml.stages.Lambda module
- class synapse.ml.stages.Lambda.Lambda(java_obj=None, transformFunc=None, transformSchemaFunc=None)[source]
Bases:
ComplexParamsMixin
,JavaMLReadable
,JavaMLWritable
,JavaTransformer
- Parameters:
- getTransformSchemaFunc()[source]
- Returns:
the output schema after the transformation
- Return type:
transformSchemaFunc
- setTransformSchemaFunc(value)[source]
- Parameters:
transformSchemaFunc¶ – the output schema after the transformation
- transformFunc = Param(parent='undefined', name='transformFunc', doc='holder for dataframe function')
- transformSchemaFunc = Param(parent='undefined', name='transformSchemaFunc', doc='the output schema after the transformation')
synapse.ml.stages.MultiColumnAdapter module
- class synapse.ml.stages.MultiColumnAdapter.MultiColumnAdapter(java_obj=None, baseStage=None, inputCols=None, outputCols=None)[source]
Bases:
ComplexParamsMixin
,JavaMLReadable
,JavaMLWritable
,JavaEstimator
- Parameters:
- baseStage = Param(parent='undefined', name='baseStage', doc='base pipeline stage to apply to every column')
- getBaseStage()[source]
- Returns:
base pipeline stage to apply to every column
- Return type:
baseStage
- inputCols = Param(parent='undefined', name='inputCols', doc='list of column names encoded as a string')
- outputCols = Param(parent='undefined', name='outputCols', doc='list of column names encoded as a string')
synapse.ml.stages.PartitionConsolidator module
- class synapse.ml.stages.PartitionConsolidator.PartitionConsolidator(java_obj=None, concurrency=1, concurrentTimeout=None, inputCol=None, outputCol=None, timeout=60.0)[source]
Bases:
ComplexParamsMixin
,JavaMLReadable
,JavaMLWritable
,JavaTransformer
- Parameters:
- concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
- concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
- getConcurrentTimeout()[source]
- Returns:
max number seconds to wait on futures if concurrency >= 1
- Return type:
concurrentTimeout
- getTimeout()[source]
- Returns:
number of seconds to wait before closing the connection
- Return type:
timeout
- inputCol = Param(parent='undefined', name='inputCol', doc='The name of the input column')
- outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
- setConcurrentTimeout(value)[source]
- Parameters:
concurrentTimeout¶ – max number seconds to wait on futures if concurrency >= 1
- setParams(concurrency=1, concurrentTimeout=None, inputCol=None, outputCol=None, timeout=60.0)[source]
Set the (keyword only) parameters
- setTimeout(value)[source]
- Parameters:
timeout¶ – number of seconds to wait before closing the connection
- timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
synapse.ml.stages.RenameColumn module
- class synapse.ml.stages.RenameColumn.RenameColumn(java_obj=None, inputCol=None, outputCol=None)[source]
Bases:
ComplexParamsMixin
,JavaMLReadable
,JavaMLWritable
,JavaTransformer
- Parameters:
- inputCol = Param(parent='undefined', name='inputCol', doc='The name of the input column')
- outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
synapse.ml.stages.Repartition module
- class synapse.ml.stages.Repartition.Repartition(java_obj=None, disable=False, n=None)[source]
Bases:
ComplexParamsMixin
,JavaMLReadable
,JavaMLWritable
,JavaTransformer
- Parameters:
- disable = Param(parent='undefined', name='disable', doc='Whether to disable repartitioning (so that one can turn it off for evaluation)')
- getDisable()[source]
- Returns:
Whether to disable repartitioning (so that one can turn it off for evaluation)
- Return type:
disable
- n = Param(parent='undefined', name='n', doc='Number of partitions')
synapse.ml.stages.SelectColumns module
- class synapse.ml.stages.SelectColumns.SelectColumns(java_obj=None, cols=None)[source]
Bases:
ComplexParamsMixin
,JavaMLReadable
,JavaMLWritable
,JavaTransformer
- cols = Param(parent='undefined', name='cols', doc='Comma separated list of selected column names')
synapse.ml.stages.StratifiedRepartition module
- class synapse.ml.stages.StratifiedRepartition.StratifiedRepartition(java_obj=None, labelCol=None, mode='mixed', seed=1518410069)[source]
Bases:
ComplexParamsMixin
,JavaMLReadable
,JavaMLWritable
,JavaTransformer
- Parameters:
- getMode()[source]
- Returns:
Specify equal to repartition with replacement across all labels, specify original to keep the ratios in the original dataset, or specify mixed to use a heuristic
- Return type:
mode
- labelCol = Param(parent='undefined', name='labelCol', doc='The name of the label column')
- mode = Param(parent='undefined', name='mode', doc='Specify equal to repartition with replacement across all labels, specify original to keep the ratios in the original dataset, or specify mixed to use a heuristic')
- seed = Param(parent='undefined', name='seed', doc='random seed')
synapse.ml.stages.SummarizeData module
- class synapse.ml.stages.SummarizeData.SummarizeData(java_obj=None, basic=True, counts=True, errorThreshold=0.0, percentiles=True, sample=True)[source]
Bases:
ComplexParamsMixin
,JavaMLReadable
,JavaMLWritable
,JavaTransformer
- Parameters:
- basic = Param(parent='undefined', name='basic', doc='Compute basic statistics')
- counts = Param(parent='undefined', name='counts', doc='Compute count statistics')
- errorThreshold = Param(parent='undefined', name='errorThreshold', doc='Threshold for quantiles - 0 is exact')
- getErrorThreshold()[source]
- Returns:
Threshold for quantiles - 0 is exact
- Return type:
errorThreshold
- percentiles = Param(parent='undefined', name='percentiles', doc='Compute percentiles')
- sample = Param(parent='undefined', name='sample', doc='Compute sample statistics')
- setErrorThreshold(value)[source]
- Parameters:
errorThreshold¶ – Threshold for quantiles - 0 is exact
synapse.ml.stages.TextPreprocessor module
- class synapse.ml.stages.TextPreprocessor.TextPreprocessor(java_obj=None, inputCol=None, map=None, normFunc=None, outputCol=None)[source]
Bases:
ComplexParamsMixin
,JavaMLReadable
,JavaMLWritable
,JavaTransformer
- Parameters:
- inputCol = Param(parent='undefined', name='inputCol', doc='The name of the input column')
- map = Param(parent='undefined', name='map', doc='Map of substring match to replacement')
- normFunc = Param(parent='undefined', name='normFunc', doc='Name of normalization function to apply')
- outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
synapse.ml.stages.TimeIntervalMiniBatchTransformer module
- class synapse.ml.stages.TimeIntervalMiniBatchTransformer.TimeIntervalMiniBatchTransformer(java_obj=None, maxBatchSize=2147483647, millisToWait=None)[source]
Bases:
ComplexParamsMixin
,JavaMLReadable
,JavaMLWritable
,JavaTransformer
- Parameters:
- getMillisToWait()[source]
- Returns:
The time to wait before constructing a batch
- Return type:
millisToWait
- maxBatchSize = Param(parent='undefined', name='maxBatchSize', doc='The max size of the buffer')
- millisToWait = Param(parent='undefined', name='millisToWait', doc='The time to wait before constructing a batch')
synapse.ml.stages.Timer module
- class synapse.ml.stages.Timer.Timer(java_obj=None, disableMaterialization=True, logToScala=True, stage=None)[source]
Bases:
ComplexParamsMixin
,JavaMLReadable
,JavaMLWritable
,JavaEstimator
- Parameters:
- disableMaterialization = Param(parent='undefined', name='disableMaterialization', doc='Whether to disable timing (so that one can turn it off for evaluation)')
- getDisableMaterialization()[source]
- Returns:
Whether to disable timing (so that one can turn it off for evaluation)
- Return type:
disableMaterialization
- getLogToScala()[source]
- Returns:
Whether to output the time to the scala console
- Return type:
logToScala
- logToScala = Param(parent='undefined', name='logToScala', doc='Whether to output the time to the scala console')
- setDisableMaterialization(value)[source]
- Parameters:
disableMaterialization¶ – Whether to disable timing (so that one can turn it off for evaluation)
- setLogToScala(value)[source]
- Parameters:
logToScala¶ – Whether to output the time to the scala console
- setParams(disableMaterialization=True, logToScala=True, stage=None)[source]
Set the (keyword only) parameters
- stage = Param(parent='undefined', name='stage', doc='The stage to time')
synapse.ml.stages.TimerModel module
- class synapse.ml.stages.TimerModel.TimerModel(java_obj=None, disableMaterialization=True, logToScala=True, stage=None, transformer=None)[source]
Bases:
ComplexParamsMixin
,JavaMLReadable
,JavaMLWritable
,JavaModel
- Parameters:
- disableMaterialization = Param(parent='undefined', name='disableMaterialization', doc='Whether to disable timing (so that one can turn it off for evaluation)')
- getDisableMaterialization()[source]
- Returns:
Whether to disable timing (so that one can turn it off for evaluation)
- Return type:
disableMaterialization
- getLogToScala()[source]
- Returns:
Whether to output the time to the scala console
- Return type:
logToScala
- logToScala = Param(parent='undefined', name='logToScala', doc='Whether to output the time to the scala console')
- setDisableMaterialization(value)[source]
- Parameters:
disableMaterialization¶ – Whether to disable timing (so that one can turn it off for evaluation)
- setLogToScala(value)[source]
- Parameters:
logToScala¶ – Whether to output the time to the scala console
- setParams(disableMaterialization=True, logToScala=True, stage=None, transformer=None)[source]
Set the (keyword only) parameters
- stage = Param(parent='undefined', name='stage', doc='The stage to time')
- transformer = Param(parent='undefined', name='transformer', doc='inner model to time')
synapse.ml.stages.UDFTransformer module
- class synapse.ml.stages.UDFTransformer.UDFTransformer(inputCol=None, inputCols=None, outputCol=None, udf=None)[source]
Bases:
ComplexParamsMixin
,JavaMLReadable
,JavaMLWritable
,JavaTransformer
- Parameters:
synapse.ml.stages.UnicodeNormalize module
- class synapse.ml.stages.UnicodeNormalize.UnicodeNormalize(java_obj=None, form=None, inputCol=None, lower=None, outputCol=None)[source]
Bases:
ComplexParamsMixin
,JavaMLReadable
,JavaMLWritable
,JavaTransformer
- Parameters:
- form = Param(parent='undefined', name='form', doc='Unicode normalization form: NFC, NFD, NFKC, NFKD')
- inputCol = Param(parent='undefined', name='inputCol', doc='The name of the input column')
- lower = Param(parent='undefined', name='lower', doc='Lowercase text')
- outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
Module contents
SynapseML is an ecosystem of tools aimed towards expanding the distributed computing framework Apache Spark in several new directions. SynapseML adds many deep learning and data science tools to the Spark ecosystem, including seamless integration of Spark Machine Learning pipelines with Microsoft Cognitive Toolkit (CNTK), LightGBM and OpenCV. These tools enable powerful and highly-scalable predictive and analytical models for a variety of datasources.
SynapseML also brings new networking capabilities to the Spark Ecosystem. With the HTTP on Spark project, users can embed any web service into their SparkML models. In this vein, SynapseML provides easy to use SparkML transformers for a wide variety of Microsoft Cognitive Services. For production grade deployment, the Spark Serving project enables high throughput, sub-millisecond latency web services, backed by your Spark cluster.
SynapseML requires Scala 2.12, Spark 3.0+, and Python 3.6+.