synapse.ml.stages package

Submodules

synapse.ml.stages.Cacher module

class synapse.ml.stages.Cacher.Cacher(java_obj=None, disable=False)[source]

Bases: ComplexParamsMixin, JavaMLReadable, JavaMLWritable, JavaTransformer

Parameters:

disable (bool) – Whether to disable caching (so that you can turn it off during evaluation)

disable = Param(parent='undefined', name='disable', doc='Whether or disable caching (so that you can turn it off during evaluation)')
getDisable()[source]
Returns:

Whether to disable caching (so that you can turn it off during evaluation)

Return type:

bool

static getJavaPackage()[source]

Returns package name String.

classmethod read()[source]

Returns an MLReader instance for this class.

setDisable(value)[source]
Parameters:

disable – Whether to disable caching (so that you can turn it off during evaluation)

setParams(disable=False)[source]

Set the (keyword only) parameters

synapse.ml.stages.ClassBalancer module

class synapse.ml.stages.ClassBalancer.ClassBalancer(java_obj=None, broadcastJoin=True, inputCol=None, outputCol='weight')[source]

Bases: ComplexParamsMixin, JavaMLReadable, JavaMLWritable, JavaEstimator

Parameters:
  • broadcastJoin (bool) – Whether to broadcast the class-to-weight mapping to the workers

  • inputCol (str) – The name of the input column

  • outputCol (str) – The name of the output column

broadcastJoin = Param(parent='undefined', name='broadcastJoin', doc='Whether to broadcast the class to weight mapping to the worker')
getBroadcastJoin()[source]
Returns:

Whether to broadcast the class-to-weight mapping to the workers

Return type:

bool

getInputCol()[source]
Returns:

The name of the input column

Return type:

str

static getJavaPackage()[source]

Returns package name String.

getOutputCol()[source]
Returns:

The name of the output column

Return type:

str

inputCol = Param(parent='undefined', name='inputCol', doc='The name of the input column')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setBroadcastJoin(value)[source]
Parameters:

broadcastJoin – Whether to broadcast the class-to-weight mapping to the workers

setInputCol(value)[source]
Parameters:

inputCol – The name of the input column

setOutputCol(value)[source]
Parameters:

outputCol – The name of the output column

setParams(broadcastJoin=True, inputCol=None, outputCol='weight')[source]

Set the (keyword only) parameters
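To illustrate what ClassBalancer produces, here is a minimal pure-Python sketch, not the SynapseML API. It assumes an inverse-frequency scheme in which each class's weight is the size of the largest class divided by the size of that class, and it models rows as plain dicts; the helper names class_balancer_weights and add_weight_column are hypothetical, and the exact formula should be checked against the library source.

```python
from collections import Counter


def class_balancer_weights(labels):
    """Assumed scheme: weight = (count of the largest class) / (count of this class)."""
    counts = Counter(labels)
    max_count = max(counts.values())
    return {label: max_count / count for label, count in counts.items()}


def add_weight_column(rows, input_col, output_col="weight"):
    """Attach each row's class weight, roughly what the fitted model's transform does."""
    weights = class_balancer_weights(row[input_col] for row in rows)
    return [{**row, output_col: weights[row[input_col]]} for row in rows]


rows = [{"label": "cat"}, {"label": "cat"}, {"label": "dog"}]
print(add_weight_column(rows, "label"))
# [{'label': 'cat', 'weight': 1.0}, {'label': 'cat', 'weight': 1.0}, {'label': 'dog', 'weight': 2.0}]
```

In the real stage, fitting yields a ClassBalancerModel that holds the weights DataFrame and joins it onto the input; broadcastJoin controls whether that weight table is broadcast during the join.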

synapse.ml.stages.ClassBalancerModel module

class synapse.ml.stages.ClassBalancerModel.ClassBalancerModel(java_obj=None, broadcastJoin=None, inputCol=None, outputCol=None, weights=None)[source]

Bases: ComplexParamsMixin, JavaMLReadable, JavaMLWritable, JavaModel

Parameters:
  • broadcastJoin (bool) – Whether to use a broadcast join

  • inputCol (str) – The name of the input column

  • outputCol (str) – The name of the output column

  • weights (object) – The DataFrame of class weights

broadcastJoin = Param(parent='undefined', name='broadcastJoin', doc='whether to broadcast join')
getBroadcastJoin()[source]
Returns:

Whether to use a broadcast join

Return type:

bool

getInputCol()[source]
Returns:

The name of the input column

Return type:

str

static getJavaPackage()[source]

Returns package name String.

getOutputCol()[source]
Returns:

The name of the output column

Return type:

str

getWeights()[source]
Returns:

The DataFrame of class weights

Return type:

object

inputCol = Param(parent='undefined', name='inputCol', doc='The name of the input column')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setBroadcastJoin(value)[source]
Parameters:

broadcastJoin – Whether to use a broadcast join

setInputCol(value)[source]
Parameters:

inputCol – The name of the input column

setOutputCol(value)[source]
Parameters:

outputCol – The name of the output column

setParams(broadcastJoin=None, inputCol=None, outputCol=None, weights=None)[source]

Set the (keyword only) parameters

setWeights(value)[source]
Parameters:

weights – The DataFrame of class weights

weights = Param(parent='undefined', name='weights', doc='the dataframe of weights')

synapse.ml.stages.DropColumns module

class synapse.ml.stages.DropColumns.DropColumns(java_obj=None, cols=None)[source]

Bases: ComplexParamsMixin, JavaMLReadable, JavaMLWritable, JavaTransformer

Parameters:

cols (list) – List of column names to drop

cols = Param(parent='undefined', name='cols', doc='Comma separated list of column names')
getCols()[source]
Returns:

List of column names to drop

Return type:

list

static getJavaPackage()[source]

Returns package name String.

classmethod read()[source]

Returns an MLReader instance for this class.

setCols(value)[source]
Parameters:

cols – List of column names to drop

setParams(cols=None)[source]

Set the (keyword only) parameters
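As a rough illustration of DropColumns, the following pure-Python sketch models rows as dicts and removes the named keys. It is not the SynapseML API: drop_columns is a hypothetical helper, and it silently ignores names that are not present, whereas the real stage may instead validate the names against the DataFrame schema.

```python
def drop_columns(rows, cols):
    """Return copies of the rows without the named columns (unknown names are ignored)."""
    to_drop = set(cols)
    return [{k: v for k, v in row.items() if k not in to_drop} for row in rows]


rows = [{"id": 1, "text": "hi", "tmp": 0}]
print(drop_columns(rows, ["tmp"]))
# [{'id': 1, 'text': 'hi'}]
```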

synapse.ml.stages.DynamicMiniBatchTransformer module

class synapse.ml.stages.DynamicMiniBatchTransformer.DynamicMiniBatchTransformer(java_obj=None, maxBatchSize=2147483647)[source]

Bases: ComplexParamsMixin, JavaMLReadable, JavaMLWritable, JavaTransformer

Parameters:

maxBatchSize (int) – The maximum batch size (the max size of the buffer)

static getJavaPackage()[source]

Returns package name String.

getMaxBatchSize()[source]
Returns:

The maximum batch size (the max size of the buffer)

Return type:

int

maxBatchSize = Param(parent='undefined', name='maxBatchSize', doc='The max size of the buffer')
classmethod read()[source]

Returns an MLReader instance for this class.

setMaxBatchSize(value)[source]
Parameters:

maxBatchSize – The maximum batch size (the max size of the buffer)

setParams(maxBatchSize=2147483647)[source]

Set the (keyword only) parameters

synapse.ml.stages.EnsembleByKey module

class synapse.ml.stages.EnsembleByKey.EnsembleByKey(java_obj=None, colNames=None, collapseGroup=True, cols=None, keys=None, strategy='mean', vectorDims=None)[source]

Bases: ComplexParamsMixin, JavaMLReadable, JavaMLWritable, JavaTransformer

Parameters:
  • colNames (list) – Names of the result column for each ensembled column

  • collapseGroup (bool) – Whether to collapse all items in a group to one entry

  • cols (list) – Columns to ensemble

  • keys (list) – Keys to group by

  • strategy (str) – How to ensemble the scores, e.g. mean

  • vectorDims (dict) – The dimensions of any vector columns, used to avoid materialization

colNames = Param(parent='undefined', name='colNames', doc='Names of the result of each col')
collapseGroup = Param(parent='undefined', name='collapseGroup', doc='Whether to collapse all items in group to one entry')
cols = Param(parent='undefined', name='cols', doc='Cols to ensemble')
getColNames()[source]
Returns:

Names of the result column for each ensembled column

Return type:

list

getCollapseGroup()[source]
Returns:

Whether to collapse all items in a group to one entry

Return type:

bool

getCols()[source]
Returns:

Columns to ensemble

Return type:

list

static getJavaPackage()[source]

Returns package name String.

getKeys()[source]
Returns:

Keys to group by

Return type:

list

getStrategy()[source]
Returns:

How to ensemble the scores, e.g. mean

Return type:

str

getVectorDims()[source]
Returns:

The dimensions of any vector columns, used to avoid materialization

Return type:

dict

keys = Param(parent='undefined', name='keys', doc='Keys to group by')
classmethod read()[source]

Returns an MLReader instance for this class.

setColNames(value)[source]
Parameters:

colNames – Names of the result column for each ensembled column

setCollapseGroup(value)[source]
Parameters:

collapseGroup – Whether to collapse all items in a group to one entry

setCols(value)[source]
Parameters:

cols – Columns to ensemble

setKeys(value)[source]
Parameters:

keys – Keys to group by

setParams(colNames=None, collapseGroup=True, cols=None, keys=None, strategy='mean', vectorDims=None)[source]

Set the (keyword only) parameters

setStrategy(value)[source]
Parameters:

strategy – How to ensemble the scores, e.g. mean

setVectorDims(value)[source]
Parameters:

vectorDims – The dimensions of any vector columns, used to avoid materialization

strategy = Param(parent='undefined', name='strategy', doc='How to ensemble the scores, ex: mean')
vectorDims = Param(parent='undefined', name='vectorDims', doc='the dimensions of any vector columns, used to avoid materialization')
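The grouping behavior of EnsembleByKey can be pictured with a small pure-Python sketch, not the SynapseML API. It models rows as dicts and implements only the 'mean' strategy; ensemble_by_key is a hypothetical helper, vectorDims is not modeled, and treating collapseGroup=False as "keep every input row and attach its group's ensembled value" is an assumption.

```python
from collections import defaultdict
from statistics import mean


def ensemble_by_key(rows, keys, cols, col_names, collapse_group=True):
    """Average each column in cols within groups that share the same key values."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row)
    # One dict of ensembled scores per key tuple, e.g. {"mean_score": 2.0}
    aggregates = {
        key: {name: mean(r[c] for r in members) for c, name in zip(cols, col_names)}
        for key, members in groups.items()
    }
    if collapse_group:
        # One output row per group: the key columns plus the ensembled columns
        return [{**dict(zip(keys, key)), **agg} for key, agg in aggregates.items()]
    # Otherwise keep every input row and attach its group's ensembled values
    return [{**row, **aggregates[tuple(row[k] for k in keys)]} for row in rows]


rows = [{"id": 1, "score": 1.0}, {"id": 1, "score": 3.0}, {"id": 2, "score": 5.0}]
print(ensemble_by_key(rows, ["id"], ["score"], ["mean_score"]))
# [{'id': 1, 'mean_score': 2.0}, {'id': 2, 'mean_score': 5.0}]
```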

synapse.ml.stages.Explode module

class synapse.ml.stages.Explode.Explode(java_obj=None, inputCol=None, outputCol='Explode_0905c7463fd8_output')[source]

Bases: ComplexParamsMixin, JavaMLReadable, JavaMLWritable, JavaTransformer

Parameters:
  • inputCol (str) – The name of the input column

  • outputCol (str) – The name of the output column

getInputCol()[source]
Returns:

The name of the input column

Return type:

str

static getJavaPackage()[source]

Returns package name String.

getOutputCol()[source]
Returns:

The name of the output column

Return type:

str

inputCol = Param(parent='undefined', name='inputCol', doc='The name of the input column')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setInputCol(value)[source]
Parameters:

inputCol – The name of the input column

setOutputCol(value)[source]
Parameters:

outputCol – The name of the output column

setParams(inputCol=None, outputCol='Explode_0905c7463fd8_output')[source]

Set the (keyword only) parameters
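Explode turns each element of an array column into its own row. The pure-Python sketch below, with rows modeled as dicts, shows that idea; explode is a hypothetical helper, not the SynapseML API. Keeping the original array column and dropping rows whose array is empty are assumptions modeled on Spark's explode function.

```python
def explode(rows, input_col, output_col):
    """Emit one row per element of the list in input_col, copying the other columns."""
    exploded = []
    for row in rows:
        for item in row[input_col]:  # an empty list yields no output rows
            exploded.append({**row, output_col: item})
    return exploded


rows = [{"id": 1, "tags": ["a", "b"]}, {"id": 2, "tags": []}]
print(explode(rows, "tags", "tag"))
# [{'id': 1, 'tags': ['a', 'b'], 'tag': 'a'}, {'id': 1, 'tags': ['a', 'b'], 'tag': 'b'}]
```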

synapse.ml.stages.FixedMiniBatchTransformer module

class synapse.ml.stages.FixedMiniBatchTransformer.FixedMiniBatchTransformer(java_obj=None, batchSize=None, buffered=False, maxBufferSize=2147483647)[source]

Bases: ComplexParamsMixin, JavaMLReadable, JavaMLWritable, JavaTransformer

Parameters:
  • batchSize (int) – The number of rows in each batch

  • buffered (bool) – Whether or not to buffer batches in memory

  • maxBufferSize (int) – The max size of the buffer

batchSize = Param(parent='undefined', name='batchSize', doc='The max size of the buffer')
buffered = Param(parent='undefined', name='buffered', doc='Whether or not to buffer batches in memory')
getBatchSize()[source]
Returns:

The number of rows in each batch

Return type:

int

getBuffered()[source]
Returns:

Whether or not to buffer batches in memory

Return type:

bool

static getJavaPackage()[source]

Returns package name String.

getMaxBufferSize()[source]
Returns:

The max size of the buffer

Return type:

int

maxBufferSize = Param(parent='undefined', name='maxBufferSize', doc='The max size of the buffer')
classmethod read()[source]

Returns an MLReader instance for this class.

setBatchSize(value)[source]
Parameters:

batchSize – The number of rows in each batch

setBuffered(value)[source]
Parameters:

buffered – Whether or not to buffer batches in memory

setMaxBufferSize(value)[source]
Parameters:

maxBufferSize – The max size of the buffer

setParams(batchSize=None, buffered=False, maxBufferSize=2147483647)[source]

Set the (keyword only) parameters
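Conceptually, FixedMiniBatchTransformer groups consecutive rows into batches of batchSize rows. The pure-Python sketch below models rows as dicts and represents each batch as one row whose columns hold lists of the original values; that layout and the helper name fixed_mini_batches are assumptions for illustration, not the SynapseML API, and the buffering options are not modeled.

```python
from itertools import islice


def fixed_mini_batches(rows, batch_size):
    """Yield one batched row per batch_size consecutive input rows.

    Each batched row maps every column name to the list of that column's
    values within the batch; the last batch may be smaller.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(rows)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        yield {col: [row[col] for row in chunk] for col in chunk[0]}


rows = [{"id": i} for i in range(5)]
print(list(fixed_mini_batches(rows, 2)))
# [{'id': [0, 1]}, {'id': [2, 3]}, {'id': [4]}]
```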

synapse.ml.stages.FlattenBatch module

class synapse.ml.stages.FlattenBatch.FlattenBatch(java_obj=None)[source]

Bases: ComplexParamsMixin, JavaMLReadable, JavaMLWritable, JavaTransformer

This transformer has no parameters.

static getJavaPackage()[source]

Returns package name String.

classmethod read()[source]

Returns an MLReader instance for this class.

setParams()[source]

Set the (keyword only) parameters
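FlattenBatch undoes mini-batching by turning each batched row back into individual rows. This pure-Python sketch assumes the same layout as a column-to-list mapping per batch and models rows as dicts; flatten_batch is a hypothetical helper, not the SynapseML API, and it assumes all lists within a batch have equal length (zip stops at the shortest).

```python
def flatten_batch(batched_rows):
    """Turn each batched row (column -> list of values) back into individual rows."""
    rows = []
    for batch in batched_rows:
        cols = list(batch)
        # zip the per-column lists so that the i-th values form the i-th row
        for values in zip(*(batch[c] for c in cols)):
            rows.append(dict(zip(cols, values)))
    return rows


batched = [{"id": [0, 1], "text": ["a", "b"]}, {"id": [2], "text": ["c"]}]
print(flatten_batch(batched))
# [{'id': 0, 'text': 'a'}, {'id': 1, 'text': 'b'}, {'id': 2, 'text': 'c'}]
```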

synapse.ml.stages.Lambda module

class synapse.ml.stages.Lambda.Lambda(java_obj=None, transformFunc=None, transformSchemaFunc=None)[source]

Bases: ComplexParamsMixin, JavaMLReadable, JavaMLWritable, JavaTransformer

Parameters:
  • transformFunc (object) – Holder for the DataFrame transformation function

  • transformSchemaFunc (object) – Function that returns the output schema after the transformation

static getJavaPackage()[source]

Returns package name String.

getTransformFunc()[source]
Returns:

Holder for the DataFrame transformation function

Return type:

object

getTransformSchemaFunc()[source]
Returns:

Function that returns the output schema after the transformation

Return type:

object

classmethod read()[source]

Returns an MLReader instance for this class.

setParams(transformFunc=None, transformSchemaFunc=None)[source]

Set the (keyword only) parameters

setTransformFunc(value)[source]
Parameters:

transformFunc – Holder for the DataFrame transformation function

setTransformSchemaFunc(value)[source]
Parameters:

transformSchemaFunc – Function that returns the output schema after the transformation

transformFunc = Param(parent='undefined', name='transformFunc', doc='holder for dataframe function')
transformSchemaFunc = Param(parent='undefined', name='transformSchemaFunc', doc='the output schema after the transformation')

synapse.ml.stages.MultiColumnAdapter module

class synapse.ml.stages.MultiColumnAdapter.MultiColumnAdapter(java_obj=None, baseStage=None, inputCols=None, outputCols=None)[source]

Bases: ComplexParamsMixin, JavaMLReadable, JavaMLWritable, JavaEstimator

Parameters:
  • baseStage (object) – base pipeline stage to apply to every column

  • inputCols (list) – List of input column names

  • outputCols (list) – List of output column names

baseStage = Param(parent='undefined', name='baseStage', doc='base pipeline stage to apply to every column')
getBaseStage()[source]
Returns:

base pipeline stage to apply to every column

Return type:

object

getInputCols()[source]
Returns:

List of input column names

Return type:

list

static getJavaPackage()[source]

Returns package name String.

getOutputCols()[source]
Returns:

List of output column names

Return type:

list

inputCols = Param(parent='undefined', name='inputCols', doc='list of column names encoded as a string')
outputCols = Param(parent='undefined', name='outputCols', doc='list of column names encoded as a string')
classmethod read()[source]

Returns an MLReader instance for this class.

setBaseStage(value)[source]
Parameters:

baseStage – base pipeline stage to apply to every column

setInputCols(value)[source]
Parameters:

inputCols – List of input column names

setOutputCols(value)[source]
Parameters:

outputCols – List of output column names

setParams(baseStage=None, inputCols=None, outputCols=None)[source]

Set the (keyword only) parameters
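MultiColumnAdapter applies one single-column stage to every (inputCols[i], outputCols[i]) pair. The pure-Python sketch below mimics that pairing with rows modeled as dicts and a stage modeled as a function (rows, input_col, output_col) -> rows. Both helper names are hypothetical and this is not the SynapseML API; the real Estimator presumably fits a separate copy of baseStage per column pair, and the length check is an assumption.

```python
def multi_column_adapter(rows, base_stage, input_cols, output_cols):
    """Apply base_stage once per (input column, output column) pair, in order."""
    if len(input_cols) != len(output_cols):
        raise ValueError("inputCols and outputCols must have the same length")
    for in_col, out_col in zip(input_cols, output_cols):
        rows = base_stage(rows, in_col, out_col)
    return rows


def upper_case_stage(rows, input_col, output_col):
    """Hypothetical single-column stage: upper-cases input_col into output_col."""
    return [{**row, output_col: row[input_col].upper()} for row in rows]


rows = [{"first": "ada", "last": "lovelace"}]
print(multi_column_adapter(rows, upper_case_stage, ["first", "last"], ["FIRST", "LAST"]))
# [{'first': 'ada', 'last': 'lovelace', 'FIRST': 'ADA', 'LAST': 'LOVELACE'}]
```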

synapse.ml.stages.PartitionConsolidator module

class synapse.ml.stages.PartitionConsolidator.PartitionConsolidator(java_obj=None, concurrency=1, concurrentTimeout=None, inputCol=None, outputCol=None, timeout=60.0)[source]

Bases: ComplexParamsMixin, JavaMLReadable, JavaMLWritable, JavaTransformer

Parameters:
  • concurrency (int) – Maximum number of concurrent calls

  • concurrentTimeout (float) – Maximum number of seconds to wait on futures if concurrency >= 1

  • inputCol (str) – The name of the input column

  • outputCol (str) – The name of the output column

  • timeout (float) – Number of seconds to wait before closing the connection

concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
getConcurrency()[source]
Returns:

Maximum number of concurrent calls

Return type:

int

getConcurrentTimeout()[source]
Returns:

Maximum number of seconds to wait on futures if concurrency >= 1

Return type:

float

getInputCol()[source]
Returns:

The name of the input column

Return type:

str

static getJavaPackage()[source]

Returns package name String.

getOutputCol()[source]
Returns:

The name of the output column

Return type:

str

getTimeout()[source]
Returns:

Number of seconds to wait before closing the connection

Return type:

float

inputCol = Param(parent='undefined', name='inputCol', doc='The name of the input column')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setConcurrency(value)[source]
Parameters:

concurrency – Maximum number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters:

concurrentTimeout – Maximum number of seconds to wait on futures if concurrency >= 1

setInputCol(value)[source]
Parameters:

inputCol – The name of the input column

setOutputCol(value)[source]
Parameters:

outputCol – The name of the output column

setParams(concurrency=1, concurrentTimeout=None, inputCol=None, outputCol=None, timeout=60.0)[source]

Set the (keyword only) parameters

setTimeout(value)[source]
Parameters:

timeout – Number of seconds to wait before closing the connection

timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')

synapse.ml.stages.RenameColumn module

class synapse.ml.stages.RenameColumn.RenameColumn(java_obj=None, inputCol=None, outputCol=None)[source]

Bases: ComplexParamsMixin, JavaMLReadable, JavaMLWritable, JavaTransformer

Parameters:
  • inputCol (str) – The name of the input column

  • outputCol (str) – The name of the output column

getInputCol()[source]
Returns:

The name of the input column

Return type:

str

static getJavaPackage()[source]

Returns package name String.

getOutputCol()[source]
Returns:

The name of the output column

Return type:

str

inputCol = Param(parent='undefined', name='inputCol', doc='The name of the input column')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setInputCol(value)[source]
Parameters:

inputCol – The name of the input column

setOutputCol(value)[source]
Parameters:

outputCol – The name of the output column

setParams(inputCol=None, outputCol=None)[source]

Set the (keyword only) parameters

synapse.ml.stages.Repartition module

class synapse.ml.stages.Repartition.Repartition(java_obj=None, disable=False, n=None)[source]

Bases: ComplexParamsMixin, JavaMLReadable, JavaMLWritable, JavaTransformer

Parameters:
  • disable (bool) – Whether to disable repartitioning (so that one can turn it off for evaluation)

  • n (int) – Number of partitions

disable = Param(parent='undefined', name='disable', doc='Whether to disable repartitioning (so that one can turn it off for evaluation)')
getDisable()[source]
Returns:

Whether to disable repartitioning (so that one can turn it off for evaluation)

Return type:

bool

static getJavaPackage()[source]

Returns package name String.

getN()[source]
Returns:

Number of partitions

Return type:

int

n = Param(parent='undefined', name='n', doc='Number of partitions')
classmethod read()[source]

Returns an MLReader instance for this class.

setDisable(value)[source]
Parameters:

disable – Whether to disable repartitioning (so that one can turn it off for evaluation)

setN(value)[source]
Parameters:

n – Number of partitions

setParams(disable=False, n=None)[source]

Set the (keyword only) parameters

synapse.ml.stages.SelectColumns module

class synapse.ml.stages.SelectColumns.SelectColumns(java_obj=None, cols=None)[source]

Bases: ComplexParamsMixin, JavaMLReadable, JavaMLWritable, JavaTransformer

Parameters:

cols (list) – List of column names to select

cols = Param(parent='undefined', name='cols', doc='Comma separated list of selected column names')
getCols()[source]
Returns:

List of column names to select

Return type:

list

static getJavaPackage()[source]

Returns package name String.

classmethod read()[source]

Returns an MLReader instance for this class.

setCols(value)[source]
Parameters:

cols – List of column names to select

setParams(cols=None)[source]

Set the (keyword only) parameters
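SelectColumns is the complement of DropColumns: it keeps only the named columns. In this pure-Python sketch, rows are dicts and select_columns is a hypothetical helper rather than the SynapseML API; a missing name raises KeyError here, which only loosely mirrors the schema error Spark would raise.

```python
def select_columns(rows, cols):
    """Keep only the named columns, in the order listed."""
    return [{col: row[col] for col in cols} for row in rows]


rows = [{"id": 1, "text": "hi", "score": 0.9}]
print(select_columns(rows, ["score", "id"]))
# [{'score': 0.9, 'id': 1}]
```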

synapse.ml.stages.StratifiedRepartition module

class synapse.ml.stages.StratifiedRepartition.StratifiedRepartition(java_obj=None, labelCol=None, mode='mixed', seed=1518410069)[source]

Bases: ComplexParamsMixin, JavaMLReadable, JavaMLWritable, JavaTransformer

Parameters:
  • labelCol (str) – The name of the label column

  • mode (str) – Use 'equal' to repartition with replacement across all labels, 'original' to keep the label ratios of the original dataset, or 'mixed' to use a heuristic

  • seed (int) – Random seed

static getJavaPackage()[source]

Returns package name String.

getLabelCol()[source]
Returns:

The name of the label column

Return type:

str

getMode()[source]
Returns:

Use 'equal' to repartition with replacement across all labels, 'original' to keep the label ratios of the original dataset, or 'mixed' to use a heuristic

Return type:

str

getSeed()[source]
Returns:

Random seed

Return type:

int

labelCol = Param(parent='undefined', name='labelCol', doc='The name of the label column')
mode = Param(parent='undefined', name='mode', doc='Specify equal to repartition with replacement across all labels, specify original to keep the ratios in the original dataset, or specify mixed to use a heuristic')
classmethod read()[source]

Returns an MLReader instance for this class.

seed = Param(parent='undefined', name='seed', doc='random seed')
setLabelCol(value)[source]
Parameters:

labelCol – The name of the label column

setMode(value)[source]
Parameters:

mode – Use 'equal' to repartition with replacement across all labels, 'original' to keep the label ratios of the original dataset, or 'mixed' to use a heuristic

setParams(labelCol=None, mode='mixed', seed=1518410069)[source]

Set the (keyword only) parameters

setSeed(value)[source]
Parameters:

seed – Random seed

synapse.ml.stages.SummarizeData module

class synapse.ml.stages.SummarizeData.SummarizeData(java_obj=None, basic=True, counts=True, errorThreshold=0.0, percentiles=True, sample=True)[source]

Bases: ComplexParamsMixin, JavaMLReadable, JavaMLWritable, JavaTransformer

Parameters:
  • basic (bool) – Compute basic statistics

  • counts (bool) – Compute count statistics

  • errorThreshold (float) – Error threshold for approximate quantiles (0 computes exact quantiles)

  • percentiles (bool) – Compute percentiles

  • sample (bool) – Compute sample statistics

basic = Param(parent='undefined', name='basic', doc='Compute basic statistics')
counts = Param(parent='undefined', name='counts', doc='Compute count statistics')
errorThreshold = Param(parent='undefined', name='errorThreshold', doc='Threshold for quantiles - 0 is exact')
getBasic()[source]
Returns:

Compute basic statistics

Return type:

bool

getCounts()[source]
Returns:

Compute count statistics

Return type:

bool

getErrorThreshold()[source]
Returns:

Error threshold for approximate quantiles (0 computes exact quantiles)

Return type:

float

static getJavaPackage()[source]

Returns package name String.

getPercentiles()[source]
Returns:

Compute percentiles

Return type:

bool

getSample()[source]
Returns:

Compute sample statistics

Return type:

bool

percentiles = Param(parent='undefined', name='percentiles', doc='Compute percentiles')
classmethod read()[source]

Returns an MLReader instance for this class.

sample = Param(parent='undefined', name='sample', doc='Compute sample statistics')
setBasic(value)[source]
Parameters:

basic – Compute basic statistics

setCounts(value)[source]
Parameters:

counts – Compute count statistics

setErrorThreshold(value)[source]
Parameters:

errorThreshold – Error threshold for approximate quantiles (0 computes exact quantiles)

setParams(basic=True, counts=True, errorThreshold=0.0, percentiles=True, sample=True)[source]

Set the (keyword only) parameters

setPercentiles(value)[source]
Parameters:

percentiles – Compute percentiles

setSample(value)[source]
Parameters:

sample – Compute sample statistics

synapse.ml.stages.TextPreprocessor module

class synapse.ml.stages.TextPreprocessor.TextPreprocessor(java_obj=None, inputCol=None, map=None, normFunc=None, outputCol=None)[source]

Bases: ComplexParamsMixin, JavaMLReadable, JavaMLWritable, JavaTransformer

Parameters:
  • inputCol (str) – The name of the input column

  • map (dict) – Map from substrings to match to their replacements

  • normFunc (str) – Name of normalization function to apply

  • outputCol (str) – The name of the output column

getInputCol()[source]
Returns:

The name of the input column

Return type:

str

static getJavaPackage()[source]

Returns package name String.

getMap()[source]
Returns:

Map from substrings to match to their replacements

Return type:

dict

getNormFunc()[source]
Returns:

Name of normalization function to apply

Return type:

str

getOutputCol()[source]
Returns:

The name of the output column

Return type:

str

inputCol = Param(parent='undefined', name='inputCol', doc='The name of the input column')
map = Param(parent='undefined', name='map', doc='Map of substring match to replacement')
normFunc = Param(parent='undefined', name='normFunc', doc='Name of normalization function to apply')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setInputCol(value)[source]
Parameters:

inputCol – The name of the input column

setMap(value)[source]
Parameters:

map – Map from substrings to match to their replacements

setNormFunc(value)[source]
Parameters:

normFunc – Name of normalization function to apply

setOutputCol(value)[source]
Parameters:

outputCol – The name of the output column

setParams(inputCol=None, map=None, normFunc=None, outputCol=None)[source]

Set the (keyword only) parameters
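The combination of map and normFunc can be pictured with the pure-Python sketch below, which is not the SynapseML API. It normalizes the text first and then scans left to right, replacing the longest matching key at each position; that ordering, the longest-match rule, and using str.lower as a stand-in for the named normalization function are all assumptions, and preprocess_text is a hypothetical helper. The library's own matching strategy may differ (for example, it may also normalize the keys).

```python
def preprocess_text(text, replacements, norm_func=str.lower):
    """Normalize text, then replace substrings, preferring the longest key at each position."""
    text = norm_func(text)  # hypothetical stand-in for the stage's normFunc
    keys = sorted(replacements, key=len, reverse=True)  # try longer keys first
    out, i = [], 0
    while i < len(text):
        for key in keys:
            if key and text.startswith(key, i):
                out.append(replacements[key])
                i += len(key)
                break
        else:  # no key matched here: copy one character and move on
            out.append(text[i])
            i += 1
    return "".join(out)


print(preprocess_text("Hello World", {"hello": "hi", "hell": "X", "world": "earth"}))
# hi earth
```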

synapse.ml.stages.TimeIntervalMiniBatchTransformer module

class synapse.ml.stages.TimeIntervalMiniBatchTransformer.TimeIntervalMiniBatchTransformer(java_obj=None, maxBatchSize=2147483647, millisToWait=None)[source]

Bases: ComplexParamsMixin, JavaMLReadable, JavaMLWritable, JavaTransformer

Parameters:
  • maxBatchSize (int) – The maximum batch size (the max size of the buffer)

  • millisToWait (int) – The time to wait before constructing a batch

static getJavaPackage()[source]

Returns package name String.

getMaxBatchSize()[source]
Returns:

The maximum batch size (the max size of the buffer)

Return type:

int

getMillisToWait()[source]
Returns:

The time to wait before constructing a batch

Return type:

int

maxBatchSize = Param(parent='undefined', name='maxBatchSize', doc='The max size of the buffer')
millisToWait = Param(parent='undefined', name='millisToWait', doc='The time to wait before constructing a batch')
classmethod read()[source]

Returns an MLReader instance for this class.

setMaxBatchSize(value)[source]
Parameters:

maxBatchSize – The maximum batch size (the max size of the buffer)

setMillisToWait(value)[source]
Parameters:

millisToWait – The time to wait before constructing a batch

setParams(maxBatchSize=2147483647, millisToWait=None)[source]

Set the (keyword only) parameters
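To picture how millisToWait and maxBatchSize interact, here is a simplified offline simulation in pure Python, not the SynapseML API. It assumes a batch is closed once millisToWait milliseconds have elapsed since its first row arrived or once it holds maxBatchSize rows; the real transformer works against live wall-clock time, and time_interval_batches is a hypothetical helper.

```python
def time_interval_batches(events, millis_to_wait, max_batch_size):
    """Group (arrival_ms, value) events, sorted by arrival time, into batches."""
    batches, current, start = [], [], None
    for t, value in events:
        # Close the open batch if its time window expired or it is full
        if current and (t - start >= millis_to_wait or len(current) >= max_batch_size):
            batches.append(current)
            current = []
        if not current:
            start = t  # a new batch's window starts at its first row
        current.append(value)
    if current:
        batches.append(current)
    return batches


events = [(0, "a"), (10, "b"), (120, "c"), (130, "d"), (140, "e")]
print(time_interval_batches(events, millis_to_wait=100, max_batch_size=2))
# [['a', 'b'], ['c', 'd'], ['e']]
```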

synapse.ml.stages.Timer module

class synapse.ml.stages.Timer.Timer(java_obj=None, disableMaterialization=True, logToScala=True, stage=None)[source]

Bases: ComplexParamsMixin, JavaMLReadable, JavaMLWritable, JavaEstimator

Parameters:
  • disableMaterialization (bool) – Whether to disable timing (so that one can turn it off for evaluation)

  • logToScala (bool) – Whether to output the time to the Scala console

  • stage (object) – The stage to time

disableMaterialization = Param(parent='undefined', name='disableMaterialization', doc='Whether to disable timing (so that one can turn it off for evaluation)')
getDisableMaterialization()[source]
Returns:

Whether to disable timing (so that one can turn it off for evaluation)

Return type:

bool

static getJavaPackage()[source]

Returns package name String.

getLogToScala()[source]
Returns:

Whether to output the time to the Scala console

Return type:

bool

getStage()[source]
Returns:

The stage to time

Return type:

object

logToScala = Param(parent='undefined', name='logToScala', doc='Whether to output the time to the scala console')
classmethod read()[source]

Returns an MLReader instance for this class.

setDisableMaterialization(value)[source]
Parameters:

disableMaterialization – Whether to disable timing (so that one can turn it off for evaluation)

setLogToScala(value)[source]
Parameters:

logToScala – Whether to output the time to the Scala console

setParams(disableMaterialization=True, logToScala=True, stage=None)[source]

Set the (keyword only) parameters

setStage(value)[source]
Parameters:

stage – The stage to time

stage = Param(parent='undefined', name='stage', doc='The stage to time')

synapse.ml.stages.TimerModel module

class synapse.ml.stages.TimerModel.TimerModel(java_obj=None, disableMaterialization=True, logToScala=True, stage=None, transformer=None)[source]

Bases: ComplexParamsMixin, JavaMLReadable, JavaMLWritable, JavaModel

Parameters:
  • disableMaterialization (bool) – Whether to disable timing (so that one can turn it off for evaluation)

  • logToScala (bool) – Whether to output the time to the Scala console

  • stage (object) – The stage to time

  • transformer (object) – Inner model to time

disableMaterialization = Param(parent='undefined', name='disableMaterialization', doc='Whether to disable timing (so that one can turn it off for evaluation)')
getDisableMaterialization()[source]
Returns:

Whether to disable timing (so that one can turn it off for evaluation)

Return type:

bool

static getJavaPackage()[source]

Returns package name String.

getLogToScala()[source]
Returns:

Whether to output the time to the Scala console

Return type:

bool

getStage()[source]
Returns:

The stage to time

Return type:

object

getTransformer()[source]
Returns:

Inner model to time

Return type:

object

logToScala = Param(parent='undefined', name='logToScala', doc='Whether to output the time to the scala console')
classmethod read()[source]

Returns an MLReader instance for this class.

setDisableMaterialization(value)[source]
Parameters:

disableMaterialization – Whether to disable timing (so that one can turn it off for evaluation)

setLogToScala(value)[source]
Parameters:

logToScala – Whether to output the time to the Scala console

setParams(disableMaterialization=True, logToScala=True, stage=None, transformer=None)[source]

Set the (keyword only) parameters

setStage(value)[source]
Parameters:

stage – The stage to time

setTransformer(value)[source]
Parameters:

transformer – Inner model to time

stage = Param(parent='undefined', name='stage', doc='The stage to time')
transformer = Param(parent='undefined', name='transformer', doc='inner model to time')

synapse.ml.stages.UDFTransformer module

class synapse.ml.stages.UDFTransformer.UDFTransformer(inputCol=None, inputCols=None, outputCol=None, udf=None)[source]

Bases: ComplexParamsMixin, JavaMLReadable, JavaMLWritable, JavaTransformer

Parameters:
  • inputCol (str) – The name of the input column

  • inputCols (list) – The names of the input columns

  • outputCol (str) – The name of the output column

  • udf (object) – User-defined Python function to apply to the DataFrame input column(s)

  • udfScala (object) – User-defined Scala function to apply to the DataFrame input column(s)

getInputCol()[source]
Returns:

The name of the input column

Return type:

str

getInputCols()[source]
Returns:

The names of the input columns

Return type:

list

static getJavaPackage()[source]

Returns package name String.

getOutputCol()[source]
Returns:

The name of the output column

Return type:

str

getUDF()[source]
classmethod read()[source]

Returns an MLReader instance for this class.

setInputCol(value)[source]
Parameters:

inputCol (str) – The name of the input column

setInputCols(value)[source]
Parameters:

inputCols (list) – The names of the input columns

setOutputCol(value)[source]
Parameters:

outputCol (str) – The name of the output column

setUDF(udf)[source]

synapse.ml.stages.UnicodeNormalize module

class synapse.ml.stages.UnicodeNormalize.UnicodeNormalize(java_obj=None, form=None, inputCol=None, lower=None, outputCol=None)[source]

Bases: ComplexParamsMixin, JavaMLReadable, JavaMLWritable, JavaTransformer

Parameters:
  • form (str) – Unicode normalization form: NFC, NFD, NFKC, NFKD

  • inputCol (str) – The name of the input column

  • lower (bool) – Whether to lowercase the text

  • outputCol (str) – The name of the output column

form = Param(parent='undefined', name='form', doc='Unicode normalization form: NFC, NFD, NFKC, NFKD')
getForm()[source]
Returns:

Unicode normalization form: NFC, NFD, NFKC, NFKD

Return type:

str

getInputCol()[source]
Returns:

The name of the input column

Return type:

str

static getJavaPackage()[source]

Returns package name String.

getLower()[source]
Returns:

Whether to lowercase the text

Return type:

bool

getOutputCol()[source]
Returns:

The name of the output column

Return type:

str

inputCol = Param(parent='undefined', name='inputCol', doc='The name of the input column')
lower = Param(parent='undefined', name='lower', doc='Lowercase text')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setForm(value)[source]
Parameters:

form – Unicode normalization form: NFC, NFD, NFKC, NFKD

setInputCol(value)[source]
Parameters:

inputCol – The name of the input column

setLower(value)[source]
Parameters:

lower – Whether to lowercase the text

setOutputCol(value)[source]
Parameters:

outputCol – The name of the output column

setParams(form=None, inputCol=None, lower=None, outputCol=None)[source]

Set the (keyword only) parameters
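The four forms accepted by UnicodeNormalize (NFC, NFD, NFKC, NFKD) are the standard Unicode normalization forms, which Python's unicodedata module also implements. The sketch below shows their effect on a single string; unicode_normalize is a hypothetical helper, not the SynapseML API, and applying lowercasing after normalization is an assumption about the stage's order of operations.

```python
import unicodedata

FORMS = {"NFC", "NFD", "NFKC", "NFKD"}


def unicode_normalize(text, form, lower=True):
    """Normalize text to the given Unicode form, then optionally lowercase it."""
    if form not in FORMS:
        raise ValueError(f"unsupported normalization form: {form}")
    normalized = unicodedata.normalize(form, text)
    return normalized.lower() if lower else normalized


# Fullwidth "Ｃａｆ" plus a precomposed é: NFKC folds the fullwidth letters to ASCII
print(unicode_normalize("\uff23\uff41\uff46\u00e9", "NFKC"))
# café
```

By contrast, NFD would split the precomposed é into "e" followed by a combining acute accent, so the string grows by one code point.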

Module contents

SynapseML is an ecosystem of tools aimed at expanding the distributed computing framework Apache Spark in several new directions. SynapseML adds many deep learning and data science tools to the Spark ecosystem, including seamless integration of Spark Machine Learning pipelines with Microsoft Cognitive Toolkit (CNTK), LightGBM, and OpenCV. These tools enable powerful and highly scalable predictive and analytical models for a variety of data sources.

SynapseML also brings new networking capabilities to the Spark ecosystem. With the HTTP on Spark project, users can embed any web service into their SparkML models. In this vein, SynapseML provides easy-to-use SparkML transformers for a wide variety of Microsoft Cognitive Services. For production-grade deployment, the Spark Serving project enables high-throughput web services with sub-millisecond latency, backed by your Spark cluster.

SynapseML requires Scala 2.12, Spark 3.0+, and Python 3.6+.