synapse.ml.cognitive.anomaly package

Submodules

synapse.ml.cognitive.anomaly.DetectAnomalies module

class synapse.ml.cognitive.anomaly.DetectAnomalies.DetectAnomalies(java_obj=None, AADToken=None, AADTokenCol=None, concurrency=1, concurrentTimeout=None, customInterval=None, customIntervalCol=None, errorCol='DetectAnomalies_fd7a79490b10_error', granularity=None, granularityCol=None, handler=None, imputeFixedValue=None, imputeFixedValueCol=None, imputeMode=None, imputeModeCol=None, maxAnomalyRatio=None, maxAnomalyRatioCol=None, outputCol='DetectAnomalies_fd7a79490b10_output', period=None, periodCol=None, sensitivity=None, sensitivityCol=None, series=None, seriesCol=None, subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • AADToken (object) – AAD Token used for authentication

  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • customInterval (object) – Sets a non-standard time interval. For example, if the series has one point every 5 minutes, the request can be set as granularity=minutely, customInterval=5.

  • errorCol (str) – column to hold http errors

  • granularity (object) – Can only be one of yearly, monthly, weekly, daily, hourly or minutely. Granularity is used to verify whether the input series is valid.

  • handler (object) – Which strategy to use when handling requests

  • imputeFixedValue (object) – Optional argument, fixed value to use when imputeMode is set to “fixed”

  • imputeMode (object) – Optional argument, impute mode of a time series. Possible values: auto, previous, linear, fixed, zero, notFill

  • maxAnomalyRatio (object) – Optional argument, advanced model parameter, max anomaly ratio in a time series.

  • outputCol (str) – The name of the output column

  • period (object) – Optional argument, periodic value of a time series. If the value is null or not present, the API will determine the period automatically.

  • sensitivity (object) – Optional argument, advanced model parameter, between 0 and 99. The lower the value, the larger the margin, which means fewer anomalies will be accepted.

  • series (object) – Time series data points. Points should be sorted by timestamp in ascending order to match the anomaly detection result. If the data is not sorted correctly or contains duplicate timestamps, the API will not work and an error message will be returned.

  • subscriptionKey (object) – the API key to use

  • timeout (float) – number of seconds to wait before closing the connection

  • url (str) – Url of the service
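
The series parameter requires points in ascending timestamp order with no duplicate timestamps. A minimal sketch of a client-side check, in plain Python, is shown here. The helper name prepare_series and the (timestamp, value) pair shape are assumptions made for illustration; they are not part of this package.

```python
from datetime import datetime


def prepare_series(points):
    """Return points sorted by timestamp, rejecting duplicate timestamps.

    `points` is an iterable of (iso_timestamp, value) pairs. This shape is an
    assumption made for the example, not something the package prescribes.
    """
    def parse(ts):
        # Older Python versions do not accept a trailing "Z" in fromisoformat.
        return datetime.fromisoformat(ts.replace("Z", "+00:00"))

    ordered = sorted(points, key=lambda p: parse(p[0]))
    seen = set()
    for ts, _ in ordered:
        key = parse(ts)
        if key in seen:
            raise ValueError(f"duplicate timestamp: {ts}")
        seen.add(key)
    return ordered
```

Running a check like this before building the series column surfaces ordering problems locally rather than as an error returned by the service.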

AADToken = Param(parent='undefined', name='AADToken', doc='ServiceParam: AAD Token used for authentication')
concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
customInterval = Param(parent='undefined', name='customInterval', doc='ServiceParam:  Custom Interval is used to set non-standard time interval, for example, if the series is 5 minutes,  request can be set as granularity=minutely, customInterval=5.     ')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getAADToken()[source]
Returns

AAD Token used for authentication

Return type

AADToken

getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getCustomInterval()[source]
Returns

Sets a non-standard time interval. For example, if the series has one point every 5 minutes, the request can be set as granularity=minutely, customInterval=5.

Return type

customInterval

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getGranularity()[source]
Returns

Can only be one of yearly, monthly, weekly, daily, hourly or minutely. Granularity is used to verify whether the input series is valid.

Return type

granularity

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

getImputeFixedValue()[source]
Returns

Optional argument, fixed value to use when imputeMode is set to “fixed”

Return type

imputeFixedValue

getImputeMode()[source]
Returns

Optional argument, impute mode of a time series. Possible values: auto, previous, linear, fixed, zero, notFill

Return type

imputeMode

static getJavaPackage()[source]

Returns package name String.

getMaxAnomalyRatio()[source]
Returns

Optional argument, advanced model parameter, max anomaly ratio in a time series.

Return type

maxAnomalyRatio

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getPeriod()[source]
Returns

Optional argument, periodic value of a time series. If the value is null or not present, the API will determine the period automatically.

Return type

period

getSensitivity()[source]
Returns

Optional argument, advanced model parameter, between 0 and 99. The lower the value, the larger the margin, which means fewer anomalies will be accepted.

Return type

sensitivity

getSeries()[source]
Returns

Time series data points. Points should be sorted by timestamp in ascending order to match the anomaly detection result. If the data is not sorted correctly or contains duplicate timestamps, the API will not work and an error message will be returned.

Return type

series

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

granularity = Param(parent='undefined', name='granularity', doc='ServiceParam:  Can only be one of yearly, monthly, weekly, daily, hourly or minutely. Granularity is used for verify whether input series is valid.     ')
handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
imputeFixedValue = Param(parent='undefined', name='imputeFixedValue', doc='ServiceParam:  Optional argument, fixed value to use when imputeMode is set to "fixed"     ')
imputeMode = Param(parent='undefined', name='imputeMode', doc='ServiceParam:  Optional argument, impute mode of a time series. Possible values: auto, previous, linear, fixed, zero, notFill     ')
maxAnomalyRatio = Param(parent='undefined', name='maxAnomalyRatio', doc='ServiceParam:  Optional argument, advanced model parameter, max anomaly ratio in a time series.     ')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
period = Param(parent='undefined', name='period', doc='ServiceParam:  Optional argument, periodic value of a time series. If the value is null or does not present, the API will determine the period automatically.     ')
classmethod read()[source]

Returns an MLReader instance for this class.

sensitivity = Param(parent='undefined', name='sensitivity', doc='ServiceParam:  Optional argument, advanced model parameter, between 0-99, the lower the value is, the larger the margin value will be which means less anomalies will be accepted     ')
series = Param(parent='undefined', name='series', doc='ServiceParam:  Time series data points. Points should be sorted by timestamp in ascending order to match the anomaly detection result. If the data is not sorted correctly or there is duplicated timestamp, the API will not work. In such case, an error message will be returned.     ')
setAADToken(value)[source]
Parameters

AADToken – AAD Token used for authentication

setAADTokenCol(value)[source]
Parameters

AADToken – AAD Token used for authentication

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setCustomInterval(value)[source]
Parameters

customInterval – Sets a non-standard time interval. For example, if the series has one point every 5 minutes, the request can be set as granularity=minutely, customInterval=5.
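
As an illustration of how granularity and customInterval fit together, the sketch below maps the spacing between points (in seconds) to a pair of settings. Only the 5-minute case is confirmed by the description above. The rule of picking the largest unit that divides the spacing evenly, and the helper name interval_settings, are assumptions for this example.

```python
# Fixed-length granularities, largest first. yearly and monthly are left out
# because their length in seconds varies.
_UNIT_SECONDS = [
    ("weekly", 7 * 86400),
    ("daily", 86400),
    ("hourly", 3600),
    ("minutely", 60),
]


def interval_settings(step_seconds):
    """Map point spacing in seconds to (granularity, customInterval).

    customInterval is None when the spacing is exactly one unit.
    """
    if step_seconds <= 0:
        raise ValueError("spacing must be positive")
    for name, unit in _UNIT_SECONDS:
        if step_seconds % unit == 0:
            multiple = step_seconds // unit
            return name, (None if multiple == 1 else multiple)
    raise ValueError("spacing is not a whole number of minutes")
```

With this mapping, a 300-second spacing yields ("minutely", 5), which could then be passed to setGranularity and setCustomInterval.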

setCustomIntervalCol(value)[source]
Parameters

customInterval – Sets a non-standard time interval. For example, if the series has one point every 5 minutes, the request can be set as granularity=minutely, customInterval=5.

setCustomServiceName(value)[source]
setDefaultInternalEndpoint(value)[source]
setEndpoint(value)[source]
setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setGranularity(value)[source]
Parameters

granularity – Can only be one of yearly, monthly, weekly, daily, hourly or minutely. Granularity is used to verify whether the input series is valid.

setGranularityCol(value)[source]
Parameters

granularity – Can only be one of yearly, monthly, weekly, daily, hourly or minutely. Granularity is used to verify whether the input series is valid.

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setImputeFixedValue(value)[source]
Parameters

imputeFixedValue – Optional argument, fixed value to use when imputeMode is set to “fixed”

setImputeFixedValueCol(value)[source]
Parameters

imputeFixedValue – Optional argument, fixed value to use when imputeMode is set to “fixed”

setImputeMode(value)[source]
Parameters

imputeMode – Optional argument, impute mode of a time series. Possible values: auto, previous, linear, fixed, zero, notFill

setImputeModeCol(value)[source]
Parameters

imputeMode – Optional argument, impute mode of a time series. Possible values: auto, previous, linear, fixed, zero, notFill

setLinkedService(value)[source]
setLocation(value)[source]
setMaxAnomalyRatio(value)[source]
Parameters

maxAnomalyRatio – Optional argument, advanced model parameter, max anomaly ratio in a time series.

setMaxAnomalyRatioCol(value)[source]
Parameters

maxAnomalyRatio – Optional argument, advanced model parameter, max anomaly ratio in a time series.

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(AADToken=None, AADTokenCol=None, concurrency=1, concurrentTimeout=None, customInterval=None, customIntervalCol=None, errorCol='DetectAnomalies_fd7a79490b10_error', granularity=None, granularityCol=None, handler=None, imputeFixedValue=None, imputeFixedValueCol=None, imputeMode=None, imputeModeCol=None, maxAnomalyRatio=None, maxAnomalyRatioCol=None, outputCol='DetectAnomalies_fd7a79490b10_output', period=None, periodCol=None, sensitivity=None, sensitivityCol=None, series=None, seriesCol=None, subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setPeriod(value)[source]
Parameters

period – Optional argument, periodic value of a time series. If the value is null or not present, the API will determine the period automatically.

setPeriodCol(value)[source]
Parameters

period – Optional argument, periodic value of a time series. If the value is null or not present, the API will determine the period automatically.

setSensitivity(value)[source]
Parameters

sensitivity – Optional argument, advanced model parameter, between 0 and 99. The lower the value, the larger the margin, which means fewer anomalies will be accepted.

setSensitivityCol(value)[source]
Parameters

sensitivity – Optional argument, advanced model parameter, between 0 and 99. The lower the value, the larger the margin, which means fewer anomalies will be accepted.

setSeries(value)[source]
Parameters

series – Time series data points. Points should be sorted by timestamp in ascending order to match the anomaly detection result. If the data is not sorted correctly or contains duplicate timestamps, the API will not work and an error message will be returned.

setSeriesCol(value)[source]
Parameters

series – Time series data points. Points should be sorted by timestamp in ascending order to match the anomaly detection result. If the data is not sorted correctly or contains duplicate timestamps, the API will not work and an error message will be returned.

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – the API key to use

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')
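
The setters listed above follow a uniform naming pattern (set plus the capitalised parameter name), so a transformer can be configured from a plain dictionary. The sketch below is one way to do that, not the library's prescribed usage. The helper names, the column names "inputs", "anomalies" and "errors", and the idea of passing an Azure region to setLocation are assumptions for illustration. Constructing DetectAnomalies requires SynapseML and an active Spark session, so that import is deferred.

```python
def setter_name(param):
    """Turn a parameter name such as 'seriesCol' into 'setSeriesCol'."""
    return "set" + param[0].upper() + param[1:]


def apply_settings(stage, settings):
    """Call the matching setter on `stage` for every entry in `settings`."""
    for param, value in settings.items():
        getattr(stage, setter_name(param))(value)
    return stage


def make_detect_anomalies(subscription_key, location):
    """Sketch: configure DetectAnomalies (needs SynapseML and a Spark session)."""
    # Imported here so the helpers above work without Spark installed.
    from synapse.ml.cognitive.anomaly.DetectAnomalies import DetectAnomalies

    detector = DetectAnomalies()
    detector.setSubscriptionKey(subscription_key)
    detector.setLocation(location)  # assumed to be an Azure region name
    return apply_settings(detector, {
        "seriesCol": "inputs",     # hypothetical column holding the series
        "granularity": "monthly",
        "outputCol": "anomalies",  # hypothetical output column name
        "errorCol": "errors",      # hypothetical error column name
    })
```

Because DetectAnomalies derives from JavaTransformer, the configured object would then be applied with transform(df) on a DataFrame that has the series column.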

synapse.ml.cognitive.anomaly.DetectLastAnomaly module

class synapse.ml.cognitive.anomaly.DetectLastAnomaly.DetectLastAnomaly(java_obj=None, AADToken=None, AADTokenCol=None, concurrency=1, concurrentTimeout=None, customInterval=None, customIntervalCol=None, errorCol='DetectLastAnomaly_b9f74647a778_error', granularity=None, granularityCol=None, handler=None, imputeFixedValue=None, imputeFixedValueCol=None, imputeMode=None, imputeModeCol=None, maxAnomalyRatio=None, maxAnomalyRatioCol=None, outputCol='DetectLastAnomaly_b9f74647a778_output', period=None, periodCol=None, sensitivity=None, sensitivityCol=None, series=None, seriesCol=None, subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • AADToken (object) – AAD Token used for authentication

  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • customInterval (object) – Sets a non-standard time interval. For example, if the series has one point every 5 minutes, the request can be set as granularity=minutely, customInterval=5.

  • errorCol (str) – column to hold http errors

  • granularity (object) – Can only be one of yearly, monthly, weekly, daily, hourly or minutely. Granularity is used to verify whether the input series is valid.

  • handler (object) – Which strategy to use when handling requests

  • imputeFixedValue (object) – Optional argument, fixed value to use when imputeMode is set to “fixed”

  • imputeMode (object) – Optional argument, impute mode of a time series. Possible values: auto, previous, linear, fixed, zero, notFill

  • maxAnomalyRatio (object) – Optional argument, advanced model parameter, max anomaly ratio in a time series.

  • outputCol (str) – The name of the output column

  • period (object) – Optional argument, periodic value of a time series. If the value is null or not present, the API will determine the period automatically.

  • sensitivity (object) – Optional argument, advanced model parameter, between 0 and 99. The lower the value, the larger the margin, which means fewer anomalies will be accepted.

  • series (object) – Time series data points. Points should be sorted by timestamp in ascending order to match the anomaly detection result. If the data is not sorted correctly or contains duplicate timestamps, the API will not work and an error message will be returned.

  • subscriptionKey (object) – the API key to use

  • timeout (float) – number of seconds to wait before closing the connection

  • url (str) – Url of the service
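
Several of the parameters above accept only a fixed set of values or a numeric range. The following plain-Python sketch collects those constraints into a pre-flight check. The function name check_options is hypothetical, and treating a missing imputeFixedValue as a problem when imputeMode is "fixed" is an assumption; the descriptions above only say the value is used in that mode.

```python
GRANULARITIES = {"yearly", "monthly", "weekly", "daily", "hourly", "minutely"}
IMPUTE_MODES = {"auto", "previous", "linear", "fixed", "zero", "notFill"}


def check_options(granularity, sensitivity=None, impute_mode=None,
                  impute_fixed_value=None):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if granularity not in GRANULARITIES:
        problems.append(f"unknown granularity: {granularity!r}")
    if sensitivity is not None and not 0 <= sensitivity <= 99:
        problems.append(f"sensitivity out of range 0-99: {sensitivity}")
    if impute_mode is not None and impute_mode not in IMPUTE_MODES:
        problems.append(f"unknown imputeMode: {impute_mode!r}")
    # Assumption: a fixed value should accompany imputeMode="fixed".
    if impute_mode == "fixed" and impute_fixed_value is None:
        problems.append("imputeMode is 'fixed' but imputeFixedValue is not set")
    return problems
```

A check like this only mirrors the documented constraints; the service remains the authority on what it accepts.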

AADToken = Param(parent='undefined', name='AADToken', doc='ServiceParam: AAD Token used for authentication')
concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
customInterval = Param(parent='undefined', name='customInterval', doc='ServiceParam:  Custom Interval is used to set non-standard time interval, for example, if the series is 5 minutes,  request can be set as granularity=minutely, customInterval=5.     ')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getAADToken()[source]
Returns

AAD Token used for authentication

Return type

AADToken

getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getCustomInterval()[source]
Returns

Sets a non-standard time interval. For example, if the series has one point every 5 minutes, the request can be set as granularity=minutely, customInterval=5.

Return type

customInterval

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getGranularity()[source]
Returns

Can only be one of yearly, monthly, weekly, daily, hourly or minutely. Granularity is used to verify whether the input series is valid.

Return type

granularity

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

getImputeFixedValue()[source]
Returns

Optional argument, fixed value to use when imputeMode is set to “fixed”

Return type

imputeFixedValue

getImputeMode()[source]
Returns

Optional argument, impute mode of a time series. Possible values: auto, previous, linear, fixed, zero, notFill

Return type

imputeMode

static getJavaPackage()[source]

Returns package name String.

getMaxAnomalyRatio()[source]
Returns

Optional argument, advanced model parameter, max anomaly ratio in a time series.

Return type

maxAnomalyRatio

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getPeriod()[source]
Returns

Optional argument, periodic value of a time series. If the value is null or not present, the API will determine the period automatically.

Return type

period

getSensitivity()[source]
Returns

Optional argument, advanced model parameter, between 0 and 99. The lower the value, the larger the margin, which means fewer anomalies will be accepted.

Return type

sensitivity

getSeries()[source]
Returns

Time series data points. Points should be sorted by timestamp in ascending order to match the anomaly detection result. If the data is not sorted correctly or contains duplicate timestamps, the API will not work and an error message will be returned.

Return type

series

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

granularity = Param(parent='undefined', name='granularity', doc='ServiceParam:  Can only be one of yearly, monthly, weekly, daily, hourly or minutely. Granularity is used for verify whether input series is valid.     ')
handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
imputeFixedValue = Param(parent='undefined', name='imputeFixedValue', doc='ServiceParam:  Optional argument, fixed value to use when imputeMode is set to "fixed"     ')
imputeMode = Param(parent='undefined', name='imputeMode', doc='ServiceParam:  Optional argument, impute mode of a time series. Possible values: auto, previous, linear, fixed, zero, notFill     ')
maxAnomalyRatio = Param(parent='undefined', name='maxAnomalyRatio', doc='ServiceParam:  Optional argument, advanced model parameter, max anomaly ratio in a time series.     ')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
period = Param(parent='undefined', name='period', doc='ServiceParam:  Optional argument, periodic value of a time series. If the value is null or does not present, the API will determine the period automatically.     ')
classmethod read()[source]

Returns an MLReader instance for this class.

sensitivity = Param(parent='undefined', name='sensitivity', doc='ServiceParam:  Optional argument, advanced model parameter, between 0-99, the lower the value is, the larger the margin value will be which means less anomalies will be accepted     ')
series = Param(parent='undefined', name='series', doc='ServiceParam:  Time series data points. Points should be sorted by timestamp in ascending order to match the anomaly detection result. If the data is not sorted correctly or there is duplicated timestamp, the API will not work. In such case, an error message will be returned.     ')
setAADToken(value)[source]
Parameters

AADToken – AAD Token used for authentication

setAADTokenCol(value)[source]
Parameters

AADToken – AAD Token used for authentication

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setCustomInterval(value)[source]
Parameters

customInterval – Sets a non-standard time interval. For example, if the series has one point every 5 minutes, the request can be set as granularity=minutely, customInterval=5.

setCustomIntervalCol(value)[source]
Parameters

customInterval – Sets a non-standard time interval. For example, if the series has one point every 5 minutes, the request can be set as granularity=minutely, customInterval=5.

setCustomServiceName(value)[source]
setDefaultInternalEndpoint(value)[source]
setEndpoint(value)[source]
setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setGranularity(value)[source]
Parameters

granularity – Can only be one of yearly, monthly, weekly, daily, hourly or minutely. Granularity is used to verify whether the input series is valid.

setGranularityCol(value)[source]
Parameters

granularity – Can only be one of yearly, monthly, weekly, daily, hourly or minutely. Granularity is used to verify whether the input series is valid.

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setImputeFixedValue(value)[source]
Parameters

imputeFixedValue – Optional argument, fixed value to use when imputeMode is set to “fixed”

setImputeFixedValueCol(value)[source]
Parameters

imputeFixedValue – Optional argument, fixed value to use when imputeMode is set to “fixed”

setImputeMode(value)[source]
Parameters

imputeMode – Optional argument, impute mode of a time series. Possible values: auto, previous, linear, fixed, zero, notFill

setImputeModeCol(value)[source]
Parameters

imputeMode – Optional argument, impute mode of a time series. Possible values: auto, previous, linear, fixed, zero, notFill

setLinkedService(value)[source]
setLocation(value)[source]
setMaxAnomalyRatio(value)[source]
Parameters

maxAnomalyRatio – Optional argument, advanced model parameter, max anomaly ratio in a time series.

setMaxAnomalyRatioCol(value)[source]
Parameters

maxAnomalyRatio – Optional argument, advanced model parameter, max anomaly ratio in a time series.

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(AADToken=None, AADTokenCol=None, concurrency=1, concurrentTimeout=None, customInterval=None, customIntervalCol=None, errorCol='DetectLastAnomaly_b9f74647a778_error', granularity=None, granularityCol=None, handler=None, imputeFixedValue=None, imputeFixedValueCol=None, imputeMode=None, imputeModeCol=None, maxAnomalyRatio=None, maxAnomalyRatioCol=None, outputCol='DetectLastAnomaly_b9f74647a778_output', period=None, periodCol=None, sensitivity=None, sensitivityCol=None, series=None, seriesCol=None, subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setPeriod(value)[source]
Parameters

period – Optional argument, periodic value of a time series. If the value is null or not present, the API will determine the period automatically.

setPeriodCol(value)[source]
Parameters

period – Optional argument, periodic value of a time series. If the value is null or not present, the API will determine the period automatically.

setSensitivity(value)[source]
Parameters

sensitivity – Optional argument, advanced model parameter, between 0 and 99. The lower the value, the larger the margin, which means fewer anomalies will be accepted.

setSensitivityCol(value)[source]
Parameters

sensitivity – Optional argument, advanced model parameter, between 0 and 99. The lower the value, the larger the margin, which means fewer anomalies will be accepted.

setSeries(value)[source]
Parameters

series – Time series data points. Points should be sorted by timestamp in ascending order to match the anomaly detection result. If the data is not sorted correctly or contains duplicate timestamps, the API will not work and an error message will be returned.

setSeriesCol(value)[source]
Parameters

series – Time series data points. Points should be sorted by timestamp in ascending order to match the anomaly detection result. If the data is not sorted correctly or contains duplicate timestamps, the API will not work and an error message will be returned.

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – the API key to use

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')
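
To show how seriesCol might be populated, the sketch below gathers (timestamp, value) rows into one sorted array column per group and passes it to DetectLastAnomaly. It is a sketch under stated assumptions, not a definitive recipe: the struct field names "timestamp" and "value", the column names, the sample numbers, and the function name detect_last_anomaly are all chosen for illustration, and a real request will likely need a longer series than this sample. The Spark and SynapseML imports are deferred because they need an installed cluster runtime.

```python
# Illustrative sample only; the values carry no meaning.
SAMPLE_POINTS = [
    ("1972-01-01T00:00:00Z", 826.0),
    ("1972-02-01T00:00:00Z", 799.0),
    ("1972-03-01T00:00:00Z", 890.0),
]


def detect_last_anomaly(spark, subscription_key, location, points=SAMPLE_POINTS):
    """Sketch: build a series column and run DetectLastAnomaly on it."""
    from pyspark.sql import functions as F
    from synapse.ml.cognitive.anomaly.DetectLastAnomaly import DetectLastAnomaly

    df = (
        spark.createDataFrame(points, ["timestamp", "value"])
        .withColumn("group", F.lit("series1"))
        .withColumn("point", F.struct("timestamp", "value"))
        .groupBy("group")
        # sort_array orders the structs by their first field, the timestamp,
        # which keeps the series in ascending order.
        .agg(F.sort_array(F.collect_list("point")).alias("inputs"))
    )

    detector = (
        DetectLastAnomaly()
        .setSubscriptionKey(subscription_key)
        .setLocation(location)  # assumed to be an Azure region name
        .setSeriesCol("inputs")
        .setGranularity("monthly")
        .setOutputCol("anomalies")
        .setErrorCol("errors")
    )
    return detector.transform(df)
```

Sorting inside the aggregation matters because collect_list gives no ordering guarantee, while the series description requires ascending timestamps.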

synapse.ml.cognitive.anomaly.DetectLastMultivariateAnomaly module

class synapse.ml.cognitive.anomaly.DetectLastMultivariateAnomaly.DetectLastMultivariateAnomaly(java_obj=None, AADToken=None, AADTokenCol=None, batchSize=300, concurrency=1, concurrentTimeout=None, diagnosticsInfo=None, errorCol='DetectLastMultivariateAnomaly_949acc5bc949_error', handler=None, inputVariablesCols=None, modelId=None, outputCol='DetectLastMultivariateAnomaly_949acc5bc949_output', subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, timestampCol='timestamp', topContributorCount=10, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • AADToken (object) – AAD Token used for authentication

  • batchSize (int) – The max size of the buffer

  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • diagnosticsInfo (object) – diagnosticsInfo for training a multivariate anomaly detection model

  • errorCol (str) – column to hold http errors

  • handler (object) – Which strategy to use when handling requests

  • inputVariablesCols (list) – The names of the input variables columns

  • modelId (str) – Format - uuid. Model identifier.

  • outputCol (str) – The name of the output column

  • subscriptionKey (object) – the API key to use

  • timeout (float) – number of seconds to wait before closing the connection

  • timestampCol (str) – Timestamp column name

  • topContributorCount (int) – A number N from 1 to 30 that controls how many top contributing variables are detailed in the anomaly results. For example, if the model has 100 variables but you only care about the top five contributing variables in the detection results, set this field to 5. The default is 10.

  • url (str) – Url of the service

AADToken = Param(parent='undefined', name='AADToken', doc='ServiceParam: AAD Token used for authentication')
batchSize = Param(parent='undefined', name='batchSize', doc='The max size of the buffer')
concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
diagnosticsInfo = Param(parent='undefined', name='diagnosticsInfo', doc='diagnosticsInfo for training a multivariate anomaly detection model')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getAADToken()[source]
Returns

AAD Token used for authentication

Return type

AADToken

getBatchSize()[source]
Returns

The max size of the buffer

Return type

batchSize

getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getDiagnosticsInfo()[source]
Returns

diagnosticsInfo for training a multivariate anomaly detection model

Return type

diagnosticsInfo

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

getInputVariablesCols()[source]
Returns

The names of the input variables columns

Return type

inputVariablesCols

static getJavaPackage()[source]

Returns package name String.

getModelId()[source]
Returns

Format - uuid. Model identifier.

Return type

modelId

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getTimestampCol()[source]
Returns

Timestamp column name

Return type

timestampCol

getTopContributorCount()[source]
Returns

A number N from 1 to 30 that controls how many top contributing variables are detailed in the anomaly results. For example, if the model has 100 variables but you only care about the top five contributing variables in the detection results, set this field to 5. The default is 10.

Return type

topContributorCount

getUrl()[source]
Returns

Url of the service

Return type

url

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
inputVariablesCols = Param(parent='undefined', name='inputVariablesCols', doc='The names of the input variables columns')
modelId = Param(parent='undefined', name='modelId', doc='Format - uuid. Model identifier.')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setAADToken(value)[source]
Parameters

AADToken – AAD Token used for authentication

setAADTokenCol(value)[source]
Parameters

AADToken – AAD Token used for authentication

setBatchSize(value)[source]
Parameters

batchSize – The max size of the buffer

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setCustomServiceName(value)[source]
setDefaultInternalEndpoint(value)[source]
setDiagnosticsInfo(value)[source]
Parameters

diagnosticsInfo – diagnosticsInfo for training a multivariate anomaly detection model

setEndpoint(value)[source]
setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setInputVariablesCols(value)[source]
Parameters

inputVariablesCols – The names of the input variables columns

setLocation(value)[source]
setModelId(value)[source]
Parameters

modelId – Format - uuid. Model identifier.

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(AADToken=None, AADTokenCol=None, batchSize=300, concurrency=1, concurrentTimeout=None, diagnosticsInfo=None, errorCol='DetectLastMultivariateAnomaly_949acc5bc949_error', handler=None, inputVariablesCols=None, modelId=None, outputCol='DetectLastMultivariateAnomaly_949acc5bc949_output', subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, timestampCol='timestamp', topContributorCount=10, url=None)[source]

Set the (keyword only) parameters

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKeyCol – name of the column holding the API key to use

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setTimestampCol(value)[source]
Parameters

timestampCol – Timestamp column name

setTopContributorCount(value)[source]
Parameters

topContributorCount – Number N, from 1 to 30, of top contributing variables to detail in the anomaly results. For example, if the model has 100 variables but you only care about the top five contributing variables in the detection results, set this field to 5. The default is 10.
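
As a rough illustration of what topContributorCount controls, the sketch below keeps the N variables with the largest contribution scores. The function name, the score dictionary, and the variable names are hypothetical; the service's actual response format is not shown in this reference.

```python
def top_contributors(scores, top_contributor_count=10):
    """Return the N (variable, score) pairs with the highest scores."""
    # The documented range for topContributorCount is 1 to 30.
    if not 1 <= top_contributor_count <= 30:
        raise ValueError("topContributorCount must be between 1 and 30")
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_contributor_count]


# Hypothetical contribution scores for four variables.
example_scores = {"pressure": 0.52, "temperature": 0.31, "flow": 0.12, "vibration": 0.05}
top_two = top_contributors(example_scores, top_contributor_count=2)
print(top_two)
```

With the default of 10 and only four variables, all four pairs come back, ordered by score.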

setUrl(value)[source]
Parameters

url – Url of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
timestampCol = Param(parent='undefined', name='timestampCol', doc='Timestamp column name')
topContributorCount = Param(parent='undefined', name='topContributorCount', doc='This is a number that you could specify N from 1 to 30, which will give you the details of top N contributed variables in the anomaly results. For example, if you have 100 variables in the model, but you only care the top five contributed variables in detection results, then you should fill this field with 5. The default number is 10.')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.anomaly.SimpleDetectAnomalies module

class synapse.ml.cognitive.anomaly.SimpleDetectAnomalies.SimpleDetectAnomalies(java_obj=None, AADToken=None, AADTokenCol=None, concurrency=1, concurrentTimeout=None, customInterval=None, customIntervalCol=None, errorCol='SimpleDetectAnomalies_840547f19b95_error', granularity=None, granularityCol=None, groupbyCol=None, handler=None, imputeFixedValue=None, imputeFixedValueCol=None, imputeMode=None, imputeModeCol=None, maxAnomalyRatio=None, maxAnomalyRatioCol=None, outputCol='SimpleDetectAnomalies_840547f19b95_output', period=None, periodCol=None, sensitivity=None, sensitivityCol=None, series=None, seriesCol=None, subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, timestampCol='timestamp', url=None, valueCol='value')[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • AADToken (object) – AAD Token used for authentication

  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • customInterval (object) – Custom interval, used to set a non-standard time interval. For example, if the series has 5-minute spacing, the request can be set as granularity=minutely, customInterval=5.

  • errorCol (str) – column to hold http errors

  • granularity (object) – Can only be one of yearly, monthly, weekly, daily, hourly or minutely. Granularity is used to verify whether the input series is valid.

  • groupbyCol (str) – column that groups the series

  • handler (object) – Which strategy to use when handling requests

  • imputeFixedValue (object) – Optional argument, fixed value to use when imputeMode is set to “fixed”

  • imputeMode (object) – Optional argument, impute mode of a time series. Possible values: auto, previous, linear, fixed, zero, notFill

  • maxAnomalyRatio (object) – Optional argument, advanced model parameter, max anomaly ratio in a time series.

  • outputCol (str) – The name of the output column

  • period (object) – Optional argument, periodic value of a time series. If the value is null or not present, the API will determine the period automatically.

  • sensitivity (object) – Optional argument, advanced model parameter, between 0 and 99. The lower the value, the larger the margin value will be, which means fewer anomalies will be accepted.

  • series (object) – Time series data points. Points should be sorted by timestamp in ascending order to match the anomaly detection result. If the data is not sorted correctly or contains a duplicated timestamp, the API will not work and an error message will be returned.

  • subscriptionKey (object) – the API key to use

  • timeout (float) – number of seconds to wait before closing the connection

  • timestampCol (str) – column representing the time of the series

  • url (str) – Url of the service

  • valueCol (str) – column representing the value of the series
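
The parameters above map onto the setter methods listed further down in this class. The sketch below applies a handful of them to any object exposing those setters. The helper name, the `RecordingStub` class, the region string, and the "group", "anomalies" and "errors" column names are assumptions for illustration; the stub only exists so the snippet runs without Spark or SynapseML installed.

```python
def configure_simple_detector(detector, subscription_key, location):
    """Apply documented setters to a SimpleDetectAnomalies-like object."""
    # Every setter used here appears in this reference. Calls are not chained,
    # so the sketch does not depend on what the setters return.
    detector.setSubscriptionKey(subscription_key)
    detector.setLocation(location)
    detector.setTimestampCol("timestamp")  # documented default
    detector.setValueCol("value")          # documented default
    detector.setGroupbyCol("group")        # hypothetical column name
    detector.setGranularity("monthly")     # one of the documented values
    detector.setOutputCol("anomalies")     # hypothetical column name
    detector.setErrorCol("errors")         # hypothetical column name
    return detector


class RecordingStub:
    """Stand-in that records setter calls instead of talking to Spark."""

    def __init__(self):
        self.calls = {}

    def __getattr__(self, name):
        # Only reached for attributes that do not exist, i.e. the setters.
        if not name.startswith("set"):
            raise AttributeError(name)

        def setter(value):
            self.calls[name] = value
            return self

        return setter


stub = configure_simple_detector(RecordingStub(), "<your-key>", "westus2")
print(stub.calls)
```

With SynapseML available, the same helper could presumably be given a `SimpleDetectAnomalies()` instance imported from `synapse.ml.cognitive.anomaly.SimpleDetectAnomalies`, and the configured transformer applied to a DataFrame containing the timestamp, value and grouping columns.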

AADToken = Param(parent='undefined', name='AADToken', doc='ServiceParam: AAD Token used for authentication')
concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
customInterval = Param(parent='undefined', name='customInterval', doc='ServiceParam:  Custom Interval is used to set non-standard time interval, for example, if the series is 5 minutes,  request can be set as granularity=minutely, customInterval=5.     ')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getAADToken()[source]
Returns

AAD Token used for authentication

Return type

AADToken

getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getCustomInterval()[source]
Returns

Custom interval, used to set a non-standard time interval. For example, if the series has 5-minute spacing, the request can be set as granularity=minutely, customInterval=5.

Return type

customInterval
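
To make the granularity=minutely, customInterval=5 example concrete, the sketch below checks that consecutive timestamps are spaced by the expected step. The `BASE_STEP` mapping and the function name are assumptions, only fixed-length granularities are covered, and the check is stricter than the service may be (this reference does not say how gaps are treated).

```python
from datetime import datetime, timedelta

# Assumed mapping from granularity names to a base step.
BASE_STEP = {
    "minutely": timedelta(minutes=1),
    "hourly": timedelta(hours=1),
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
}


def has_expected_spacing(timestamps, granularity, custom_interval=1):
    """Return True if every consecutive pair is exactly one step apart."""
    step = BASE_STEP[granularity] * custom_interval
    return all(
        later - earlier == step
        for earlier, later in zip(timestamps, timestamps[1:])
    )


start = datetime(2024, 1, 1, 0, 0)
five_minute_series = [start + timedelta(minutes=5 * i) for i in range(4)]

matches_custom = has_expected_spacing(five_minute_series, "minutely", custom_interval=5)
matches_plain = has_expected_spacing(five_minute_series, "minutely")
print(matches_custom, matches_plain)
```

The 5-minute series matches only when customInterval is 5; with plain minutely granularity the spacing check fails.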

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getGranularity()[source]
Returns

Can only be one of yearly, monthly, weekly, daily, hourly or minutely. Granularity is used to verify whether the input series is valid.

Return type

granularity

getGroupbyCol()[source]
Returns

column that groups the series

Return type

groupbyCol

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

getImputeFixedValue()[source]
Returns

Optional argument, fixed value to use when imputeMode is set to “fixed”

Return type

imputeFixedValue

getImputeMode()[source]
Returns

Optional argument, impute mode of a time series. Possible values: auto, previous, linear, fixed, zero, notFill

Return type

imputeMode

static getJavaPackage()[source]

Returns package name String.

getMaxAnomalyRatio()[source]
Returns

Optional argument, advanced model parameter, max anomaly ratio in a time series.

Return type

maxAnomalyRatio

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getPeriod()[source]
Returns

Optional argument, periodic value of a time series. If the value is null or not present, the API will determine the period automatically.

Return type

period

getSensitivity()[source]
Returns

Optional argument, advanced model parameter, between 0 and 99. The lower the value, the larger the margin value will be, which means fewer anomalies will be accepted.

Return type

sensitivity

getSeries()[source]
Returns

Time series data points. Points should be sorted by timestamp in ascending order to match the anomaly detection result. If the data is not sorted correctly or contains a duplicated timestamp, the API will not work and an error message will be returned.

Return type

series
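
Because unsorted or duplicated timestamps lead to an error from the API, it can help to check a series before sending it. The sketch below is one way to do that in plain Python. The point layout (dictionaries with "timestamp" and "value" keys) and the function names are assumptions, and timestamps are assumed to be ISO 8601 strings in a single format so that string order matches time order.

```python
def check_series(points):
    """Raise ValueError unless timestamps are strictly ascending."""
    timestamps = [point["timestamp"] for point in points]
    for earlier, later in zip(timestamps, timestamps[1:]):
        if later == earlier:
            raise ValueError(f"duplicated timestamp: {later}")
        if later < earlier:
            raise ValueError(f"timestamps out of order: {earlier} then {later}")
    return points


def prepare_series(points):
    """Sort points by timestamp, then verify there are no duplicates."""
    return check_series(sorted(points, key=lambda point: point["timestamp"]))


raw_points = [
    {"timestamp": "2024-01-02T00:00:00Z", "value": 2.0},
    {"timestamp": "2024-01-01T00:00:00Z", "value": 1.0},
]
ordered_points = prepare_series(raw_points)
print(ordered_points)
```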

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getTimestampCol()[source]
Returns

column representing the time of the series

Return type

timestampCol

getUrl()[source]
Returns

Url of the service

Return type

url

getValueCol()[source]
Returns

column representing the value of the series

Return type

valueCol

granularity = Param(parent='undefined', name='granularity', doc='ServiceParam:  Can only be one of yearly, monthly, weekly, daily, hourly or minutely. Granularity is used for verify whether input series is valid.     ')
groupbyCol = Param(parent='undefined', name='groupbyCol', doc='column that groups the series')
handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
imputeFixedValue = Param(parent='undefined', name='imputeFixedValue', doc='ServiceParam:  Optional argument, fixed value to use when imputeMode is set to "fixed"     ')
imputeMode = Param(parent='undefined', name='imputeMode', doc='ServiceParam:  Optional argument, impute mode of a time series. Possible values: auto, previous, linear, fixed, zero, notFill     ')
maxAnomalyRatio = Param(parent='undefined', name='maxAnomalyRatio', doc='ServiceParam:  Optional argument, advanced model parameter, max anomaly ratio in a time series.     ')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
period = Param(parent='undefined', name='period', doc='ServiceParam:  Optional argument, periodic value of a time series. If the value is null or does not present, the API will determine the period automatically.     ')
classmethod read()[source]

Returns an MLReader instance for this class.

sensitivity = Param(parent='undefined', name='sensitivity', doc='ServiceParam:  Optional argument, advanced model parameter, between 0-99, the lower the value is, the larger the margin value will be which means less anomalies will be accepted     ')
series = Param(parent='undefined', name='series', doc='ServiceParam:  Time series data points. Points should be sorted by timestamp in ascending order to match the anomaly detection result. If the data is not sorted correctly or there is duplicated timestamp, the API will not work. In such case, an error message will be returned.     ')
setAADToken(value)[source]
Parameters

AADToken – AAD Token used for authentication

setAADTokenCol(value)[source]
Parameters

AADTokenCol – name of the column holding the AAD Token used for authentication

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setCustomInterval(value)[source]
Parameters

customInterval – Custom interval, used to set a non-standard time interval. For example, if the series has 5-minute spacing, the request can be set as granularity=minutely, customInterval=5.

setCustomIntervalCol(value)[source]
Parameters

customIntervalCol – name of the column holding the custom interval, used to set a non-standard time interval. For example, if the series has 5-minute spacing, the request can be set as granularity=minutely, customInterval=5.

setCustomServiceName(value)[source]
setDefaultInternalEndpoint(value)[source]
setEndpoint(value)[source]
setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setGranularity(value)[source]
Parameters

granularity – Can only be one of yearly, monthly, weekly, daily, hourly or minutely. Granularity is used to verify whether the input series is valid.

setGranularityCol(value)[source]
Parameters

granularityCol – name of the column holding the granularity. Values can only be one of yearly, monthly, weekly, daily, hourly or minutely. Granularity is used to verify whether the input series is valid.

setGroupbyCol(value)[source]
Parameters

groupbyCol – column that groups the series

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setImputeFixedValue(value)[source]
Parameters

imputeFixedValue – Optional argument, fixed value to use when imputeMode is set to “fixed”

setImputeFixedValueCol(value)[source]
Parameters

imputeFixedValueCol – name of the column holding the optional fixed value to use when imputeMode is set to “fixed”

setImputeMode(value)[source]
Parameters

imputeMode – Optional argument, impute mode of a time series. Possible values: auto, previous, linear, fixed, zero, notFill

setImputeModeCol(value)[source]
Parameters

imputeModeCol – name of the column holding the optional impute mode of a time series. Possible values: auto, previous, linear, fixed, zero, notFill

setLinkedService(value)[source]
setLocation(value)[source]
setMaxAnomalyRatio(value)[source]
Parameters

maxAnomalyRatio – Optional argument, advanced model parameter, max anomaly ratio in a time series.

setMaxAnomalyRatioCol(value)[source]
Parameters

maxAnomalyRatioCol – name of the column holding the optional advanced model parameter, max anomaly ratio in a time series.

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(AADToken=None, AADTokenCol=None, concurrency=1, concurrentTimeout=None, customInterval=None, customIntervalCol=None, errorCol='SimpleDetectAnomalies_840547f19b95_error', granularity=None, granularityCol=None, groupbyCol=None, handler=None, imputeFixedValue=None, imputeFixedValueCol=None, imputeMode=None, imputeModeCol=None, maxAnomalyRatio=None, maxAnomalyRatioCol=None, outputCol='SimpleDetectAnomalies_840547f19b95_output', period=None, periodCol=None, sensitivity=None, sensitivityCol=None, series=None, seriesCol=None, subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, timestampCol='timestamp', url=None, valueCol='value')[source]

Set the (keyword only) parameters

setPeriod(value)[source]
Parameters

period – Optional argument, periodic value of a time series. If the value is null or not present, the API will determine the period automatically.

setPeriodCol(value)[source]
Parameters

periodCol – name of the column holding the optional periodic value of a time series. If the value is null or not present, the API will determine the period automatically.

setSensitivity(value)[source]
Parameters

sensitivity – Optional argument, advanced model parameter, between 0 and 99. The lower the value, the larger the margin value will be, which means fewer anomalies will be accepted.

setSensitivityCol(value)[source]
Parameters

sensitivityCol – name of the column holding the optional advanced model parameter, between 0 and 99. The lower the value, the larger the margin value will be, which means fewer anomalies will be accepted.

setSeries(value)[source]
Parameters

series – Time series data points. Points should be sorted by timestamp in ascending order to match the anomaly detection result. If the data is not sorted correctly or contains a duplicated timestamp, the API will not work and an error message will be returned.

setSeriesCol(value)[source]
Parameters

seriesCol – name of the column holding the time series data points. Points should be sorted by timestamp in ascending order to match the anomaly detection result. If the data is not sorted correctly or contains a duplicated timestamp, the API will not work and an error message will be returned.

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKeyCol – name of the column holding the API key to use

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setTimestampCol(value)[source]
Parameters

timestampCol – column representing the time of the series

setUrl(value)[source]
Parameters

url – Url of the service

setValueCol(value)[source]
Parameters

valueCol – column representing the value of the series

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
timestampCol = Param(parent='undefined', name='timestampCol', doc='column representing the time of the series')
url = Param(parent='undefined', name='url', doc='Url of the service')
valueCol = Param(parent='undefined', name='valueCol', doc='column representing the value of the series')

synapse.ml.cognitive.anomaly.SimpleDetectMultivariateAnomaly module

class synapse.ml.cognitive.anomaly.SimpleDetectMultivariateAnomaly.SimpleDetectMultivariateAnomaly(java_obj=None, backoffs=[100, 500, 1000], diagnosticsInfo=None, endTime=None, errorCol='SimpleDetectMultivariateAnomaly_d9d3c057b1d0_error', handler=None, initialPollingDelay=300, inputCols=None, intermediateSaveDir=None, maxPollingRetries=1000, modelId=None, outputCol='SimpleDetectMultivariateAnomaly_d9d3c057b1d0_output', pollingDelay=300, startTime=None, subscriptionKey=None, subscriptionKeyCol=None, suppressMaxRetriesException=False, timestampCol='timestamp', topContributorCount=10, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaModel

Parameters
  • backoffs (list) – array of backoffs to use in the handler

  • diagnosticsInfo (object) – diagnosticsInfo for training a multivariate anomaly detection model

  • endTime (str) – A required field, end time of data to be used for detection/generating multivariate anomaly detection model, should be date-time.

  • errorCol (str) – column to hold http errors

  • handler (object) – Which strategy to use when handling requests

  • initialPollingDelay (int) – number of milliseconds to wait before first poll for result

  • inputCols (list) – The names of the input columns

  • intermediateSaveDir (str) – Blob storage location in HDFS where intermediate data is saved while training.

  • maxPollingRetries (int) – number of times to poll

  • modelId (str) – Format - uuid. Model identifier.

  • outputCol (str) – The name of the output column

  • pollingDelay (int) – number of milliseconds to wait between polling

  • startTime (str) – A required field, start time of data to be used for detection/generating multivariate anomaly detection model, should be date-time.

  • subscriptionKey (object) – the API key to use

  • suppressMaxRetriesException (bool) – set true to suppress the maximum retries exception and report it in the error column

  • timestampCol (str) – Timestamp column name

  • topContributorCount (int) – Number N, from 1 to 30, of top contributing variables to detail in the anomaly results. For example, if the model has 100 variables but you only care about the top five contributing variables in the detection results, set this field to 5. The default is 10.

  • url (str) – Url of the service
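
The polling parameters above (initialPollingDelay, pollingDelay, maxPollingRetries, suppressMaxRetriesException) suggest a wait-then-retry loop. The sketch below shows one plausible way such parameters could interact. It is not the library's implementation: the function name, the `check_status` callback, the returned dictionary, and the injectable `sleep` argument are all assumptions made for illustration.

```python
import time


def poll_until_done(check_status, initial_polling_delay=300, polling_delay=300,
                    max_polling_retries=1000,
                    suppress_max_retries_exception=False, sleep=time.sleep):
    """Call check_status() until it returns something other than None."""
    # Delays are given in milliseconds, as in the parameter descriptions.
    sleep(initial_polling_delay / 1000.0)
    for attempt in range(max_polling_retries):
        result = check_status()
        if result is not None:
            return {"output": result, "error": None}
        if attempt < max_polling_retries - 1:
            sleep(polling_delay / 1000.0)
    if suppress_max_retries_exception:
        # Report the failure as data instead of raising.
        return {"output": None, "error": "maximum polling retries exceeded"}
    raise TimeoutError("maximum polling retries exceeded")


# Fake service that is ready on the third poll; waits are recorded, not slept.
responses = iter([None, None, "READY"])
waits = []
outcome = poll_until_done(lambda: next(responses), sleep=waits.append)
print(outcome, waits)
```

Passing a recording function as `sleep` keeps the example instantaneous; in real use the default `time.sleep` would apply the delays.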

backoffs = Param(parent='undefined', name='backoffs', doc='array of backoffs to use in the handler')
cleanUpIntermediateData()[source]
diagnosticsInfo = Param(parent='undefined', name='diagnosticsInfo', doc='diagnosticsInfo for training a multivariate anomaly detection model')
endTime = Param(parent='undefined', name='endTime', doc='A required field, end time of data to be used for detection/generating multivariate anomaly detection model, should be date-time.')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getBackoffs()[source]
Returns

array of backoffs to use in the handler

Return type

backoffs

getDiagnosticsInfo()[source]
Returns

diagnosticsInfo for training a multivariate anomaly detection model

Return type

diagnosticsInfo

getEndTime()[source]
Returns

A required field, end time of data to be used for detection/generating multivariate anomaly detection model, should be date-time.

Return type

endTime

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

getInitialPollingDelay()[source]
Returns

number of milliseconds to wait before first poll for result

Return type

initialPollingDelay

getInputCols()[source]
Returns

The names of the input columns

Return type

inputCols

getIntermediateSaveDir()[source]
Returns

Blob storage location in HDFS where intermediate data is saved while training.

Return type

intermediateSaveDir

static getJavaPackage()[source]

Returns package name String.

getMaxPollingRetries()[source]
Returns

number of times to poll

Return type

maxPollingRetries

getModelId()[source]
Returns

Format - uuid. Model identifier.

Return type

modelId

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getPollingDelay()[source]
Returns

number of milliseconds to wait between polling

Return type

pollingDelay

getStartTime()[source]
Returns

A required field, start time of data to be used for detection/generating multivariate anomaly detection model, should be date-time.

Return type

startTime

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getSuppressMaxRetriesException()[source]
Returns

set true to suppress the maximum retries exception and report it in the error column

Return type

suppressMaxRetriesException

getTimestampCol()[source]
Returns

Timestamp column name

Return type

timestampCol

getTopContributorCount()[source]
Returns

Number N, from 1 to 30, of top contributing variables to detail in the anomaly results. For example, if the model has 100 variables but you only care about the top five contributing variables in the detection results, set this field to 5. The default is 10.

Return type

topContributorCount

getUrl()[source]
Returns

Url of the service

Return type

url

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
initialPollingDelay = Param(parent='undefined', name='initialPollingDelay', doc='number of milliseconds to wait before first poll for result')
inputCols = Param(parent='undefined', name='inputCols', doc='The names of the input columns')
intermediateSaveDir = Param(parent='undefined', name='intermediateSaveDir', doc='Blob storage location in HDFS where intermediate data is saved while training.')
maxPollingRetries = Param(parent='undefined', name='maxPollingRetries', doc='number of times to poll')
modelId = Param(parent='undefined', name='modelId', doc='Format - uuid. Model identifier.')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
pollingDelay = Param(parent='undefined', name='pollingDelay', doc='number of milliseconds to wait between polling')
classmethod read()[source]

Returns an MLReader instance for this class.

setBackoffs(value)[source]
Parameters

backoffs – array of backoffs to use in the handler

setDiagnosticsInfo(value)[source]
Parameters

diagnosticsInfo – diagnosticsInfo for training a multivariate anomaly detection model

setEndTime(value)[source]
Parameters

endTime – A required field, end time of data to be used for detection/generating multivariate anomaly detection model, should be date-time.

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setInitialPollingDelay(value)[source]
Parameters

initialPollingDelay – number of milliseconds to wait before first poll for result

setInputCols(value)[source]
Parameters

inputCols – The names of the input columns

setIntermediateSaveDir(value)[source]
Parameters

intermediateSaveDir – Blob storage location in HDFS where intermediate data is saved while training.

setLocation(value)[source]
setMaxPollingRetries(value)[source]
Parameters

maxPollingRetries – number of times to poll

setModelId(value)[source]
Parameters

modelId – Format - uuid. Model identifier.

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(backoffs=[100, 500, 1000], diagnosticsInfo=None, endTime=None, errorCol='SimpleDetectMultivariateAnomaly_d9d3c057b1d0_error', handler=None, initialPollingDelay=300, inputCols=None, intermediateSaveDir=None, maxPollingRetries=1000, modelId=None, outputCol='SimpleDetectMultivariateAnomaly_d9d3c057b1d0_output', pollingDelay=300, startTime=None, subscriptionKey=None, subscriptionKeyCol=None, suppressMaxRetriesException=False, timestampCol='timestamp', topContributorCount=10, url=None)[source]

Set the (keyword only) parameters

setPollingDelay(value)[source]
Parameters

pollingDelay – number of milliseconds to wait between polling

setStartTime(value)[source]
Parameters

startTime – A required field, start time of data to be used for detection/generating multivariate anomaly detection model, should be date-time.

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKeyCol – name of the column holding the API key to use

setSuppressMaxRetriesException(value)[source]
Parameters

suppressMaxRetriesException – set true to suppress the maximum retries exception and report it in the error column

setTimestampCol(value)[source]
Parameters

timestampCol – Timestamp column name

setTopContributorCount(value)[source]
Parameters

topContributorCount – Number N, from 1 to 30, of top contributing variables to detail in the anomaly results. For example, if the model has 100 variables but you only care about the top five contributing variables in the detection results, set this field to 5. The default is 10.

setUrl(value)[source]
Parameters

url – Url of the service

startTime = Param(parent='undefined', name='startTime', doc='A required field, start time of data to be used for detection/generating multivariate anomaly detection model, should be date-time.')
subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
suppressMaxRetriesException = Param(parent='undefined', name='suppressMaxRetriesException', doc='set true to suppress the maxumimum retries exception and report in the error column')
timestampCol = Param(parent='undefined', name='timestampCol', doc='Timestamp column name')
topContributorCount = Param(parent='undefined', name='topContributorCount', doc='This is a number that you could specify N from 1 to 30, which will give you the details of top N contributed variables in the anomaly results. For example, if you have 100 variables in the model, but you only care the top five contributed variables in detection results, then you should fill this field with 5. The default number is 10.')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.anomaly.SimpleFitMultivariateAnomaly module

class synapse.ml.cognitive.anomaly.SimpleFitMultivariateAnomaly.SimpleFitMultivariateAnomaly(java_obj=None, alignMode='Outer', backoffs=[100, 500, 1000], displayName=None, endTime=None, errorCol='SimpleFitMultivariateAnomaly_a00be78ec465_error', fillNAMethod='Linear', initialPollingDelay=300, inputCols=None, intermediateSaveDir=None, maxPollingRetries=1000, outputCol='SimpleFitMultivariateAnomaly_a00be78ec465_output', paddingValue=None, pollingDelay=300, slidingWindow=300, startTime=None, subscriptionKey=None, subscriptionKeyCol=None, suppressMaxRetriesException=False, timestampCol='timestamp', url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaEstimator

Parameters
  • alignMode (str) – An optional field, indicates how different variables are aligned to the same time range, which is required by the model. One of {Inner, Outer}.

  • backoffs (list) – array of backoffs to use in the handler

  • displayName (str) – optional field, name of the model

  • endTime (str) – A required field, end time of data to be used for detection/generating multivariate anomaly detection model, should be date-time.

  • errorCol (str) – column to hold http errors

  • fillNAMethod (str) – An optional field, indicates how missing values will be filled. Cannot be set to NotFill when alignMode is Outer. One of {Previous, Subsequent, Linear, Zero, Fixed}.

  • initialPollingDelay (int) – number of milliseconds to wait before first poll for result

  • inputCols (list) – The names of the input columns

  • intermediateSaveDir (str) – Blob storage location in HDFS where intermediate data is saved while training.

  • maxPollingRetries (int) – number of times to poll

  • outputCol (str) – The name of the output column

  • paddingValue (int) – optional field, is only useful if FillNAMethod is set to Fixed.

  • pollingDelay (int) – number of milliseconds to wait between polling

  • slidingWindow (int) – An optional field, indicates how many history points will be used to determine the anomaly score of one subsequent point.

  • startTime (str) – A required field, start time of data to be used for detection/generating multivariate anomaly detection model, should be date-time.

  • subscriptionKey (object) – the API key to use

  • suppressMaxRetriesException (bool) – set true to suppress the maximum retries exception and report it in the error column

  • timestampCol (str) – Timestamp column name

  • url (str) – Url of the service
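
Several of the parameters above constrain each other: fillNAMethod cannot be NotFill when alignMode is Outer, and paddingValue only matters when fillNAMethod is Fixed. The sketch below collects those rules in a small pre-flight check. The function name and the returned list of messages are assumptions; the library may validate differently or not at all.

```python
ALIGN_MODES = {"Inner", "Outer"}
# NotFill is included because the description mentions it as a possible value.
FILL_NA_METHODS = {"Previous", "Subsequent", "Linear", "Zero", "Fixed", "NotFill"}


def validate_fit_options(align_mode="Outer", fill_na_method="Linear",
                         padding_value=None):
    """Return a list of problems found in the option combination."""
    problems = []
    if align_mode not in ALIGN_MODES:
        problems.append(f"unknown alignMode: {align_mode}")
    if fill_na_method not in FILL_NA_METHODS:
        problems.append(f"unknown fillNAMethod: {fill_na_method}")
    if align_mode == "Outer" and fill_na_method == "NotFill":
        problems.append("fillNAMethod cannot be NotFill when alignMode is Outer")
    if padding_value is not None and fill_na_method != "Fixed":
        problems.append("paddingValue is only used when fillNAMethod is Fixed")
    return problems


default_problems = validate_fit_options()
conflict_problems = validate_fit_options(align_mode="Outer", fill_na_method="NotFill")
print(default_problems, conflict_problems)
```

The documented defaults (alignMode='Outer', fillNAMethod='Linear', paddingValue=None) pass the check with no problems reported.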

alignMode = Param(parent='undefined', name='alignMode', doc='An optional field, indicates how we align different variables into the same time-range which is required by the model.{Inner, Outer}')
backoffs = Param(parent='undefined', name='backoffs', doc='array of backoffs to use in the handler')
cleanUpIntermediateData()[source]
displayName = Param(parent='undefined', name='displayName', doc='optional field, name of the model')
endTime = Param(parent='undefined', name='endTime', doc='A required field, end time of data to be used for detection/generating multivariate anomaly detection model, should be date-time.')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
fillNAMethod = Param(parent='undefined', name='fillNAMethod', doc='An optional field, indicates how missed values will be filled with. Can not be set to NotFill, when alignMode is Outer.{Previous, Subsequent, Linear, Zero, Fixed}')
getAlignMode()[source]
Returns

An optional field, indicates how different variables are aligned to the same time range, which is required by the model. One of {Inner, Outer}.

Return type

alignMode

getBackoffs()[source]
Returns

array of backoffs to use in the handler

Return type

backoffs

getDisplayName()[source]
Returns

optional field, name of the model

Return type

displayName

getEndTime()[source]
Returns

A required field, end time of data to be used for detection/generating multivariate anomaly detection model, should be date-time.

Return type

endTime

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getFillNAMethod()[source]
Returns

An optional field, indicates how missing values will be filled. Cannot be set to NotFill when alignMode is Outer. One of {Previous, Subsequent, Linear, Zero, Fixed}.

Return type

fillNAMethod

getInitialPollingDelay()[source]
Returns

number of milliseconds to wait before first poll for result

Return type

initialPollingDelay

getInputCols()[source]
Returns

The names of the input columns

Return type

inputCols

getIntermediateSaveDir()[source]
Returns

Blob storage location in HDFS where intermediate data is saved while training.

Return type

intermediateSaveDir

static getJavaPackage()[source]

Returns package name String.

getMaxPollingRetries()[source]
Returns

number of times to poll

Return type

maxPollingRetries

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getPaddingValue()[source]
Returns

Optional field; only used when fillNAMethod is set to Fixed.

Return type

paddingValue

getPollingDelay()[source]
Returns

number of milliseconds to wait between polling

Return type

pollingDelay

getSlidingWindow()[source]
Returns

An optional field, indicates how many history points will be used to determine the anomaly score of one subsequent point.

Return type

slidingWindow

getStartTime()[source]
Returns

A required field, start time of data to be used for detection/generating multivariate anomaly detection model, should be date-time.

Return type

startTime

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getSuppressMaxRetriesException()[source]
Returns

set true to suppress the maximum retries exception and report it in the error column

Return type

suppressMaxRetriesException

getTimestampCol()[source]
Returns

Timestamp column name

Return type

timestampCol

getUrl()[source]
Returns

URL of the service

Return type

url

initialPollingDelay = Param(parent='undefined', name='initialPollingDelay', doc='number of milliseconds to wait before first poll for result')
inputCols = Param(parent='undefined', name='inputCols', doc='The names of the input columns')
intermediateSaveDir = Param(parent='undefined', name='intermediateSaveDir', doc='Blob storage location in HDFS where intermediate data is saved while training.')
maxPollingRetries = Param(parent='undefined', name='maxPollingRetries', doc='number of times to poll')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
paddingValue = Param(parent='undefined', name='paddingValue', doc='optional field, is only useful if FillNAMethod is set to Fixed.')
pollingDelay = Param(parent='undefined', name='pollingDelay', doc='number of milliseconds to wait between polling')
classmethod read()[source]

Returns an MLReader instance for this class.

setAlignMode(value)[source]
Parameters

alignMode – An optional field that indicates how different variables are aligned to the same time range, as required by the model. One of {Inner, Outer}.
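
As a rough illustration of the two alignMode values, the sketch below assumes that Inner keeps only the timestamps present in every variable and that Outer keeps the timestamps present in any variable. That reading is an assumption, not something this page states. The helper name align_variables and the dict-based series layout are hypothetical and are not part of SynapseML.

```python
from typing import Dict, List, Optional, Tuple

Series = Dict[str, float]  # timestamp string -> observed value


def align_variables(
    variables: Dict[str, Series], mode: str = "Outer"
) -> Tuple[List[str], Dict[str, List[Optional[float]]]]:
    """Align several variables onto one sorted list of timestamps (hypothetical helper)."""
    stamp_sets = [set(series) for series in variables.values()]
    if not stamp_sets:
        return [], {}
    if mode == "Inner":
        # Assumption: Inner keeps timestamps that every variable has.
        stamps = set.intersection(*stamp_sets)
    elif mode == "Outer":
        # Assumption: Outer keeps timestamps that at least one variable has.
        stamps = set.union(*stamp_sets)
    else:
        raise ValueError(f"unsupported alignMode: {mode!r}")
    ordered = sorted(stamps)
    # A variable with no value at a shared timestamp gets None (a gap).
    aligned = {
        name: [series.get(stamp) for stamp in ordered]
        for name, series in variables.items()
    }
    return ordered, aligned
```

Under this reading, Outer alignment can leave None gaps in a variable, which would be consistent with the documented rule that fillNAMethod cannot be NotFill when alignMode is Outer.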

setBackoffs(value)[source]
Parameters

backoffs – array of backoffs to use in the handler
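
One plausible reading of a backoff array is sketched below: the request is tried once, then retried after each listed delay in turn. The unit (milliseconds) and the retry-on-any-exception policy are assumptions, and call_with_backoffs is a hypothetical helper, not the handler SynapseML actually uses. The default values mirror the backoffs default shown in setParams.

```python
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def call_with_backoffs(
    request: Callable[[], T],
    backoffs: Sequence[int] = (100, 500, 1000),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Try request once, then once more after each backoff delay (hypothetical helper)."""
    last_error: Exception = RuntimeError("request was never attempted")
    # A leading 0 stands for the first attempt, which happens without waiting.
    for delay_ms in [0, *backoffs]:
        if delay_ms:
            sleep(delay_ms / 1000.0)  # assumption: backoffs are in milliseconds
        try:
            return request()
        except Exception as error:  # assumption: every failure is retryable
            last_error = error
    # All attempts failed: surface the most recent error.
    raise last_error
```

The sleep argument is injectable only so the sketch can be exercised without real waiting.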

setDisplayName(value)[source]
Parameters

displayName – optional field, name of the model

setEndTime(value)[source]
Parameters

endTime – A required field, end time of data to be used for detection/generating multivariate anomaly detection model, should be date-time.

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setFillNAMethod(value)[source]
Parameters

fillNAMethod – An optional field that indicates how missing values are filled. One of {Previous, Subsequent, Linear, Zero, Fixed}. Cannot be set to NotFill when alignMode is Outer.
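
The fill methods named above can be pictured with the following minimal sketch. It is a local illustration under assumed semantics (the service's exact behavior, especially at the edges of a series, is not described on this page), and fill_missing and its arguments are hypothetical names, not part of SynapseML. The padding_value argument plays the role of paddingValue for the Fixed method.

```python
from typing import List, Optional, Sequence


def fill_missing(
    values: Sequence[Optional[float]],
    method: str = "Linear",
    padding_value: Optional[float] = None,
) -> List[Optional[float]]:
    """Fill None gaps in a series using one of the documented method names."""
    filled = list(values)
    if method == "Zero":
        return [0.0 if v is None else v for v in filled]
    if method == "Fixed":
        if padding_value is None:
            raise ValueError("Fixed requires a padding value")
        return [padding_value if v is None else v for v in filled]
    if method == "Previous":
        last = None  # leading gaps stay None: there is nothing to carry forward
        for i, v in enumerate(filled):
            if v is None:
                filled[i] = last
            else:
                last = v
        return filled
    if method == "Subsequent":
        following = None  # trailing gaps stay None: there is nothing to carry back
        for i in range(len(filled) - 1, -1, -1):
            if filled[i] is None:
                filled[i] = following
            else:
                following = filled[i]
        return filled
    if method == "Linear":
        known = [i for i, v in enumerate(values) if v is not None]
        # Interpolate only between two known neighbours; edge gaps stay None.
        for left, right in zip(known, known[1:]):
            step = (values[right] - values[left]) / (right - left)
            for i in range(left + 1, right):
                filled[i] = values[left] + step * (i - left)
        return filled
    raise ValueError(f"unsupported fillNAMethod: {method!r}")
```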

setInitialPollingDelay(value)[source]
Parameters

initialPollingDelay – number of milliseconds to wait before first poll for result

setInputCols(value)[source]
Parameters

inputCols – The names of the input columns

setIntermediateSaveDir(value)[source]
Parameters

intermediateSaveDir – Blob storage location in HDFS where intermediate data is saved while training.

setLocation(value)[source]
setMaxPollingRetries(value)[source]
Parameters

maxPollingRetries – number of times to poll

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setPaddingValue(value)[source]
Parameters

paddingValue – Optional field; only used when fillNAMethod is set to Fixed.

setParams(alignMode='Outer', backoffs=[100, 500, 1000], displayName=None, endTime=None, errorCol='SimpleFitMultivariateAnomaly_a00be78ec465_error', fillNAMethod='Linear', initialPollingDelay=300, inputCols=None, intermediateSaveDir=None, maxPollingRetries=1000, outputCol='SimpleFitMultivariateAnomaly_a00be78ec465_output', paddingValue=None, pollingDelay=300, slidingWindow=300, startTime=None, subscriptionKey=None, subscriptionKeyCol=None, suppressMaxRetriesException=False, timestampCol='timestamp', url=None)[source]

Set the (keyword only) parameters

setPollingDelay(value)[source]
Parameters

pollingDelay – number of milliseconds to wait between polling

setSlidingWindow(value)[source]
Parameters

slidingWindow – An optional field, indicates how many history points will be used to determine the anomaly score of one subsequent point.
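
To make the idea of history points concrete, the sketch below pairs each point with the slidingWindow points that immediately precede it. This only illustrates the windowing described above; how the model scores a point from its window is not covered here, and history_windows is a hypothetical helper name.

```python
from typing import List, Sequence, Tuple


def history_windows(
    points: Sequence[float], sliding_window: int
) -> List[Tuple[List[float], float]]:
    """Pair each point with the sliding_window points that precede it."""
    if sliding_window < 1:
        raise ValueError("sliding_window must be at least 1")
    pairs = []
    # The first sliding_window points have too little history to be paired.
    for i in range(sliding_window, len(points)):
        pairs.append((list(points[i - sliding_window:i]), points[i]))
    return pairs
```

With the default slidingWindow of 300 shown in setParams, a series under this reading would need more than 300 points before any point has a full history window.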

setStartTime(value)[source]
Parameters

startTime – A required field, start time of data to be used for detection/generating multivariate anomaly detection model, should be date-time.

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKeyCol – name of the column that holds the API key to use

setSuppressMaxRetriesException(value)[source]
Parameters

suppressMaxRetriesException – set true to suppress the maximum retries exception and report it in the error column
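
The polling parameters (initialPollingDelay, pollingDelay, maxPollingRetries, suppressMaxRetriesException) can be read together as a simple polling loop. The sketch below is one such reading under stated assumptions, not the library's implementation: poll_for_result, MaxRetriesExceeded, and the check callback are hypothetical names, and returning an error message stands in for writing to the error column. The default values mirror those shown in setParams.

```python
import time
from typing import Callable, Optional, Tuple


class MaxRetriesExceeded(Exception):
    """Raised when polling gives up (hypothetical exception type)."""


def poll_for_result(
    check: Callable[[], Optional[str]],
    initial_polling_delay: int = 300,
    polling_delay: int = 300,
    max_polling_retries: int = 1000,
    suppress_max_retries_exception: bool = False,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[Optional[str], Optional[str]]:
    """Poll check() until it returns a result; returns (result, error)."""
    # Wait once before the first poll; delays are given in milliseconds.
    sleep(initial_polling_delay / 1000.0)
    for attempt in range(max_polling_retries):
        if attempt > 0:
            sleep(polling_delay / 1000.0)  # wait between consecutive polls
        result = check()
        if result is not None:
            return result, None
    message = f"no result after {max_polling_retries} polls"
    if suppress_max_retries_exception:
        # Report the failure as data instead of raising.
        return None, message
    raise MaxRetriesExceeded(message)
```

Suppressing the exception trades a hard failure for a row-level error, which can be preferable when one slow request should not abort a whole job.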

setTimestampCol(value)[source]
Parameters

timestampCol – Timestamp column name

setUrl(value)[source]
Parameters

url – URL of the service

slidingWindow = Param(parent='undefined', name='slidingWindow', doc='An optional field, indicates how many history points will be used to determine the anomaly score of one subsequent point.')
startTime = Param(parent='undefined', name='startTime', doc='A required field, start time of data to be used for detection/generating multivariate anomaly detection model, should be date-time.')
subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
suppressMaxRetriesException = Param(parent='undefined', name='suppressMaxRetriesException', doc='set true to suppress the maximum retries exception and report in the error column')
timestampCol = Param(parent='undefined', name='timestampCol', doc='Timestamp column name')
url = Param(parent='undefined', name='url', doc='Url of the service')

Module contents

SynapseML is an ecosystem of tools aimed at expanding the distributed computing framework Apache Spark in several new directions. SynapseML adds many deep learning and data science tools to the Spark ecosystem, including seamless integration of Spark Machine Learning pipelines with Microsoft Cognitive Toolkit (CNTK), LightGBM, and OpenCV. These tools enable powerful and highly scalable predictive and analytical models for a variety of data sources.

SynapseML also brings new networking capabilities to the Spark ecosystem. With the HTTP on Spark project, users can embed any web service into their SparkML models. In this vein, SynapseML provides easy-to-use SparkML transformers for a wide variety of Microsoft Cognitive Services. For production-grade deployment, the Spark Serving project enables high-throughput, sub-millisecond-latency web services backed by your Spark cluster.

SynapseML requires Scala 2.12, Spark 3.0+, and Python 3.6+.