synapse.ml.cognitive package

Submodules

synapse.ml.cognitive.AddDocuments module

class synapse.ml.cognitive.AddDocuments.AddDocuments(java_obj=None, actionCol='@search.action', batchSize=100, concurrency=1, concurrentTimeout=None, errorCol='AddDocuments_faf4248b34f2_error', handler=None, indexName=None, outputCol='AddDocuments_faf4248b34f2_output', serviceName=None, subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • actionCol (object) – You can combine actions, such as an upload and a delete, in the same batch.
      upload: An upload action is similar to an ‘upsert’ where the document will be inserted if it is new and updated/replaced if it exists. Note that all fields are replaced in the update case.
      merge: Merge updates an existing document with the specified fields. If the document doesn’t exist, the merge will fail. Any field you specify in a merge will replace the existing field in the document. This includes fields of type Collection(Edm.String). For example, if the document contains a field ‘tags’ with value [‘budget’] and you execute a merge with value [‘economy’, ‘pool’] for ‘tags’, the final value of the ‘tags’ field will be [‘economy’, ‘pool’]. It will not be [‘budget’, ‘economy’, ‘pool’].
      mergeOrUpload: This action behaves like merge if a document with the given key already exists in the index. If the document does not exist, it behaves like upload with a new document.
      delete: Delete removes the specified document from the index. Note that any field you specify in a delete operation, other than the key field, will be ignored. If you want to remove an individual field from a document, use merge instead and simply set the field explicitly to null.

  • batchSize (int) – The max size of the buffer

  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (object) – column to hold http errors

  • handler (object) – Which strategy to use when handling requests

  • indexName (object) –

  • outputCol (object) – The name of the output column

  • serviceName (object) –

  • subscriptionKey (object) – the API key to use

  • timeout (float) – number of seconds to wait before closing the connection

  • url (object) – Url of the service
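
The action semantics described for actionCol can be sketched with a toy in-memory index. This is only an illustration of the documented upload/merge/mergeOrUpload/delete behavior, not the service or the library implementation; the helper `apply_batch` and the key field name `id` are hypothetical.

```python
from typing import Any, Dict, List

ACTION_FIELD = "@search.action"  # default actionCol value in the signature above


def apply_batch(
    index: Dict[str, Dict[str, Any]],
    batch: List[Dict[str, Any]],
    key_field: str = "id",  # hypothetical key field name
) -> None:
    """Apply a batch of documents to a toy in-memory index, in order."""
    for row in batch:
        action = row[ACTION_FIELD]
        doc = {k: v for k, v in row.items() if k != ACTION_FIELD}
        key = doc[key_field]
        if action == "upload":
            # Insert, or replace the whole document if it already exists.
            index[key] = doc
        elif action == "merge":
            if key not in index:
                raise KeyError(f"merge failed: no document with key {key!r}")
            # Each specified field replaces the stored one, collections included.
            index[key].update(doc)
        elif action == "mergeOrUpload":
            if key in index:
                index[key].update(doc)
            else:
                index[key] = doc
        elif action == "delete":
            # Only the key is used; any other field in the row is ignored.
            index.pop(key, None)
        else:
            raise ValueError(f"unknown action {action!r}")


index = {"1": {"id": "1", "name": "Inn", "tags": ["budget"]}}
apply_batch(index, [
    {"@search.action": "merge", "id": "1", "tags": ["economy", "pool"]},
    {"@search.action": "mergeOrUpload", "id": "2", "name": "Lodge"},
])
print(index["1"]["tags"])
```

After the merge, the ‘tags’ field of document "1" holds only the new values, matching the example in the actionCol description.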

actionCol = Param(parent='undefined', name='actionCol', doc=" You can combine actions, such as an upload and a delete, in the same batch.  upload: An upload action is similar to an 'upsert' where the document will be inserted if it is new and updated/replaced if it exists. Note that all fields are replaced in the update case.  merge: Merge updates an existing document with the specified fields. If the document doesn't exist, the merge will fail. Any field you specify in a merge will replace the existing field in the document. This includes fields of type Collection(Edm.String). For example, if the document contains a field 'tags' with value ['budget'] and you execute a merge with value ['economy', 'pool'] for 'tags', the final value of the 'tags' field will be ['economy', 'pool'].  It will not be ['budget', 'economy', 'pool'].  mergeOrUpload: This action behaves like merge if a document  with the given key already exists in the index.  If the document does not exist, it behaves like upload with a new document.  delete: Delete removes the specified document from the index.  Note that any field you specify in a delete operation,  other than the key field, will be ignored. If you want to   remove an individual field from a document, use merge   instead and simply set the field explicitly to null.     ")
batchSize = Param(parent='undefined', name='batchSize', doc='The max size of the buffer')
concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getActionCol()[source]
Returns

You can combine actions, such as an upload and a delete, in the same batch.
    upload: An upload action is similar to an ‘upsert’ where the document will be inserted if it is new and updated/replaced if it exists. Note that all fields are replaced in the update case.
    merge: Merge updates an existing document with the specified fields. If the document doesn’t exist, the merge will fail. Any field you specify in a merge will replace the existing field in the document. This includes fields of type Collection(Edm.String). For example, if the document contains a field ‘tags’ with value [‘budget’] and you execute a merge with value [‘economy’, ‘pool’] for ‘tags’, the final value of the ‘tags’ field will be [‘economy’, ‘pool’]. It will not be [‘budget’, ‘economy’, ‘pool’].
    mergeOrUpload: This action behaves like merge if a document with the given key already exists in the index. If the document does not exist, it behaves like upload with a new document.
    delete: Delete removes the specified document from the index. Note that any field you specify in a delete operation, other than the key field, will be ignored. If you want to remove an individual field from a document, use merge instead and simply set the field explicitly to null.

Return type

actionCol

getBatchSize()[source]
Returns

The max size of the buffer

Return type

batchSize
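
As a rough illustration of a buffer capped at batchSize, the sketch below groups rows into chunks of at most that many items. It assumes the buffer holds the rows sent together in one request; `batches` is a hypothetical helper, not part of the library.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batches(rows: Iterable[T], batch_size: int = 100) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most `batch_size` rows."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, batch_size))
        if not chunk:
            return
        yield chunk


sizes = [len(chunk) for chunk in batches(range(250), batch_size=100)]
print(sizes)
```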

getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout
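
One way to picture how concurrency and concurrentTimeout interact is a thread pool whose futures are awaited with a timeout. This is a sketch under that assumption, using only the standard library; `call_concurrently` is a hypothetical helper and does not reflect how the transformer is implemented.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Optional, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def call_concurrently(
    fn: Callable[[A], B],
    items: Iterable[A],
    concurrency: int = 1,
    concurrent_timeout: Optional[float] = None,
) -> List[B]:
    """Run `fn` over `items` with at most `concurrency` calls in flight.

    Each future is awaited for up to `concurrent_timeout` seconds;
    None means wait without a limit.
    """
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(fn, item) for item in items]
        return [future.result(timeout=concurrent_timeout) for future in futures]


doubled = call_concurrently(lambda x: x * 2, [1, 2, 3], concurrency=2, concurrent_timeout=5.0)
print(doubled)
```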

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

getIndexName()[source]
Returns

Return type

indexName

static getJavaPackage()[source]

Returns package name String.

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getServiceName()[source]
Returns

Return type

serviceName

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
indexName = Param(parent='undefined', name='indexName', doc='')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

serviceName = Param(parent='undefined', name='serviceName', doc='')
setActionCol(value)[source]
Parameters

actionCol – You can combine actions, such as an upload and a delete, in the same batch.
    upload: An upload action is similar to an ‘upsert’ where the document will be inserted if it is new and updated/replaced if it exists. Note that all fields are replaced in the update case.
    merge: Merge updates an existing document with the specified fields. If the document doesn’t exist, the merge will fail. Any field you specify in a merge will replace the existing field in the document. This includes fields of type Collection(Edm.String). For example, if the document contains a field ‘tags’ with value [‘budget’] and you execute a merge with value [‘economy’, ‘pool’] for ‘tags’, the final value of the ‘tags’ field will be [‘economy’, ‘pool’]. It will not be [‘budget’, ‘economy’, ‘pool’].
    mergeOrUpload: This action behaves like merge if a document with the given key already exists in the index. If the document does not exist, it behaves like upload with a new document.
    delete: Delete removes the specified document from the index. Note that any field you specify in a delete operation, other than the key field, will be ignored. If you want to remove an individual field from a document, use merge instead and simply set the field explicitly to null.

setBatchSize(value)[source]
Parameters

batchSize – The max size of the buffer

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setIndexName(value)[source]
Parameters

indexName

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(actionCol='@search.action', batchSize=100, concurrency=1, concurrentTimeout=None, errorCol='AddDocuments_faf4248b34f2_error', handler=None, indexName=None, outputCol='AddDocuments_faf4248b34f2_output', serviceName=None, subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setServiceName(value)[source]
Parameters

serviceName

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKeyCol – name of the column that supplies subscriptionKey (the API key to use)

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='the API key to use')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.AnalyzeBusinessCards module

class synapse.ml.cognitive.AnalyzeBusinessCards.AnalyzeBusinessCards(java_obj=None, backoffs=[100, 500, 1000], concurrency=1, concurrentTimeout=None, errorCol='AnalyzeBusinessCards_dbb68d08bd03_error', imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, includeTextDetails=None, includeTextDetailsCol=None, initialPollingDelay=300, locale=None, localeCol=None, maxPollingRetries=1000, outputCol='AnalyzeBusinessCards_dbb68d08bd03_output', pages=None, pagesCol=None, pollingDelay=300, subscriptionKey=None, subscriptionKeyCol=None, suppressMaxRetriesExceededException=False, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • backoffs (list) – array of backoffs to use in the handler

  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (object) – column to hold http errors

  • imageBytes (object) – bytestream of the image to use

  • imageUrl (object) – the url of the image to use

  • includeTextDetails (object) – Include text lines and element references in the result.

  • initialPollingDelay (int) – number of milliseconds to wait before first poll for result

  • locale (object) – Locale of the business card. Supported locales: en-AU, en-CA, en-GB, en-IN, en-US.

  • maxPollingRetries (int) – number of times to poll

  • outputCol (object) – The name of the output column

  • pages (object) – Page selection, used only for multi-page PDF and TIFF documents. Accepted inputs include single pages (e.g. ‘1, 2’ -> pages 1 and 2 will be processed), finite ranges (e.g. ‘2-5’ -> pages 2 to 5 will be processed), and open-ended ranges (e.g. ‘5-’ -> all pages from page 5 onward will be processed; ‘-10’ -> pages 1 to 10 will be processed). These can be mixed, and ranges may overlap (e.g. ‘-5, 1, 3, 5-10’ -> pages 1 to 10 will be processed). The service accepts the request if it can process at least one page of the document (e.g. ‘5-100’ on a 5-page document is valid input, and page 5 will be processed). If no page range is provided, the entire document is processed.

  • pollingDelay (int) – number of milliseconds to wait between polling

  • subscriptionKey (object) – the API key to use

  • suppressMaxRetriesExceededException (bool) – set to true to suppress the maximum retries exception and report it in the error column

  • timeout (float) – number of seconds to wait before closing the connection

  • url (object) – Url of the service
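
The page-selection syntax described for pages can be made concrete with a small parser. This is a sketch of the rules as stated in the parameter description, not the service's own parser; `select_pages` is a hypothetical helper.

```python
from typing import List, Optional


def select_pages(pages: Optional[str], page_count: int) -> List[int]:
    """Expand a page-selection string into sorted 1-based page numbers."""
    if not pages:
        # No selection: the entire document is processed.
        return list(range(1, page_count + 1))
    selected = set()
    for part in pages.split(","):
        part = part.strip()
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            # An omitted bound makes the range open-ended on that side.
            start = int(start_text) if start_text.strip() else 1
            end = int(end_text) if end_text.strip() else page_count
        else:
            start = end = int(part)
        # Clip the range to pages that exist in the document.
        selected.update(range(max(start, 1), min(end, page_count) + 1))
    if not selected:
        raise ValueError("selection does not match any page of the document")
    return sorted(selected)


print(select_pages("-5, 1, 3, 5-10", page_count=12))
print(select_pages("5-100", page_count=5))
```

Overlapping entries collapse because the pages are collected in a set, and a range that runs past the end of the document is still accepted as long as at least one page remains.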

backoffs = Param(parent='undefined', name='backoffs', doc='array of backoffs to use in the handler')
concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getBackoffs()[source]
Returns

array of backoffs to use in the handler

Return type

backoffs

getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getImageBytes()[source]
Returns

bytestream of the image to use

Return type

imageBytes

getImageUrl()[source]
Returns

the url of the image to use

Return type

imageUrl

getIncludeTextDetails()[source]
Returns

Include text lines and element references in the result.

Return type

includeTextDetails

getInitialPollingDelay()[source]
Returns

number of milliseconds to wait before first poll for result

Return type

initialPollingDelay

static getJavaPackage()[source]

Returns package name String.

getLocale()[source]
Returns

Locale of the business card. Supported locales: en-AU, en-CA, en-GB, en-IN, en-US.

Return type

locale

getMaxPollingRetries()[source]
Returns

number of times to poll

Return type

maxPollingRetries

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getPages()[source]
Returns

Page selection, used only for multi-page PDF and TIFF documents. Accepted inputs include single pages (e.g. ‘1, 2’ -> pages 1 and 2 will be processed), finite ranges (e.g. ‘2-5’ -> pages 2 to 5 will be processed), and open-ended ranges (e.g. ‘5-’ -> all pages from page 5 onward will be processed; ‘-10’ -> pages 1 to 10 will be processed). These can be mixed, and ranges may overlap (e.g. ‘-5, 1, 3, 5-10’ -> pages 1 to 10 will be processed). The service accepts the request if it can process at least one page of the document (e.g. ‘5-100’ on a 5-page document is valid input, and page 5 will be processed). If no page range is provided, the entire document is processed.

Return type

pages

getPollingDelay()[source]
Returns

number of milliseconds to wait between polling

Return type

pollingDelay

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getSuppressMaxRetriesExceededException()[source]
Returns

set to true to suppress the maximum retries exception and report it in the error column

Return type

suppressMaxRetriesExceededException

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

imageBytes = Param(parent='undefined', name='imageBytes', doc='bytestream of the image to use')
imageUrl = Param(parent='undefined', name='imageUrl', doc='the url of the image to use')
includeTextDetails = Param(parent='undefined', name='includeTextDetails', doc='Include text lines and element references in the result.')
initialPollingDelay = Param(parent='undefined', name='initialPollingDelay', doc='number of milliseconds to wait before first poll for result')
locale = Param(parent='undefined', name='locale', doc='Locale of the receipt. Supported locales: en-AU, en-CA, en-GB, en-IN, en-US.')
maxPollingRetries = Param(parent='undefined', name='maxPollingRetries', doc='number of times to poll')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
pages = Param(parent='undefined', name='pages', doc="The page selection only leveraged for multi-page PDF and TIFF documents. Accepted input include single pages (e.g.'1, 2' -> pages 1 and 2 will be processed), finite (e.g. '2-5' -> pages 2 to 5 will be processed) and open-ended ranges (e.g. '5-' -> all the pages from page 5 will be processed & e.g. '-10' -> pages 1 to 10 will be processed). All of these can be mixed together and ranges are allowed to overlap (eg. '-5, 1, 3, 5-10' - pages 1 to 10 will be processed). The service will accept the request if it can process at least one page of the document (e.g. using '5-100' on a 5 page document is a valid input where page 5 will be processed). If no page range is provided, the entire document will be processed.")
pollingDelay = Param(parent='undefined', name='pollingDelay', doc='number of milliseconds to wait between polling')
classmethod read()[source]

Returns an MLReader instance for this class.

setBackoffs(value)[source]
Parameters

backoffs – array of backoffs to use in the handler

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setImageBytes(value)[source]
Parameters

imageBytes – bytestream of the image to use

setImageBytesCol(value)[source]
Parameters

imageBytesCol – name of the column that supplies imageBytes (bytestream of the image to use)

setImageUrl(value)[source]
Parameters

imageUrl – the url of the image to use

setImageUrlCol(value)[source]
Parameters

imageUrlCol – name of the column that supplies imageUrl (the url of the image to use)

setIncludeTextDetails(value)[source]
Parameters

includeTextDetails – Include text lines and element references in the result.

setIncludeTextDetailsCol(value)[source]
Parameters

includeTextDetailsCol – name of the column that supplies includeTextDetails (include text lines and element references in the result)

setInitialPollingDelay(value)[source]
Parameters

initialPollingDelay – number of milliseconds to wait before first poll for result

setLinkedService(value)[source]
setLocale(value)[source]
Parameters

locale – Locale of the business card. Supported locales: en-AU, en-CA, en-GB, en-IN, en-US.

setLocaleCol(value)[source]
Parameters

localeCol – name of the column that supplies locale (locale of the business card). Supported locales: en-AU, en-CA, en-GB, en-IN, en-US.

setLocation(value)[source]
setMaxPollingRetries(value)[source]
Parameters

maxPollingRetries – number of times to poll

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setPages(value)[source]
Parameters

pages – Page selection, used only for multi-page PDF and TIFF documents. Accepted inputs include single pages (e.g. ‘1, 2’ -> pages 1 and 2 will be processed), finite ranges (e.g. ‘2-5’ -> pages 2 to 5 will be processed), and open-ended ranges (e.g. ‘5-’ -> all pages from page 5 onward will be processed; ‘-10’ -> pages 1 to 10 will be processed). These can be mixed, and ranges may overlap (e.g. ‘-5, 1, 3, 5-10’ -> pages 1 to 10 will be processed). The service accepts the request if it can process at least one page of the document (e.g. ‘5-100’ on a 5-page document is valid input, and page 5 will be processed). If no page range is provided, the entire document is processed.

setPagesCol(value)[source]
Parameters

pagesCol – name of the column that supplies pages, the page selection used only for multi-page PDF and TIFF documents. Accepted inputs include single pages (e.g. ‘1, 2’ -> pages 1 and 2 will be processed), finite ranges (e.g. ‘2-5’ -> pages 2 to 5 will be processed), and open-ended ranges (e.g. ‘5-’ -> all pages from page 5 onward will be processed; ‘-10’ -> pages 1 to 10 will be processed). These can be mixed, and ranges may overlap (e.g. ‘-5, 1, 3, 5-10’ -> pages 1 to 10 will be processed). The service accepts the request if it can process at least one page of the document (e.g. ‘5-100’ on a 5-page document is valid input, and page 5 will be processed). If no page range is provided, the entire document is processed.

setParams(backoffs=[100, 500, 1000], concurrency=1, concurrentTimeout=None, errorCol='AnalyzeBusinessCards_dbb68d08bd03_error', imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, includeTextDetails=None, includeTextDetailsCol=None, initialPollingDelay=300, locale=None, localeCol=None, maxPollingRetries=1000, outputCol='AnalyzeBusinessCards_dbb68d08bd03_output', pages=None, pagesCol=None, pollingDelay=300, subscriptionKey=None, subscriptionKeyCol=None, suppressMaxRetriesExceededException=False, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setPollingDelay(value)[source]
Parameters

pollingDelay – number of milliseconds to wait between polling

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKeyCol – name of the column that supplies subscriptionKey (the API key to use)

setSuppressMaxRetriesExceededException(value)[source]
Parameters

suppressMaxRetriesExceededException – set to true to suppress the maximum retries exception and report it in the error column

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='the API key to use')
suppressMaxRetriesExceededException = Param(parent='undefined', name='suppressMaxRetriesExceededException', doc='set true to suppress the maxumimum retries exception and report in the error column')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.AnalyzeCustomModel module

class synapse.ml.cognitive.AnalyzeCustomModel.AnalyzeCustomModel(java_obj=None, backoffs=[100, 500, 1000], concurrency=1, concurrentTimeout=None, errorCol='AnalyzeCustomModel_0840f7d2f4b7_error', imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, includeTextDetails=None, includeTextDetailsCol=None, initialPollingDelay=300, maxPollingRetries=1000, modelId=None, modelIdCol=None, outputCol='AnalyzeCustomModel_0840f7d2f4b7_output', pollingDelay=300, subscriptionKey=None, subscriptionKeyCol=None, suppressMaxRetriesExceededException=False, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • backoffs (list) – array of backoffs to use in the handler

  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (object) – column to hold http errors

  • imageBytes (object) – bytestream of the image to use

  • imageUrl (object) – the url of the image to use

  • includeTextDetails (object) – Include text lines and element references in the result.

  • initialPollingDelay (int) – number of milliseconds to wait before first poll for result

  • maxPollingRetries (int) – number of times to poll

  • modelId (object) – Model identifier.

  • outputCol (object) – The name of the output column

  • pollingDelay (int) – number of milliseconds to wait between polling

  • subscriptionKey (object) – the API key to use

  • suppressMaxRetriesExceededException (bool) – set to true to suppress the maximum retries exception and report it in the error column

  • timeout (float) – number of seconds to wait before closing the connection

  • url (object) – Url of the service
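
The polling parameters above (initialPollingDelay, pollingDelay, maxPollingRetries, suppressMaxRetriesExceededException) suggest a loop along the following lines. This is a minimal sketch of one plausible reading, not the library's implementation; `poll_for_result` and its `check` callback are hypothetical, and the choice of TimeoutError is an assumption.

```python
import time
from typing import Callable, Optional


def poll_for_result(
    check: Callable[[], Optional[dict]],
    initial_polling_delay_ms: int = 300,
    polling_delay_ms: int = 300,
    max_polling_retries: int = 1000,
    suppress_max_retries_exceeded: bool = False,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Poll `check` until it returns a result or the retry budget is spent."""
    # Wait once before the first poll.
    sleep(initial_polling_delay_ms / 1000.0)
    for attempt in range(max_polling_retries):
        result = check()
        if result is not None:
            return result
        if attempt < max_polling_retries - 1:
            # Wait between consecutive polls.
            sleep(polling_delay_ms / 1000.0)
    if suppress_max_retries_exceeded:
        # Assumed behavior: no exception; the caller records the failure
        # in the error column instead.
        return None
    raise TimeoutError(f"no result after {max_polling_retries} polls")


waits = []
answers = iter([None, None, {"status": "succeeded"}])
outcome = poll_for_result(lambda: next(answers), sleep=waits.append)
print(outcome, waits)
```

Passing `sleep` as a parameter keeps the sketch testable without real delays.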

backoffs = Param(parent='undefined', name='backoffs', doc='array of backoffs to use in the handler')
concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getBackoffs()[source]
Returns

array of backoffs to use in the handler

Return type

backoffs

getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getImageBytes()[source]
Returns

bytestream of the image to use

Return type

imageBytes

getImageUrl()[source]
Returns

the url of the image to use

Return type

imageUrl

getIncludeTextDetails()[source]
Returns

Include text lines and element references in the result.

Return type

includeTextDetails

getInitialPollingDelay()[source]
Returns

number of milliseconds to wait before first poll for result

Return type

initialPollingDelay

static getJavaPackage()[source]

Returns package name String.

getMaxPollingRetries()[source]
Returns

number of times to poll

Return type

maxPollingRetries

getModelId()[source]
Returns

Model identifier.

Return type

modelId

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getPollingDelay()[source]
Returns

number of milliseconds to wait between polling

Return type

pollingDelay

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getSuppressMaxRetriesExceededException()[source]
Returns

set to true to suppress the maximum retries exception and report it in the error column

Return type

suppressMaxRetriesExceededException

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

imageBytes = Param(parent='undefined', name='imageBytes', doc='bytestream of the image to use')
imageUrl = Param(parent='undefined', name='imageUrl', doc='the url of the image to use')
includeTextDetails = Param(parent='undefined', name='includeTextDetails', doc='Include text lines and element references in the result.')
initialPollingDelay = Param(parent='undefined', name='initialPollingDelay', doc='number of milliseconds to wait before first poll for result')
maxPollingRetries = Param(parent='undefined', name='maxPollingRetries', doc='number of times to poll')
modelId = Param(parent='undefined', name='modelId', doc='Model identifier.')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
pollingDelay = Param(parent='undefined', name='pollingDelay', doc='number of milliseconds to wait between polling')
classmethod read()[source]

Returns an MLReader instance for this class.

setBackoffs(value)[source]
Parameters

backoffs – array of backoffs to use in the handler

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setImageBytes(value)[source]
Parameters

imageBytes – bytestream of the image to use

setImageBytesCol(value)[source]
Parameters

imageBytesCol – name of the column that supplies imageBytes (bytestream of the image to use)

setImageUrl(value)[source]
Parameters

imageUrl – the url of the image to use

setImageUrlCol(value)[source]
Parameters

imageUrlCol – name of the column that supplies imageUrl (the url of the image to use)

setIncludeTextDetails(value)[source]
Parameters

includeTextDetails – Include text lines and element references in the result.

setIncludeTextDetailsCol(value)[source]
Parameters

includeTextDetailsCol – name of the column that supplies includeTextDetails (include text lines and element references in the result)

setInitialPollingDelay(value)[source]
Parameters

initialPollingDelay – number of milliseconds to wait before first poll for result

setLinkedService(value)[source]
setLocation(value)[source]
setMaxPollingRetries(value)[source]
Parameters

maxPollingRetries – number of times to poll

setModelId(value)[source]
Parameters

modelId – Model identifier.

setModelIdCol(value)[source]
Parameters

modelIdCol – name of the column that supplies modelId (the model identifier)

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(backoffs=[100, 500, 1000], concurrency=1, concurrentTimeout=None, errorCol='AnalyzeCustomModel_0840f7d2f4b7_error', imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, includeTextDetails=None, includeTextDetailsCol=None, initialPollingDelay=300, maxPollingRetries=1000, modelId=None, modelIdCol=None, outputCol='AnalyzeCustomModel_0840f7d2f4b7_output', pollingDelay=300, subscriptionKey=None, subscriptionKeyCol=None, suppressMaxRetriesExceededException=False, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setPollingDelay(value)[source]
Parameters

pollingDelay – number of milliseconds to wait between polling

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKeyCol – name of the column that supplies subscriptionKey (the API key to use)

setSuppressMaxRetriesExceededException(value)[source]
Parameters

suppressMaxRetriesExceededException – set to true to suppress the maximum retries exception and report it in the error column

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='the API key to use')
suppressMaxRetriesExceededException = Param(parent='undefined', name='suppressMaxRetriesExceededException', doc='set true to suppress the maxumimum retries exception and report in the error column')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.AnalyzeIDDocuments module

class synapse.ml.cognitive.AnalyzeIDDocuments.AnalyzeIDDocuments(java_obj=None, backoffs=[100, 500, 1000], concurrency=1, concurrentTimeout=None, errorCol='AnalyzeIDDocuments_765efbdad520_error', imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, includeTextDetails=None, includeTextDetailsCol=None, initialPollingDelay=300, maxPollingRetries=1000, outputCol='AnalyzeIDDocuments_765efbdad520_output', pages=None, pagesCol=None, pollingDelay=300, subscriptionKey=None, subscriptionKeyCol=None, suppressMaxRetriesExceededException=False, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • backoffs (list) – array of backoffs to use in the handler

  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (object) – column to hold http errors

  • imageBytes (object) – bytestream of the image to use

  • imageUrl (object) – the url of the image to use

  • includeTextDetails (object) – Include text lines and element references in the result.

  • initialPollingDelay (int) – number of milliseconds to wait before first poll for result

  • maxPollingRetries (int) – number of times to poll

  • outputCol (object) – The name of the output column

  • pages (object) – Page selection, used only for multi-page PDF and TIFF documents. Accepted inputs include single pages (e.g. ‘1, 2’ -> pages 1 and 2 will be processed), finite ranges (e.g. ‘2-5’ -> pages 2 to 5 will be processed), and open-ended ranges (e.g. ‘5-’ -> all pages from page 5 onward will be processed; ‘-10’ -> pages 1 to 10 will be processed). These can be mixed, and ranges may overlap (e.g. ‘-5, 1, 3, 5-10’ -> pages 1 to 10 will be processed). The service accepts the request if it can process at least one page of the document (e.g. ‘5-100’ on a 5-page document is valid input, and page 5 will be processed). If no page range is provided, the entire document is processed.

  • pollingDelay (int) – number of milliseconds to wait between polling

  • subscriptionKey (object) – the API key to use

  • suppressMaxRetriesExceededException (bool) – set to true to suppress the maximum retries exception and report it in the error column

  • timeout (float) – number of seconds to wait before closing the connection

  • url (object) – Url of the service
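
The pages syntax can be illustrated with a small parser. This is a minimal sketch in plain Python, not the service's implementation; the function name parse_pages and the page_count argument are hypothetical and exist only to make the documented rules concrete.

```python
def parse_pages(spec, page_count):
    """Expand a page-selection string into a sorted list of 1-based page numbers.

    Hypothetical helper: it mirrors the rules in the `pages` description
    (single pages, finite ranges, open-ended ranges, overlaps allowed).
    """
    # No selection means the entire document is processed.
    if spec is None or not spec.strip():
        return list(range(1, page_count + 1))

    selected = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            # An empty side makes the range open-ended.
            start = int(start_text) if start_text.strip() else 1
            end = int(end_text) if end_text.strip() else page_count
        else:
            start = end = int(part)
        # Keep only pages that actually exist in the document.
        for page in range(max(start, 1), min(end, page_count) + 1):
            selected.add(page)

    # The request is acceptable only if at least one page can be processed.
    if not selected:
        raise ValueError(f"no page of a {page_count}-page document matches {spec!r}")
    return sorted(selected)
```

For instance, parse_pages('5-100', 5) keeps only page 5, which matches the documented case of an over-long range on a 5-page document.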

backoffs = Param(parent='undefined', name='backoffs', doc='array of backoffs to use in the handler')
concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getBackoffs()[source]
Returns

array of backoffs to use in the handler

Return type

backoffs
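
The source only says that backoffs is an array used in the handler. One plausible reading, assumed here, is a list of millisecond delays applied between successive retries of a failed request. The sketch below shows that pattern in plain Python; call_with_backoffs and its arguments are hypothetical names, not part of SynapseML.

```python
import time


def call_with_backoffs(request, backoffs_ms=(100, 500, 1000), sleep=time.sleep):
    """Call `request`; on failure, wait for each backoff in turn and try again.

    Assumption: backoffs are milliseconds, matching the default [100, 500, 1000].
    `sleep` is injectable so the behavior can be checked without real waiting.
    """
    last_error = None
    # First attempt has no delay; each later attempt waits for one backoff.
    for delay_ms in (0, *backoffs_ms):
        if delay_ms:
            sleep(delay_ms / 1000.0)
        try:
            return request()
        except Exception as error:
            last_error = error
    # Every attempt failed: surface the most recent error.
    raise last_error
```

With the default list this makes at most four attempts: one immediately and one after each of the three delays.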

getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout
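
The relationship between concurrency and concurrentTimeout follows a common futures pattern: a bounded number of calls in flight, with a limit on how long to wait for each result. The sketch below illustrates that general pattern with Python's standard library. It is an analogy under stated assumptions, not SynapseML's internal code, and run_concurrently is a hypothetical name.

```python
from concurrent.futures import ThreadPoolExecutor


def run_concurrently(calls, concurrency=1, concurrent_timeout=None):
    """Run zero-argument callables with at most `concurrency` in flight.

    `concurrent_timeout` is in seconds; None waits indefinitely.
    """
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(call) for call in calls]
        # result(timeout=...) raises concurrent.futures.TimeoutError if the
        # call has not finished within the limit; results keep input order.
        return [future.result(timeout=concurrent_timeout) for future in futures]
```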

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getImageBytes()[source]
Returns

bytestream of the image to use

Return type

imageBytes

getImageUrl()[source]
Returns

the url of the image to use

Return type

imageUrl

getIncludeTextDetails()[source]
Returns

Include text lines and element references in the result.

Return type

includeTextDetails

getInitialPollingDelay()[source]
Returns

number of milliseconds to wait before first poll for result

Return type

initialPollingDelay

static getJavaPackage()[source]

Returns package name String.

getMaxPollingRetries()[source]
Returns

number of times to poll

Return type

maxPollingRetries

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getPages()[source]
Returns

The page selection, used only for multi-page PDF and TIFF documents. Accepted inputs include single pages (e.g. ‘1, 2’ -> pages 1 and 2 will be processed), finite ranges (e.g. ‘2-5’ -> pages 2 to 5 will be processed), and open-ended ranges (e.g. ‘5-’ -> all pages from page 5 onward will be processed; ‘-10’ -> pages 1 to 10 will be processed). These can be mixed, and ranges may overlap (e.g. ‘-5, 1, 3, 5-10’ -> pages 1 to 10 will be processed). The service accepts the request if it can process at least one page of the document (e.g. ‘5-100’ on a 5-page document is valid input, and page 5 will be processed). If no page range is provided, the entire document is processed.

Return type

pages

getPollingDelay()[source]
Returns

number of milliseconds to wait between polling

Return type

pollingDelay

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getSuppressMaxRetriesExceededException()[source]
Returns

set to true to suppress the maximum retries exception and report it in the error column

Return type

suppressMaxRetriesExceededException
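
The polling parameters (initialPollingDelay, pollingDelay, maxPollingRetries, suppressMaxRetriesExceededException) interact as a simple loop. The sketch below is one way to picture it, assuming one initial wait followed by one wait per unsuccessful poll; the real loop may differ in detail. poll_for_result and check are hypothetical names, and the returned error string merely stands in for what the error column would hold.

```python
import time


def poll_for_result(check, initial_polling_delay=300, polling_delay=300,
                    max_polling_retries=1000,
                    suppress_max_retries_exceeded=False, sleep=time.sleep):
    """Poll `check` until it returns a non-None result.

    Delays are in milliseconds. Returns a (result, error) pair.
    """
    # Wait once before the first poll.
    sleep(initial_polling_delay / 1000.0)
    for _ in range(max_polling_retries):
        result = check()
        if result is not None:
            return result, None
        # Not ready yet: wait before polling again.
        sleep(polling_delay / 1000.0)
    message = f"no result after {max_polling_retries} polls"
    if suppress_max_retries_exceeded:
        # Report the failure as data instead of raising.
        return None, message
    raise TimeoutError(message)
```

Under these assumptions the defaults allow roughly 300 + 1000 * 300 = 300300 milliseconds of waiting, excluding the time spent in the requests themselves.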

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

imageBytes = Param(parent='undefined', name='imageBytes', doc='bytestream of the image to use')
imageUrl = Param(parent='undefined', name='imageUrl', doc='the url of the image to use')
includeTextDetails = Param(parent='undefined', name='includeTextDetails', doc='Include text lines and element references in the result.')
initialPollingDelay = Param(parent='undefined', name='initialPollingDelay', doc='number of milliseconds to wait before first poll for result')
maxPollingRetries = Param(parent='undefined', name='maxPollingRetries', doc='number of times to poll')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
pages = Param(parent='undefined', name='pages', doc="The page selection only leveraged for multi-page PDF and TIFF documents. Accepted input include single pages (e.g.'1, 2' -> pages 1 and 2 will be processed), finite (e.g. '2-5' -> pages 2 to 5 will be processed) and open-ended ranges (e.g. '5-' -> all the pages from page 5 will be processed & e.g. '-10' -> pages 1 to 10 will be processed). All of these can be mixed together and ranges are allowed to overlap (eg. '-5, 1, 3, 5-10' - pages 1 to 10 will be processed). The service will accept the request if it can process at least one page of the document (e.g. using '5-100' on a 5 page document is a valid input where page 5 will be processed). If no page range is provided, the entire document will be processed.")
pollingDelay = Param(parent='undefined', name='pollingDelay', doc='number of milliseconds to wait between polling')
classmethod read()[source]

Returns an MLReader instance for this class.

setBackoffs(value)[source]
Parameters

backoffs – array of backoffs to use in the handler

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setImageBytes(value)[source]
Parameters

imageBytes – bytestream of the image to use

setImageBytesCol(value)[source]
Parameters

imageBytesCol – name of the column holding the bytestream of the image to use

setImageUrl(value)[source]
Parameters

imageUrl – the url of the image to use

setImageUrlCol(value)[source]
Parameters

imageUrlCol – name of the column holding the url of the image to use
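
Service parameters such as imageUrl come with two setters: setImageUrl takes a single value applied to every row, while setImageUrlCol takes the name of a DataFrame column that supplies the value per row. The sketch below picks one of the two; set_image_source is a hypothetical helper, and it assumes the setters return the transformer, as PySpark-style setters usually do.

```python
def set_image_source(transformer, url=None, url_col=None):
    """Configure the image source with either a fixed URL or a column name.

    `transformer` is any object exposing setImageUrl and setImageUrlCol,
    for example an AnalyzeIDDocuments instance.
    """
    # Exactly one of the two sources must be provided.
    if (url is None) == (url_col is None):
        raise ValueError("provide exactly one of url or url_col")
    if url is not None:
        # Same image for every row.
        return transformer.setImageUrl(url)
    # One image per row, read from the named column.
    return transformer.setImageUrlCol(url_col)
```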

setIncludeTextDetails(value)[source]
Parameters

includeTextDetails – Include text lines and element references in the result.

setIncludeTextDetailsCol(value)[source]
Parameters

includeTextDetailsCol – name of the column that specifies whether to include text lines and element references in the result

setInitialPollingDelay(value)[source]
Parameters

initialPollingDelay – number of milliseconds to wait before first poll for result

setLinkedService(value)[source]
setLocation(value)[source]
setMaxPollingRetries(value)[source]
Parameters

maxPollingRetries – number of times to poll

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setPages(value)[source]
Parameters

pages – The page selection, used only for multi-page PDF and TIFF documents. Accepted inputs include single pages (e.g. ‘1, 2’ -> pages 1 and 2 will be processed), finite ranges (e.g. ‘2-5’ -> pages 2 to 5 will be processed), and open-ended ranges (e.g. ‘5-’ -> all pages from page 5 onward will be processed; ‘-10’ -> pages 1 to 10 will be processed). These can be mixed, and ranges may overlap (e.g. ‘-5, 1, 3, 5-10’ -> pages 1 to 10 will be processed). The service accepts the request if it can process at least one page of the document (e.g. ‘5-100’ on a 5-page document is valid input, and page 5 will be processed). If no page range is provided, the entire document is processed.

setPagesCol(value)[source]
Parameters

pagesCol – name of the column holding the page selection, used only for multi-page PDF and TIFF documents. Accepted inputs include single pages (e.g. ‘1, 2’ -> pages 1 and 2 will be processed), finite ranges (e.g. ‘2-5’ -> pages 2 to 5 will be processed), and open-ended ranges (e.g. ‘5-’ -> all pages from page 5 onward will be processed; ‘-10’ -> pages 1 to 10 will be processed). These can be mixed, and ranges may overlap (e.g. ‘-5, 1, 3, 5-10’ -> pages 1 to 10 will be processed). The service accepts the request if it can process at least one page of the document (e.g. ‘5-100’ on a 5-page document is valid input, and page 5 will be processed). If no page range is provided, the entire document is processed.

setParams(backoffs=[100, 500, 1000], concurrency=1, concurrentTimeout=None, errorCol='AnalyzeIDDocuments_765efbdad520_error', imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, includeTextDetails=None, includeTextDetailsCol=None, initialPollingDelay=300, maxPollingRetries=1000, outputCol='AnalyzeIDDocuments_765efbdad520_output', pages=None, pagesCol=None, pollingDelay=300, subscriptionKey=None, subscriptionKeyCol=None, suppressMaxRetriesExceededException=False, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setPollingDelay(value)[source]
Parameters

pollingDelay – number of milliseconds to wait between polling

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKeyCol – name of the column holding the API key to use

setSuppressMaxRetriesExceededException(value)[source]
Parameters

suppressMaxRetriesExceededException – set to true to suppress the maximum retries exception and report it in the error column

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='the API key to use')
suppressMaxRetriesExceededException = Param(parent='undefined', name='suppressMaxRetriesExceededException', doc='set true to suppress the maxumimum retries exception and report in the error column')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.AnalyzeImage module

class synapse.ml.cognitive.AnalyzeImage.AnalyzeImage(java_obj=None, concurrency=1, concurrentTimeout=None, details=None, detailsCol=None, errorCol='AnalyzeImage_cd104f6f64c2_error', handler=None, imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, language=None, languageCol=None, outputCol='AnalyzeImage_cd104f6f64c2_output', subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None, visualFeatures=None, visualFeaturesCol=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • details (object) – what visual feature types to return

  • errorCol (object) – column to hold http errors

  • handler (object) – Which strategy to use when handling requests

  • imageBytes (object) – bytestream of the image to use

  • imageUrl (object) – the url of the image to use

  • language (object) – the language of the response (en if none given)

  • outputCol (object) – The name of the output column

  • subscriptionKey (object) – the API key to use

  • timeout (float) – number of seconds to wait before closing the connection

  • url (object) – Url of the service

  • visualFeatures (object) – what visual feature types to return
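
A hedged sketch of how these parameters might be assembled for AnalyzeImage follows. The keyword names come from the constructor signature above, and the import path follows the module name shown in this reference. The column names ('url', 'analysis', 'analysis_error'), the chosen visual features, and the helper names are assumptions for illustration. The construction step is untested here and requires SynapseML on a Spark cluster.

```python
def analyze_image_settings(subscription_key, image_url_col="url"):
    """Collect keyword arguments for AnalyzeImage (illustrative values)."""
    return {
        "subscriptionKey": subscription_key,
        # Read one image URL per row from this column (assumed column name).
        "imageUrlCol": image_url_col,
        # Feature names accepted by the Computer Vision Analyze Image API.
        "visualFeatures": ["Categories", "Tags", "Description"],
        "outputCol": "analysis",
        "errorCol": "analysis_error",
        "concurrency": 2,
        "timeout": 60.0,
    }


def build_analyze_image(subscription_key, location, image_url_col="url"):
    """Create the transformer; needs SynapseML installed."""
    # Imported lazily so the settings helper works without SynapseML.
    from synapse.ml.cognitive.AnalyzeImage import AnalyzeImage

    transformer = AnalyzeImage(**analyze_image_settings(subscription_key, image_url_col))
    # setLocation is listed among the setters; it is assumed here to take
    # the service region (for example "eastus").
    transformer.setLocation(location)
    return transformer
```

Applying the result with transformer.transform(df) would then add the output and error columns to a DataFrame that has a 'url' column.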

concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
details = Param(parent='undefined', name='details', doc='what visual feature types to return')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getDetails()[source]
Returns

what visual feature types to return

Return type

details

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

getImageBytes()[source]
Returns

bytestream of the image to use

Return type

imageBytes

getImageUrl()[source]
Returns

the url of the image to use

Return type

imageUrl

static getJavaPackage()[source]

Returns package name String.

getLanguage()[source]
Returns

the language of the response (en if none given)

Return type

language

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

getVisualFeatures()[source]
Returns

what visual feature types to return

Return type

visualFeatures

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
imageBytes = Param(parent='undefined', name='imageBytes', doc='bytestream of the image to use')
imageUrl = Param(parent='undefined', name='imageUrl', doc='the url of the image to use')
language = Param(parent='undefined', name='language', doc='the language of the response (en if none given)')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setDetails(value)[source]
Parameters

details – what visual feature types to return

setDetailsCol(value)[source]
Parameters

detailsCol – name of the column that specifies what visual feature types to return

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setImageBytes(value)[source]
Parameters

imageBytes – bytestream of the image to use

setImageBytesCol(value)[source]
Parameters

imageBytesCol – name of the column holding the bytestream of the image to use

setImageUrl(value)[source]
Parameters

imageUrl – the url of the image to use

setImageUrlCol(value)[source]
Parameters

imageUrlCol – name of the column holding the url of the image to use

setLanguage(value)[source]
Parameters

language – the language of the response (en if none given)

setLanguageCol(value)[source]
Parameters

languageCol – name of the column holding the language of the response (en if none given)

setLinkedService(value)[source]
setLocation(value)[source]
setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(concurrency=1, concurrentTimeout=None, details=None, detailsCol=None, errorCol='AnalyzeImage_cd104f6f64c2_error', handler=None, imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, language=None, languageCol=None, outputCol='AnalyzeImage_cd104f6f64c2_output', subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None, visualFeatures=None, visualFeaturesCol=None)[source]

Set the (keyword only) parameters

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKeyCol – name of the column holding the API key to use

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

setVisualFeatures(value)[source]
Parameters

visualFeatures – what visual feature types to return

setVisualFeaturesCol(value)[source]
Parameters

visualFeaturesCol – name of the column that specifies what visual feature types to return

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='the API key to use')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')
visualFeatures = Param(parent='undefined', name='visualFeatures', doc='what visual feature types to return')

synapse.ml.cognitive.AnalyzeInvoices module

class synapse.ml.cognitive.AnalyzeInvoices.AnalyzeInvoices(java_obj=None, backoffs=[100, 500, 1000], concurrency=1, concurrentTimeout=None, errorCol='AnalyzeInvoices_2f671a8b4f08_error', imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, includeTextDetails=None, includeTextDetailsCol=None, initialPollingDelay=300, locale=None, localeCol=None, maxPollingRetries=1000, outputCol='AnalyzeInvoices_2f671a8b4f08_output', pages=None, pagesCol=None, pollingDelay=300, subscriptionKey=None, subscriptionKeyCol=None, suppressMaxRetriesExceededException=False, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • backoffs (list) – array of backoffs to use in the handler

  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (object) – column to hold http errors

  • imageBytes (object) – bytestream of the image to use

  • imageUrl (object) – the url of the image to use

  • includeTextDetails (object) – Include text lines and element references in the result.

  • initialPollingDelay (int) – number of milliseconds to wait before first poll for result

  • locale (object) – Locale of the receipt. Supported locales: en-AU, en-CA, en-GB, en-IN, en-US.

  • maxPollingRetries (int) – number of times to poll

  • outputCol (object) – The name of the output column

  • pages (object) – The page selection, used only for multi-page PDF and TIFF documents. Accepted inputs include single pages (e.g. ‘1, 2’ -> pages 1 and 2 will be processed), finite ranges (e.g. ‘2-5’ -> pages 2 to 5 will be processed), and open-ended ranges (e.g. ‘5-’ -> all pages from page 5 onward will be processed; ‘-10’ -> pages 1 to 10 will be processed). These can be mixed, and ranges may overlap (e.g. ‘-5, 1, 3, 5-10’ -> pages 1 to 10 will be processed). The service accepts the request if it can process at least one page of the document (e.g. ‘5-100’ on a 5-page document is valid input, and page 5 will be processed). If no page range is provided, the entire document is processed.

  • pollingDelay (int) – number of milliseconds to wait between polling

  • subscriptionKey (object) – the API key to use

  • suppressMaxRetriesExceededException (bool) – set to true to suppress the maximum retries exception and report it in the error column

  • timeout (float) – number of seconds to wait before closing the connection

  • url (object) – Url of the service
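
Because locale accepts only the values listed in its description, it can be convenient to check a value before passing it to the transformer. The helper below is a minimal sketch; normalize_locale and SUPPORTED_LOCALES are hypothetical names, and the list is copied from the locale parameter description rather than queried from the service.

```python
# Locales listed in the `locale` parameter description.
SUPPORTED_LOCALES = ("en-AU", "en-CA", "en-GB", "en-IN", "en-US")


def normalize_locale(locale):
    """Return the documented spelling of `locale`, or raise ValueError.

    None is passed through, since the parameter is optional.
    """
    if locale is None:
        return None
    # Tolerate case differences and underscore separators such as "en_gb".
    wanted = locale.strip().replace("_", "-").lower()
    for candidate in SUPPORTED_LOCALES:
        if candidate.lower() == wanted:
            return candidate
    raise ValueError(
        f"unsupported locale {locale!r}; expected one of {', '.join(SUPPORTED_LOCALES)}"
    )
```

The normalized value could then be passed to setLocale, or used to validate the values of a column before calling setLocaleCol.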

backoffs = Param(parent='undefined', name='backoffs', doc='array of backoffs to use in the handler')
concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getBackoffs()[source]
Returns

array of backoffs to use in the handler

Return type

backoffs

getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getImageBytes()[source]
Returns

bytestream of the image to use

Return type

imageBytes

getImageUrl()[source]
Returns

the url of the image to use

Return type

imageUrl

getIncludeTextDetails()[source]
Returns

Include text lines and element references in the result.

Return type

includeTextDetails

getInitialPollingDelay()[source]
Returns

number of milliseconds to wait before first poll for result

Return type

initialPollingDelay

static getJavaPackage()[source]

Returns package name String.

getLocale()[source]
Returns

Locale of the receipt. Supported locales: en-AU, en-CA, en-GB, en-IN, en-US.

Return type

locale

getMaxPollingRetries()[source]
Returns

number of times to poll

Return type

maxPollingRetries

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getPages()[source]
Returns

The page selection, used only for multi-page PDF and TIFF documents. Accepted inputs include single pages (e.g. ‘1, 2’ -> pages 1 and 2 will be processed), finite ranges (e.g. ‘2-5’ -> pages 2 to 5 will be processed), and open-ended ranges (e.g. ‘5-’ -> all pages from page 5 onward will be processed; ‘-10’ -> pages 1 to 10 will be processed). These can be mixed, and ranges may overlap (e.g. ‘-5, 1, 3, 5-10’ -> pages 1 to 10 will be processed). The service accepts the request if it can process at least one page of the document (e.g. ‘5-100’ on a 5-page document is valid input, and page 5 will be processed). If no page range is provided, the entire document is processed.

Return type

pages

getPollingDelay()[source]
Returns

number of milliseconds to wait between polling

Return type

pollingDelay

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getSuppressMaxRetriesExceededException()[source]
Returns

set to true to suppress the maximum retries exception and report it in the error column

Return type

suppressMaxRetriesExceededException

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

imageBytes = Param(parent='undefined', name='imageBytes', doc='bytestream of the image to use')
imageUrl = Param(parent='undefined', name='imageUrl', doc='the url of the image to use')
includeTextDetails = Param(parent='undefined', name='includeTextDetails', doc='Include text lines and element references in the result.')
initialPollingDelay = Param(parent='undefined', name='initialPollingDelay', doc='number of milliseconds to wait before first poll for result')
locale = Param(parent='undefined', name='locale', doc='Locale of the receipt. Supported locales: en-AU, en-CA, en-GB, en-IN, en-US.')
maxPollingRetries = Param(parent='undefined', name='maxPollingRetries', doc='number of times to poll')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
pages = Param(parent='undefined', name='pages', doc="The page selection only leveraged for multi-page PDF and TIFF documents. Accepted input include single pages (e.g.'1, 2' -> pages 1 and 2 will be processed), finite (e.g. '2-5' -> pages 2 to 5 will be processed) and open-ended ranges (e.g. '5-' -> all the pages from page 5 will be processed & e.g. '-10' -> pages 1 to 10 will be processed). All of these can be mixed together and ranges are allowed to overlap (eg. '-5, 1, 3, 5-10' - pages 1 to 10 will be processed). The service will accept the request if it can process at least one page of the document (e.g. using '5-100' on a 5 page document is a valid input where page 5 will be processed). If no page range is provided, the entire document will be processed.")
pollingDelay = Param(parent='undefined', name='pollingDelay', doc='number of milliseconds to wait between polling')
classmethod read()[source]

Returns an MLReader instance for this class.

setBackoffs(value)[source]
Parameters

backoffs – array of backoffs to use in the handler

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setImageBytes(value)[source]
Parameters

imageBytes – bytestream of the image to use

setImageBytesCol(value)[source]
Parameters

imageBytesCol – name of the column holding the bytestream of the image to use

setImageUrl(value)[source]
Parameters

imageUrl – the url of the image to use

setImageUrlCol(value)[source]
Parameters

imageUrlCol – name of the column holding the url of the image to use

setIncludeTextDetails(value)[source]
Parameters

includeTextDetails – Include text lines and element references in the result.

setIncludeTextDetailsCol(value)[source]
Parameters

includeTextDetailsCol – name of the column that specifies whether to include text lines and element references in the result

setInitialPollingDelay(value)[source]
Parameters

initialPollingDelay – number of milliseconds to wait before first poll for result

setLinkedService(value)[source]
setLocale(value)[source]
Parameters

locale – Locale of the receipt. Supported locales: en-AU, en-CA, en-GB, en-IN, en-US.

setLocaleCol(value)[source]
Parameters

localeCol – name of the column holding the locale of the receipt. Supported locales: en-AU, en-CA, en-GB, en-IN, en-US.

setLocation(value)[source]
setMaxPollingRetries(value)[source]
Parameters

maxPollingRetries – number of times to poll

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setPages(value)[source]
Parameters

pages – The page selection, used only for multi-page PDF and TIFF documents. Accepted inputs include single pages (e.g. ‘1, 2’ -> pages 1 and 2 will be processed), finite ranges (e.g. ‘2-5’ -> pages 2 to 5 will be processed), and open-ended ranges (e.g. ‘5-’ -> all pages from page 5 onward will be processed; ‘-10’ -> pages 1 to 10 will be processed). These can be mixed, and ranges may overlap (e.g. ‘-5, 1, 3, 5-10’ -> pages 1 to 10 will be processed). The service accepts the request if it can process at least one page of the document (e.g. ‘5-100’ on a 5-page document is valid input, and page 5 will be processed). If no page range is provided, the entire document is processed.

setPagesCol(value)[source]
Parameters

pagesCol – name of the column holding the page selection, used only for multi-page PDF and TIFF documents. Accepted inputs include single pages (e.g. ‘1, 2’ -> pages 1 and 2 will be processed), finite ranges (e.g. ‘2-5’ -> pages 2 to 5 will be processed), and open-ended ranges (e.g. ‘5-’ -> all pages from page 5 onward will be processed; ‘-10’ -> pages 1 to 10 will be processed). These can be mixed, and ranges may overlap (e.g. ‘-5, 1, 3, 5-10’ -> pages 1 to 10 will be processed). The service accepts the request if it can process at least one page of the document (e.g. ‘5-100’ on a 5-page document is valid input, and page 5 will be processed). If no page range is provided, the entire document is processed.

setParams(backoffs=[100, 500, 1000], concurrency=1, concurrentTimeout=None, errorCol='AnalyzeInvoices_2f671a8b4f08_error', imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, includeTextDetails=None, includeTextDetailsCol=None, initialPollingDelay=300, locale=None, localeCol=None, maxPollingRetries=1000, outputCol='AnalyzeInvoices_2f671a8b4f08_output', pages=None, pagesCol=None, pollingDelay=300, subscriptionKey=None, subscriptionKeyCol=None, suppressMaxRetriesExceededException=False, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setPollingDelay(value)[source]
Parameters

pollingDelay – number of milliseconds to wait between polling

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKeyCol – name of the column holding the API key to use

setSuppressMaxRetriesExceededException(value)[source]
Parameters

suppressMaxRetriesExceededException – set to true to suppress the maximum retries exception and report it in the error column

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='the API key to use')
suppressMaxRetriesExceededException = Param(parent='undefined', name='suppressMaxRetriesExceededException', doc='set true to suppress the maxumimum retries exception and report in the error column')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.AnalyzeLayout module

class synapse.ml.cognitive.AnalyzeLayout.AnalyzeLayout(java_obj=None, backoffs=[100, 500, 1000], concurrency=1, concurrentTimeout=None, errorCol='AnalyzeLayout_2ae0e469a27e_error', imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, initialPollingDelay=300, language=None, languageCol=None, maxPollingRetries=1000, outputCol='AnalyzeLayout_2ae0e469a27e_output', pages=None, pagesCol=None, pollingDelay=300, readingOrder=None, readingOrderCol=None, subscriptionKey=None, subscriptionKeyCol=None, suppressMaxRetriesExceededException=False, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • backoffs (list) – array of backoffs to use in the handler

  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (object) – column to hold http errors

  • imageBytes (object) – bytestream of the image to use

  • imageUrl (object) – the url of the image to use

  • initialPollingDelay (int) – number of milliseconds to wait before first poll for result

  • language (object) – The BCP-47 language code of the text in the document. Layout supports automatic language identification and multilanguage documents, so provide a language code only if you want to force the document to be processed as that specific language.

  • maxPollingRetries (int) – number of times to poll

  • outputCol (object) – The name of the output column

  • pages (object) – The page selection, used only for multi-page PDF and TIFF documents. Accepted inputs include single pages (e.g. ‘1, 2’ -> pages 1 and 2 will be processed), finite ranges (e.g. ‘2-5’ -> pages 2 to 5 will be processed), and open-ended ranges (e.g. ‘5-’ -> all pages from page 5 onward will be processed; ‘-10’ -> pages 1 to 10 will be processed). These can be mixed, and ranges may overlap (e.g. ‘-5, 1, 3, 5-10’ -> pages 1 to 10 will be processed). The service accepts the request if it can process at least one page of the document (e.g. ‘5-100’ on a 5-page document is valid input, and page 5 will be processed). If no page range is provided, the entire document is processed.

  • pollingDelay (int) – number of milliseconds to wait between polling

  • readingOrder (object) – Optional parameter that specifies which reading order algorithm to apply when ordering the extracted text elements. Can be either ‘basic’ or ‘natural’. Defaults to ‘basic’ if not specified.

  • subscriptionKey (object) – the API key to use

  • suppressMaxRetriesExceededException (bool) – set to true to suppress the maximum retries exception and report it in the error column

  • timeout (float) – number of seconds to wait before closing the connection

  • url (object) – Url of the service
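
The two layout-specific options, readingOrder and language, are both optional, so a caller may want to set them only when needed. The sketch below gathers them into keyword arguments for AnalyzeLayout and rejects reading orders other than the two documented ones. layout_options is a hypothetical helper; it does not validate the language code.

```python
# The two reading orders named in the readingOrder description.
READING_ORDERS = ("basic", "natural")


def layout_options(reading_order=None, language=None):
    """Build optional AnalyzeLayout keyword arguments.

    Omitted options are left out so the service defaults apply
    (reading order 'basic', automatic language identification).
    """
    options = {}
    if reading_order is not None:
        if reading_order not in READING_ORDERS:
            raise ValueError(
                f"readingOrder must be one of {READING_ORDERS}, got {reading_order!r}"
            )
        options["readingOrder"] = reading_order
    if language is not None:
        # A BCP-47 code such as "en"; only set this to force a language.
        options["language"] = language
    return options
```

The resulting dictionary could be unpacked into the constructor, as in AnalyzeLayout(**layout_options('natural')), alongside the required service settings.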

backoffs = Param(parent='undefined', name='backoffs', doc='array of backoffs to use in the handler')
concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getBackoffs()[source]
Returns

array of backoffs to use in the handler

Return type

backoffs

getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getImageBytes()[source]
Returns

bytestream of the image to use

Return type

imageBytes

getImageUrl()[source]
Returns

the url of the image to use

Return type

imageUrl

getInitialPollingDelay()[source]
Returns

number of milliseconds to wait before first poll for result

Return type

initialPollingDelay

static getJavaPackage()[source]

Returns package name String.

getLanguage()[source]
Returns

The BCP-47 language code of the text in the document. Layout supports auto language identification and multilanguage documents, so only provide a language code if you would like to force the document to be processed as that specific language.

Return type

language

getMaxPollingRetries()[source]
Returns

number of times to poll

Return type

maxPollingRetries

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getPages()[source]
Returns

The page selection, used only for multi-page PDF and TIFF documents. Accepted inputs include single pages (e.g. ‘1, 2’ -> pages 1 and 2 will be processed), finite ranges (e.g. ‘2-5’ -> pages 2 to 5 will be processed) and open-ended ranges (e.g. ‘5-’ -> all pages from page 5 onward will be processed; ‘-10’ -> pages 1 to 10 will be processed). These can be mixed together, and ranges are allowed to overlap (e.g. ‘-5, 1, 3, 5-10’ -> pages 1 to 10 will be processed). The service will accept the request if it can process at least one page of the document (e.g. ‘5-100’ on a 5-page document is valid input, and page 5 will be processed). If no page range is provided, the entire document will be processed.

Return type

pages

getPollingDelay()[source]
Returns

number of milliseconds to wait between polling

Return type

pollingDelay

getReadingOrder()[source]
Returns

Optional parameter to specify which reading order algorithm should be applied when ordering the extracted text elements. Can be either ‘basic’ or ‘natural’. Defaults to ‘basic’ if not specified.

Return type

readingOrder

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getSuppressMaxRetriesExceededException()[source]
Returns

set true to suppress the maximum retries exception and report it in the error column

Return type

suppressMaxRetriesExceededException

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

imageBytes = Param(parent='undefined', name='imageBytes', doc='bytestream of the image to use')
imageUrl = Param(parent='undefined', name='imageUrl', doc='the url of the image to use')
initialPollingDelay = Param(parent='undefined', name='initialPollingDelay', doc='number of milliseconds to wait before first poll for result')
language = Param(parent='undefined', name='language', doc='The BCP-47 language code of the text in the document. Layout supports auto language identification and multilanguage documents, so only provide a language code if you would like to force the document to be processed as that specific language.')
maxPollingRetries = Param(parent='undefined', name='maxPollingRetries', doc='number of times to poll')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
pages = Param(parent='undefined', name='pages', doc="The page selection, used only for multi-page PDF and TIFF documents. Accepted inputs include single pages (e.g. '1, 2' -> pages 1 and 2 will be processed), finite ranges (e.g. '2-5' -> pages 2 to 5 will be processed) and open-ended ranges (e.g. '5-' -> all pages from page 5 onward will be processed; '-10' -> pages 1 to 10 will be processed). These can be mixed together, and ranges are allowed to overlap (e.g. '-5, 1, 3, 5-10' -> pages 1 to 10 will be processed). The service will accept the request if it can process at least one page of the document (e.g. '5-100' on a 5-page document is valid input, and page 5 will be processed). If no page range is provided, the entire document will be processed.")
pollingDelay = Param(parent='undefined', name='pollingDelay', doc='number of milliseconds to wait between polling')
classmethod read()[source]

Returns an MLReader instance for this class.

readingOrder = Param(parent='undefined', name='readingOrder', doc="Optional parameter to specify which reading order algorithm should be applied when ordering the extracted text elements. Can be either 'basic' or 'natural'. Defaults to 'basic' if not specified.")
setBackoffs(value)[source]
Parameters

backoffs – array of backoffs to use in the handler

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setImageBytes(value)[source]
Parameters

imageBytes – bytestream of the image to use

setImageBytesCol(value)[source]
Parameters

imageBytes – bytestream of the image to use

setImageUrl(value)[source]
Parameters

imageUrl – the url of the image to use

setImageUrlCol(value)[source]
Parameters

imageUrl – the url of the image to use

setInitialPollingDelay(value)[source]
Parameters

initialPollingDelay – number of milliseconds to wait before first poll for result

setLanguage(value)[source]
Parameters

language – The BCP-47 language code of the text in the document. Layout supports auto language identification and multilanguage documents, so only provide a language code if you would like to force the document to be processed as that specific language.

setLanguageCol(value)[source]
Parameters

language – The BCP-47 language code of the text in the document. Layout supports auto language identification and multilanguage documents, so only provide a language code if you would like to force the document to be processed as that specific language.

setLinkedService(value)[source]
setLocation(value)[source]
setMaxPollingRetries(value)[source]
Parameters

maxPollingRetries – number of times to poll

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setPages(value)[source]
Parameters

pages – The page selection, used only for multi-page PDF and TIFF documents. Accepted inputs include single pages (e.g. ‘1, 2’ -> pages 1 and 2 will be processed), finite ranges (e.g. ‘2-5’ -> pages 2 to 5 will be processed) and open-ended ranges (e.g. ‘5-’ -> all pages from page 5 onward will be processed; ‘-10’ -> pages 1 to 10 will be processed). These can be mixed together, and ranges are allowed to overlap (e.g. ‘-5, 1, 3, 5-10’ -> pages 1 to 10 will be processed). The service will accept the request if it can process at least one page of the document (e.g. ‘5-100’ on a 5-page document is valid input, and page 5 will be processed). If no page range is provided, the entire document will be processed.

setPagesCol(value)[source]
Parameters

pages – The page selection, used only for multi-page PDF and TIFF documents. Accepted inputs include single pages (e.g. ‘1, 2’ -> pages 1 and 2 will be processed), finite ranges (e.g. ‘2-5’ -> pages 2 to 5 will be processed) and open-ended ranges (e.g. ‘5-’ -> all pages from page 5 onward will be processed; ‘-10’ -> pages 1 to 10 will be processed). These can be mixed together, and ranges are allowed to overlap (e.g. ‘-5, 1, 3, 5-10’ -> pages 1 to 10 will be processed). The service will accept the request if it can process at least one page of the document (e.g. ‘5-100’ on a 5-page document is valid input, and page 5 will be processed). If no page range is provided, the entire document will be processed.

setParams(backoffs=[100, 500, 1000], concurrency=1, concurrentTimeout=None, errorCol='AnalyzeLayout_2ae0e469a27e_error', imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, initialPollingDelay=300, language=None, languageCol=None, maxPollingRetries=1000, outputCol='AnalyzeLayout_2ae0e469a27e_output', pages=None, pagesCol=None, pollingDelay=300, readingOrder=None, readingOrderCol=None, subscriptionKey=None, subscriptionKeyCol=None, suppressMaxRetriesExceededException=False, timeout=60.0, url=None)[source]

Set the (keyword only) parameters
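
The "keyword only" note means that setParams rejects positional arguments. The sketch below shows one common way such a restriction is enforced with a decorator, similar in spirit to the keyword_only helper that PySpark provides. ToyTransformer and its two parameters are hypothetical stand-ins, not part of this package.

```python
import functools


def keyword_only(func):
    """Reject positional arguments and remember the keywords that were passed."""

    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        if args:
            raise TypeError("Method %s only takes keyword arguments." % func.__name__)
        # Store only the explicitly supplied keywords so defaults are not overwritten.
        self._input_kwargs = kwargs
        return func(self, **kwargs)

    return wrapper


class ToyTransformer:
    """Hypothetical stand-in for a transformer with a setParams method."""

    def __init__(self):
        self.params = {"concurrency": 1, "timeout": 60.0}

    @keyword_only
    def setParams(self, concurrency=1, timeout=60.0):
        # Update only the parameters the caller actually named.
        self.params.update(self._input_kwargs)
        return self


transformer = ToyTransformer().setParams(timeout=30.0)
print(transformer.params)
```

Calling setParams(4) on this toy class raises TypeError, because the value is not tied to a parameter name.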

setPollingDelay(value)[source]
Parameters

pollingDelay – number of milliseconds to wait between polling

setReadingOrder(value)[source]
Parameters

readingOrder – Optional parameter to specify which reading order algorithm should be applied when ordering the extracted text elements. Can be either ‘basic’ or ‘natural’. Defaults to ‘basic’ if not specified.

setReadingOrderCol(value)[source]
Parameters

readingOrder – Optional parameter to specify which reading order algorithm should be applied when ordering the extracted text elements. Can be either ‘basic’ or ‘natural’. Defaults to ‘basic’ if not specified.

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – the API key to use

setSuppressMaxRetriesExceededException(value)[source]
Parameters

suppressMaxRetriesExceededException – set true to suppress the maximum retries exception and report it in the error column

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='the API key to use')
suppressMaxRetriesExceededException = Param(parent='undefined', name='suppressMaxRetriesExceededException', doc='set true to suppress the maximum retries exception and report it in the error column')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')
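
The backoffs parameter is documented only as an "array of backoffs to use in the handler". One plausible reading, sketched here, is a list of waits in milliseconds applied between retries of a failed request. The function name, the set of retryable status codes, and the millisecond unit are all assumptions made for illustration; the source does not confirm them.

```python
import time


def call_with_backoffs(send, backoffs=(100, 500, 1000),
                       retry_statuses=(429, 503), sleep=time.sleep):
    """Call send() and retry while it returns a retryable status code.

    Hypothetical helper: send() returns an HTTP-like status code, and each
    entry of backoffs is treated as a wait in milliseconds before one retry.
    """
    status = send()
    for delay_ms in backoffs:
        if status not in retry_statuses:
            break
        sleep(delay_ms / 1000.0)
        status = send()
    # After the last backoff is used up, the final status is returned as-is.
    return status


# Simulated service: throttled twice, then successful.
responses = iter([429, 429, 200])
waits = []
print(call_with_backoffs(lambda: next(responses), sleep=waits.append), waits)
```

Passing the sleep function as an argument keeps the sketch testable without real delays.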

synapse.ml.cognitive.AnalyzeReceipts module

class synapse.ml.cognitive.AnalyzeReceipts.AnalyzeReceipts(java_obj=None, backoffs=[100, 500, 1000], concurrency=1, concurrentTimeout=None, errorCol='AnalyzeReceipts_0b40c30d2130_error', imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, includeTextDetails=None, includeTextDetailsCol=None, initialPollingDelay=300, locale=None, localeCol=None, maxPollingRetries=1000, outputCol='AnalyzeReceipts_0b40c30d2130_output', pages=None, pagesCol=None, pollingDelay=300, subscriptionKey=None, subscriptionKeyCol=None, suppressMaxRetriesExceededException=False, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • backoffs (list) – array of backoffs to use in the handler

  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (object) – column to hold http errors

  • imageBytes (object) – bytestream of the image to use

  • imageUrl (object) – the url of the image to use

  • includeTextDetails (object) – Include text lines and element references in the result.

  • initialPollingDelay (int) – number of milliseconds to wait before first poll for result

  • locale (object) – Locale of the receipt. Supported locales: en-AU, en-CA, en-GB, en-IN, en-US.

  • maxPollingRetries (int) – number of times to poll

  • outputCol (object) – The name of the output column

  • pages (object) – The page selection, used only for multi-page PDF and TIFF documents. Accepted inputs include single pages (e.g. ‘1, 2’ -> pages 1 and 2 will be processed), finite ranges (e.g. ‘2-5’ -> pages 2 to 5 will be processed) and open-ended ranges (e.g. ‘5-’ -> all pages from page 5 onward will be processed; ‘-10’ -> pages 1 to 10 will be processed). These can be mixed together, and ranges are allowed to overlap (e.g. ‘-5, 1, 3, 5-10’ -> pages 1 to 10 will be processed). The service will accept the request if it can process at least one page of the document (e.g. ‘5-100’ on a 5-page document is valid input, and page 5 will be processed). If no page range is provided, the entire document will be processed.

  • pollingDelay (int) – number of milliseconds to wait between polling

  • subscriptionKey (object) – the API key to use

  • suppressMaxRetriesExceededException (bool) – set true to suppress the maximum retries exception and report it in the error column

  • timeout (float) – number of seconds to wait before closing the connection

  • url (object) – Url of the service
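
The polling parameters above (initialPollingDelay, pollingDelay, maxPollingRetries and suppressMaxRetriesExceededException) suggest a submit-then-poll loop. The following is a minimal sketch of how such a loop might behave, under the assumption that the error is placed in a separate slot when suppression is enabled. The function, the exception class and the check callback are hypothetical names, not part of this package.

```python
import time


class MaxRetriesExceeded(Exception):
    """Hypothetical error raised when polling never yields a result."""


def poll_for_result(check, initial_polling_delay=300, polling_delay=300,
                    max_polling_retries=1000,
                    suppress_max_retries_exceeded=False, sleep=time.sleep):
    """Poll check() until it returns a non-None result.

    Delays are in milliseconds. Returns a (result, error) pair, loosely
    mirroring an output column and an error column.
    """
    sleep(initial_polling_delay / 1000.0)  # wait before the first poll
    for attempt in range(max_polling_retries):
        result = check()
        if result is not None:
            return result, None
        if attempt < max_polling_retries - 1:
            sleep(polling_delay / 1000.0)  # wait between polls
    message = "no result after %d polls" % max_polling_retries
    if suppress_max_retries_exceeded:
        return None, message  # report the failure instead of raising
    raise MaxRetriesExceeded(message)


# Simulated operation that finishes on the third poll.
answers = iter([None, None, "done"])
print(poll_for_result(lambda: next(answers), sleep=lambda seconds: None))
```

With suppression enabled, a row that never completes yields an error value rather than failing the whole job, which is the trade-off the parameter description points at.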

backoffs = Param(parent='undefined', name='backoffs', doc='array of backoffs to use in the handler')
concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number of seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getBackoffs()[source]
Returns

array of backoffs to use in the handler

Return type

backoffs

getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getImageBytes()[source]
Returns

bytestream of the image to use

Return type

imageBytes

getImageUrl()[source]
Returns

the url of the image to use

Return type

imageUrl

getIncludeTextDetails()[source]
Returns

Include text lines and element references in the result.

Return type

includeTextDetails

getInitialPollingDelay()[source]
Returns

number of milliseconds to wait before first poll for result

Return type

initialPollingDelay

static getJavaPackage()[source]

Returns package name String.

getLocale()[source]
Returns

Locale of the receipt. Supported locales: en-AU, en-CA, en-GB, en-IN, en-US.

Return type

locale

getMaxPollingRetries()[source]
Returns

number of times to poll

Return type

maxPollingRetries

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getPages()[source]
Returns

The page selection, used only for multi-page PDF and TIFF documents. Accepted inputs include single pages (e.g. ‘1, 2’ -> pages 1 and 2 will be processed), finite ranges (e.g. ‘2-5’ -> pages 2 to 5 will be processed) and open-ended ranges (e.g. ‘5-’ -> all pages from page 5 onward will be processed; ‘-10’ -> pages 1 to 10 will be processed). These can be mixed together, and ranges are allowed to overlap (e.g. ‘-5, 1, 3, 5-10’ -> pages 1 to 10 will be processed). The service will accept the request if it can process at least one page of the document (e.g. ‘5-100’ on a 5-page document is valid input, and page 5 will be processed). If no page range is provided, the entire document will be processed.

Return type

pages

getPollingDelay()[source]
Returns

number of milliseconds to wait between polling

Return type

pollingDelay

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getSuppressMaxRetriesExceededException()[source]
Returns

set true to suppress the maximum retries exception and report it in the error column

Return type

suppressMaxRetriesExceededException

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

imageBytes = Param(parent='undefined', name='imageBytes', doc='bytestream of the image to use')
imageUrl = Param(parent='undefined', name='imageUrl', doc='the url of the image to use')
includeTextDetails = Param(parent='undefined', name='includeTextDetails', doc='Include text lines and element references in the result.')
initialPollingDelay = Param(parent='undefined', name='initialPollingDelay', doc='number of milliseconds to wait before first poll for result')
locale = Param(parent='undefined', name='locale', doc='Locale of the receipt. Supported locales: en-AU, en-CA, en-GB, en-IN, en-US.')
maxPollingRetries = Param(parent='undefined', name='maxPollingRetries', doc='number of times to poll')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
pages = Param(parent='undefined', name='pages', doc="The page selection, used only for multi-page PDF and TIFF documents. Accepted inputs include single pages (e.g. '1, 2' -> pages 1 and 2 will be processed), finite ranges (e.g. '2-5' -> pages 2 to 5 will be processed) and open-ended ranges (e.g. '5-' -> all pages from page 5 onward will be processed; '-10' -> pages 1 to 10 will be processed). These can be mixed together, and ranges are allowed to overlap (e.g. '-5, 1, 3, 5-10' -> pages 1 to 10 will be processed). The service will accept the request if it can process at least one page of the document (e.g. '5-100' on a 5-page document is valid input, and page 5 will be processed). If no page range is provided, the entire document will be processed.")
pollingDelay = Param(parent='undefined', name='pollingDelay', doc='number of milliseconds to wait between polling')
classmethod read()[source]

Returns an MLReader instance for this class.

setBackoffs(value)[source]
Parameters

backoffs – array of backoffs to use in the handler

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setImageBytes(value)[source]
Parameters

imageBytes – bytestream of the image to use

setImageBytesCol(value)[source]
Parameters

imageBytes – bytestream of the image to use

setImageUrl(value)[source]
Parameters

imageUrl – the url of the image to use

setImageUrlCol(value)[source]
Parameters

imageUrl – the url of the image to use

setIncludeTextDetails(value)[source]
Parameters

includeTextDetails – Include text lines and element references in the result.

setIncludeTextDetailsCol(value)[source]
Parameters

includeTextDetails – Include text lines and element references in the result.

setInitialPollingDelay(value)[source]
Parameters

initialPollingDelay – number of milliseconds to wait before first poll for result

setLinkedService(value)[source]
setLocale(value)[source]
Parameters

locale – Locale of the receipt. Supported locales: en-AU, en-CA, en-GB, en-IN, en-US.

setLocaleCol(value)[source]
Parameters

locale – Locale of the receipt. Supported locales: en-AU, en-CA, en-GB, en-IN, en-US.

setLocation(value)[source]
setMaxPollingRetries(value)[source]
Parameters

maxPollingRetries – number of times to poll

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setPages(value)[source]
Parameters

pages – The page selection, used only for multi-page PDF and TIFF documents. Accepted inputs include single pages (e.g. ‘1, 2’ -> pages 1 and 2 will be processed), finite ranges (e.g. ‘2-5’ -> pages 2 to 5 will be processed) and open-ended ranges (e.g. ‘5-’ -> all pages from page 5 onward will be processed; ‘-10’ -> pages 1 to 10 will be processed). These can be mixed together, and ranges are allowed to overlap (e.g. ‘-5, 1, 3, 5-10’ -> pages 1 to 10 will be processed). The service will accept the request if it can process at least one page of the document (e.g. ‘5-100’ on a 5-page document is valid input, and page 5 will be processed). If no page range is provided, the entire document will be processed.

setPagesCol(value)[source]
Parameters

pages – The page selection, used only for multi-page PDF and TIFF documents. Accepted inputs include single pages (e.g. ‘1, 2’ -> pages 1 and 2 will be processed), finite ranges (e.g. ‘2-5’ -> pages 2 to 5 will be processed) and open-ended ranges (e.g. ‘5-’ -> all pages from page 5 onward will be processed; ‘-10’ -> pages 1 to 10 will be processed). These can be mixed together, and ranges are allowed to overlap (e.g. ‘-5, 1, 3, 5-10’ -> pages 1 to 10 will be processed). The service will accept the request if it can process at least one page of the document (e.g. ‘5-100’ on a 5-page document is valid input, and page 5 will be processed). If no page range is provided, the entire document will be processed.

setParams(backoffs=[100, 500, 1000], concurrency=1, concurrentTimeout=None, errorCol='AnalyzeReceipts_0b40c30d2130_error', imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, includeTextDetails=None, includeTextDetailsCol=None, initialPollingDelay=300, locale=None, localeCol=None, maxPollingRetries=1000, outputCol='AnalyzeReceipts_0b40c30d2130_output', pages=None, pagesCol=None, pollingDelay=300, subscriptionKey=None, subscriptionKeyCol=None, suppressMaxRetriesExceededException=False, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setPollingDelay(value)[source]
Parameters

pollingDelay – number of milliseconds to wait between polling

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – the API key to use

setSuppressMaxRetriesExceededException(value)[source]
Parameters

suppressMaxRetriesExceededException – set true to suppress the maximum retries exception and report it in the error column

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='the API key to use')
suppressMaxRetriesExceededException = Param(parent='undefined', name='suppressMaxRetriesExceededException', doc='set true to suppress the maximum retries exception and report it in the error column')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')
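
The concurrency and concurrentTimeout parameters documented for this class describe a bounded pool of in-flight calls and a wait limit on their futures. As a rough analogy only, the same idea can be expressed with Python's standard concurrent.futures module. The helper name and the per-row callback are hypothetical, and the real transformer runs on the JVM, so this is not its implementation.

```python
from concurrent.futures import ThreadPoolExecutor


def call_concurrently(call, rows, concurrency=1, concurrent_timeout=None):
    """Apply call() to each row with at most `concurrency` calls in flight.

    concurrent_timeout is the maximum number of seconds to wait on each
    future; None waits indefinitely.
    """
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(call, row) for row in rows]
        # Results are gathered in submission order, so row order is preserved.
        return [future.result(timeout=concurrent_timeout) for future in futures]


print(call_concurrently(lambda value: value * 2, [1, 2, 3],
                        concurrency=2, concurrent_timeout=5.0))
```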

synapse.ml.cognitive.AzureSearchWriter module

synapse.ml.cognitive.AzureSearchWriter.streamToAzureSearch(df, **options)[source]
synapse.ml.cognitive.AzureSearchWriter.writeToAzureSearch(df, **options)[source]

synapse.ml.cognitive.BingImageSearch module

class synapse.ml.cognitive.BingImageSearch.BingImageSearch(java_obj=None, aspect=None, aspectCol=None, color=None, colorCol=None, concurrency=1, concurrentTimeout=None, count=None, countCol=None, errorCol='BingImageSearch_f7bcb4d81022_error', freshness=None, freshnessCol=None, handler=None, height=None, heightCol=None, imageContent=None, imageContentCol=None, imageType=None, imageTypeCol=None, license=None, licenseCol=None, maxFileSize=None, maxFileSizeCol=None, maxHeight=None, maxHeightCol=None, maxWidth=None, maxWidthCol=None, minFileSize=None, minFileSizeCol=None, minHeight=None, minHeightCol=None, minWidth=None, minWidthCol=None, mkt=None, mktCol=None, offset=None, offsetCol=None, outputCol='BingImageSearch_f7bcb4d81022_output', q=None, qCol=None, size=None, sizeCol=None, subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url='https://api.bing.microsoft.com/v7.0/images/search', width=None, widthCol=None)[source]

Bases: synapse.ml.cognitive._BingImageSearch._BingImageSearch

static downloadFromUrls(pathCol, bytesCol, concurrency, timeout)[source]
static getUrlTransformer(imageCol, urlCol)[source]
setMarket(value)[source]
setMarketCol(value)[source]
setQuery(value)[source]
setQueryCol(value)[source]

synapse.ml.cognitive.BreakSentence module

class synapse.ml.cognitive.BreakSentence.BreakSentence(java_obj=None, concurrency=1, concurrentTimeout=None, errorCol='BreakSentence_1e59df67cf51_error', handler=None, language=None, languageCol=None, outputCol='BreakSentence_1e59df67cf51_output', script=None, scriptCol=None, subscriptionKey=None, subscriptionKeyCol=None, subscriptionRegion=None, subscriptionRegionCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (object) – column to hold http errors

  • handler (object) – Which strategy to use when handling requests

  • language (object) – Language tag identifying the language of the input text. If a code is not specified, automatic language detection will be applied.

  • outputCol (object) – The name of the output column

  • script (object) – Script tag identifying the script used by the input text. If a script is not specified, the default script of the language will be assumed.

  • subscriptionKey (object) – the API key to use

  • subscriptionRegion (object) – the API region to use

  • text (object) – the string to translate

  • timeout (float) – number of seconds to wait before closing the connection

  • url (object) – Url of the service
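
For orientation, the parameters above line up with the fields of a Translator "break sentence" REST request. The sketch below only assembles such a request as a plain dictionary and sends nothing. It assumes the public Translator v3 endpoint and header names; whether this class uses exactly this request shape internally is an assumption, and the helper name is hypothetical.

```python
import json
from urllib.parse import urlencode


def build_break_sentence_request(
        text, subscription_key, subscription_region, language=None, script=None,
        url="https://api.cognitive.microsofttranslator.com/breaksentence"):
    """Assemble (but do not send) a break-sentence request description."""
    query = {"api-version": "3.0"}
    # language and script are optional; when omitted, the service detects the
    # language and assumes the default script of that language.
    if language is not None:
        query["language"] = language
    if script is not None:
        query["script"] = script
    return {
        "method": "POST",
        "url": url + "?" + urlencode(query),
        "headers": {
            "Ocp-Apim-Subscription-Key": subscription_key,
            "Ocp-Apim-Subscription-Region": subscription_region,
            "Content-Type": "application/json",
        },
        "body": json.dumps([{"Text": text}]),
    }


request = build_break_sentence_request(
    "Hello. How are you?", "<your-key>", "<your-region>", language="en")
print(request["url"])
```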

concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number of seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

static getJavaPackage()[source]

Returns package name String.

getLanguage()[source]
Returns

Language tag identifying the language of the input text. If a code is not specified, automatic language detection will be applied.

Return type

language

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getScript()[source]
Returns

Script tag identifying the script used by the input text. If a script is not specified, the default script of the language will be assumed.

Return type

script

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getSubscriptionRegion()[source]
Returns

the API region to use

Return type

subscriptionRegion

getText()[source]
Returns

the string to translate

Return type

text

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
language = Param(parent='undefined', name='language', doc='Language tag identifying the language of the input text. If a code is not specified, automatic language detection will be applied.')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

script = Param(parent='undefined', name='script', doc='Script tag identifying the script used by the input text. If a script is not specified, the default script of the language will be assumed.')
setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setLanguage(value)[source]
Parameters

language – Language tag identifying the language of the input text. If a code is not specified, automatic language detection will be applied.

setLanguageCol(value)[source]
Parameters

language – Language tag identifying the language of the input text. If a code is not specified, automatic language detection will be applied.

setLinkedService(value)[source]
setLocation(value)[source]
setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(concurrency=1, concurrentTimeout=None, errorCol='BreakSentence_1e59df67cf51_error', handler=None, language=None, languageCol=None, outputCol='BreakSentence_1e59df67cf51_output', script=None, scriptCol=None, subscriptionKey=None, subscriptionKeyCol=None, subscriptionRegion=None, subscriptionRegionCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setScript(value)[source]
Parameters

script – Script tag identifying the script used by the input text. If a script is not specified, the default script of the language will be assumed.

setScriptCol(value)[source]
Parameters

script – Script tag identifying the script used by the input text. If a script is not specified, the default script of the language will be assumed.

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionRegion(value)[source]
Parameters

subscriptionRegion – the API region to use

setSubscriptionRegionCol(value)[source]
Parameters

subscriptionRegion – the API region to use

setText(value)[source]
Parameters

text – the string to translate

setTextCol(value)[source]
Parameters

text – the string to translate

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='the API key to use')
subscriptionRegion = Param(parent='undefined', name='subscriptionRegion', doc='the API region to use')
text = Param(parent='undefined', name='text', doc='the string to translate')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.ConversationTranscription module

class synapse.ml.cognitive.ConversationTranscription.ConversationTranscription(java_obj=None, audioDataCol=None, endpointId=None, extraFfmpegArgs=[], fileType=None, fileTypeCol=None, format=None, formatCol=None, language=None, languageCol=None, outputCol=None, participantsJson=None, participantsJsonCol=None, profanity=None, profanityCol=None, recordAudioData=False, recordedFileNameCol=None, streamIntermediateResults=True, subscriptionKey=None, subscriptionKeyCol=None, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • audioDataCol (object) – Column holding audio data, must be either ByteArrays or Strings representing file URIs

  • endpointId (object) – endpoint for custom speech models

  • extraFfmpegArgs (list) – extra arguments for ffmpeg output decoding

  • fileType (object) – The file type of the sound files, supported types: wav, ogg, mp3

  • format (object) – Specifies the result format. Accepted values are simple and detailed. Default is simple.

  • language (object) – Identifies the spoken language that is being recognized.

  • outputCol (object) – The name of the output column

  • participantsJson (object) – a json representation of a list of conversation participants (email, language, user)

  • profanity (object) – Specifies how to handle profanity in recognition results. Accepted values are masked, which replaces profanity with asterisks, removed, which removes all profanity from the result, or raw, which includes the profanity in the result. The default setting is masked.

  • recordAudioData (bool) – Whether to record audio data to a file location, for use only with m3u8 streams

  • recordedFileNameCol (object) – Column holding file names to write audio data to if ‘recordAudioData’ is set to true

  • streamIntermediateResults (bool) – Whether or not to immediately return itermediate results, or group in a sequence

  • subscriptionKey (object) – the API key to use

  • url (object) – Url of the service
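
The string-valued parameters above can be prepared in plain Python before they are handed to the transformer. The following is a minimal sketch, not part of the library: the helper names (build_participants_json, check_choice) and the sample participants are hypothetical, and the exact JSON schema the service expects for participantsJson is an assumption based only on the field names listed in the description (email, language, user).

```python
import json

# Accepted values copied from the parameter descriptions above.
ACCEPTED_FORMATS = {"simple", "detailed"}
ACCEPTED_PROFANITY = {"masked", "removed", "raw"}


def build_participants_json(participants):
    """Serialize a list of participant dicts to a JSON string.

    Each participant needs the keys email, language and user. The key
    names follow the parameter description; the exact schema expected
    by the service is an assumption.
    """
    required = ("email", "language", "user")
    for p in participants:
        missing = [k for k in required if k not in p]
        if missing:
            raise ValueError(f"participant {p!r} is missing {missing}")
    # Keep only the three documented keys, in a fixed order.
    return json.dumps([{k: p[k] for k in required} for p in participants])


def check_choice(name, value, accepted):
    """Return value unchanged if it is one of the accepted strings."""
    if value not in accepted:
        raise ValueError(f"{name} must be one of {sorted(accepted)}, got {value!r}")
    return value


# Hypothetical sample data, for illustration only.
participants_json = build_participants_json([
    {"email": "alice@example.com", "language": "en-US", "user": "alice"},
    {"email": "bob@example.com", "language": "en-US", "user": "bob"},
])
result_format = check_choice("format", "detailed", ACCEPTED_FORMATS)
profanity = check_choice("profanity", "masked", ACCEPTED_PROFANITY)
```

The resulting strings are the kind of values that setParticipantsJson(value), setFormat(value) and setProfanity(value) accept; checking them locally surfaces a typo before any request is sent.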

audioDataCol = Param(parent='undefined', name='audioDataCol', doc='Column holding audio data, must be either ByteArrays or Strings representing file URIs')
endpointId = Param(parent='undefined', name='endpointId', doc='endpoint for custom speech models')
extraFfmpegArgs = Param(parent='undefined', name='extraFfmpegArgs', doc='extra arguments to for ffmpeg output decoding')
fileType = Param(parent='undefined', name='fileType', doc='The file type of the sound files, supported types: wav, ogg, mp3')
format = Param(parent='undefined', name='format', doc=' Specifies the result format. Accepted values are simple and detailed. Default is simple.     ')
getAudioDataCol()[source]
Returns

Column holding audio data, must be either ByteArrays or Strings representing file URIs

Return type

audioDataCol

getEndpointId()[source]
Returns

endpoint for custom speech models

Return type

endpointId

getExtraFfmpegArgs()[source]
Returns

extra arguments for ffmpeg output decoding

Return type

extraFfmpegArgs

getFileType()[source]
Returns

The file type of the sound files, supported types: wav, ogg, mp3

Return type

fileType

getFormat()[source]
Returns

Specifies the result format. Accepted values are simple and detailed. Default is simple.

Return type

format

static getJavaPackage()[source]

Returns package name String.

getLanguage()[source]
Returns

Identifies the spoken language that is being recognized.

Return type

language

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getParticipantsJson()[source]
Returns

a json representation of a list of conversation participants (email, language, user)

Return type

participantsJson

getProfanity()[source]
Returns

Specifies how to handle profanity in recognition results. Accepted values are masked, which replaces profanity with asterisks; removed, which removes all profanity from the result; or raw, which includes the profanity in the result. The default setting is masked.

Return type

profanity

getRecordAudioData()[source]
Returns

Whether to record audio data to a file location, for use only with m3u8 streams

Return type

recordAudioData

getRecordedFileNameCol()[source]
Returns

Column holding file names to write audio data to if recordAudioData is set to true

Return type

recordedFileNameCol

getStreamIntermediateResults()[source]
Returns

Whether to return intermediate results immediately or group them in a sequence

Return type

streamIntermediateResults

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getUrl()[source]
Returns

Url of the service

Return type

url

language = Param(parent='undefined', name='language', doc=' Identifies the spoken language that is being recognized.     ')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
participantsJson = Param(parent='undefined', name='participantsJson', doc='a json representation of a list of conversation participants (email, language, user)')
profanity = Param(parent='undefined', name='profanity', doc=' Specifies how to handle profanity in recognition results. Accepted values are masked, which replaces profanity with asterisks, removed, which remove all profanity from the result, or raw, which includes the profanity in the result. The default setting is masked.     ')
classmethod read()[source]

Returns an MLReader instance for this class.

recordAudioData = Param(parent='undefined', name='recordAudioData', doc='Whether to record audio data to a file location, for use only with m3u8 streams')
recordedFileNameCol = Param(parent='undefined', name='recordedFileNameCol', doc="Column holding file names to write audio data to if ``recordAudioData'' is set to true")
setAudioDataCol(value)[source]
Parameters

audioDataCol – Column holding audio data, must be either ByteArrays or Strings representing file URIs

setEndpointId(value)[source]
Parameters

endpointId – endpoint for custom speech models

setExtraFfmpegArgs(value)[source]
Parameters

extraFfmpegArgs – extra arguments for ffmpeg output decoding

setFileType(value)[source]
Parameters

fileType – The file type of the sound files, supported types: wav, ogg, mp3

setFileTypeCol(value)[source]
Parameters

fileType – The file type of the sound files, supported types: wav, ogg, mp3

setFormat(value)[source]
Parameters

format – Specifies the result format. Accepted values are simple and detailed. Default is simple.

setFormatCol(value)[source]
Parameters

format – Specifies the result format. Accepted values are simple and detailed. Default is simple.

setLanguage(value)[source]
Parameters

language – Identifies the spoken language that is being recognized.

setLanguageCol(value)[source]
Parameters

language – Identifies the spoken language that is being recognized.

setLinkedService(value)[source]
setLocation(value)[source]
setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(audioDataCol=None, endpointId=None, extraFfmpegArgs=[], fileType=None, fileTypeCol=None, format=None, formatCol=None, language=None, languageCol=None, outputCol=None, participantsJson=None, participantsJsonCol=None, profanity=None, profanityCol=None, recordAudioData=False, recordedFileNameCol=None, streamIntermediateResults=True, subscriptionKey=None, subscriptionKeyCol=None, url=None)[source]

Set the (keyword only) parameters

setParticipantsJson(value)[source]
Parameters

participantsJson – a json representation of a list of conversation participants (email, language, user)

setParticipantsJsonCol(value)[source]
Parameters

participantsJson – a json representation of a list of conversation participants (email, language, user)

setProfanity(value)[source]
Parameters

profanity – Specifies how to handle profanity in recognition results. Accepted values are masked, which replaces profanity with asterisks; removed, which removes all profanity from the result; or raw, which includes the profanity in the result. The default setting is masked.

setProfanityCol(value)[source]
Parameters

profanity – Specifies how to handle profanity in recognition results. Accepted values are masked, which replaces profanity with asterisks; removed, which removes all profanity from the result; or raw, which includes the profanity in the result. The default setting is masked.

setRecordAudioData(value)[source]
Parameters

recordAudioData – Whether to record audio data to a file location, for use only with m3u8 streams

setRecordedFileNameCol(value)[source]
Parameters

recordedFileNameCol – Column holding file names to write audio data to if recordAudioData is set to true

setStreamIntermediateResults(value)[source]
Parameters

streamIntermediateResults – Whether to return intermediate results immediately or group them in a sequence

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – the API key to use

setUrl(value)[source]
Parameters

url – Url of the service

streamIntermediateResults = Param(parent='undefined', name='streamIntermediateResults', doc='Whether or not to immediately return itermediate results, or group in a sequence')
subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='the API key to use')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.DescribeImage module

class synapse.ml.cognitive.DescribeImage.DescribeImage(java_obj=None, concurrency=1, concurrentTimeout=None, errorCol='DescribeImage_174a70434c5b_error', handler=None, imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, language=None, languageCol=None, maxCandidates=None, maxCandidatesCol=None, outputCol='DescribeImage_174a70434c5b_output', subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (object) – column to hold http errors

  • handler (object) – Which strategy to use when handling requests

  • imageBytes (object) – bytestream of the image to use

  • imageUrl (object) – the url of the image to use

  • language (object) – Language of image description

  • maxCandidates (object) – Maximum candidate descriptions to return

  • outputCol (object) – The name of the output column

  • subscriptionKey (object) – the API key to use

  • timeout (float) – number of seconds to wait before closing the connection

  • url (object) – Url of the service
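
As a rough illustration of how these parameters fit together, the sketch below collects constructor keyword arguments in a plain dictionary and only imports the transformer when it is actually built. The helper names (describe_image_kwargs, make_describe_image), the endpoint URL, the key placeholder and the column names are all hypothetical; only the keyword names come from the signature above. Passing an integer for maxCandidates is also an assumption, since the parameter is documented with type object.

```python
def describe_image_kwargs(subscription_key, url, image_url_col, output_col,
                          max_candidates=1, concurrency=1, timeout=60.0):
    """Collect keyword arguments named after the constructor parameters."""
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    if max_candidates < 1:
        raise ValueError("max_candidates must be at least 1")
    if timeout <= 0:
        raise ValueError("timeout must be a positive number of seconds")
    return {
        "subscriptionKey": subscription_key,
        "url": url,
        "imageUrlCol": image_url_col,   # name of the column holding image URLs
        "maxCandidates": max_candidates,
        "concurrency": concurrency,
        "timeout": float(timeout),
        "outputCol": output_col,
    }


def make_describe_image(kwargs):
    """Build the transformer; requires SynapseML and a Spark session."""
    # Imported lazily so the helper above works without Spark installed.
    from synapse.ml.cognitive.DescribeImage import DescribeImage
    return DescribeImage(**kwargs)


# Hypothetical values, for illustration only.
kwargs = describe_image_kwargs(
    subscription_key="<your-key>",
    url="https://example.invalid/describe",
    image_url_col="image_url",
    output_col="description",
    max_candidates=3,
)
```

With SynapseML installed, make_describe_image(kwargs) would return a transformer whose transform method can be applied to a DataFrame containing the image_url column; that step is left out here because it needs a live Spark session and a valid key.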

concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

getImageBytes()[source]
Returns

bytestream of the image to use

Return type

imageBytes

getImageUrl()[source]
Returns

the url of the image to use

Return type

imageUrl

static getJavaPackage()[source]

Returns package name String.

getLanguage()[source]
Returns

Language of image description

Return type

language

getMaxCandidates()[source]
Returns

Maximum candidate descriptions to return

Return type

maxCandidates

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
imageBytes = Param(parent='undefined', name='imageBytes', doc='bytestream of the image to use')
imageUrl = Param(parent='undefined', name='imageUrl', doc='the url of the image to use')
language = Param(parent='undefined', name='language', doc='Language of image description')
maxCandidates = Param(parent='undefined', name='maxCandidates', doc='Maximum candidate descriptions to return')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setImageBytes(value)[source]
Parameters

imageBytes – bytestream of the image to use

setImageBytesCol(value)[source]
Parameters

imageBytes – bytestream of the image to use

setImageUrl(value)[source]
Parameters

imageUrl – the url of the image to use

setImageUrlCol(value)[source]
Parameters

imageUrl – the url of the image to use

setLanguage(value)[source]
Parameters

language – Language of image description

setLanguageCol(value)[source]
Parameters

language – Language of image description

setLinkedService(value)[source]
setLocation(value)[source]
setMaxCandidates(value)[source]
Parameters

maxCandidates – Maximum candidate descriptions to return

setMaxCandidatesCol(value)[source]
Parameters

maxCandidates – Maximum candidate descriptions to return

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(concurrency=1, concurrentTimeout=None, errorCol='DescribeImage_174a70434c5b_error', handler=None, imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, language=None, languageCol=None, maxCandidates=None, maxCandidatesCol=None, outputCol='DescribeImage_174a70434c5b_output', subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – the API key to use

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='the API key to use')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.Detect module

class synapse.ml.cognitive.Detect.Detect(java_obj=None, concurrency=1, concurrentTimeout=None, errorCol='Detect_68bac379051c_error', handler=None, outputCol='Detect_68bac379051c_output', subscriptionKey=None, subscriptionKeyCol=None, subscriptionRegion=None, subscriptionRegionCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (object) – column to hold http errors

  • handler (object) – Which strategy to use when handling requests

  • outputCol (object) – The name of the output column

  • subscriptionKey (object) – the API key to use

  • subscriptionRegion (object) – the API region to use

  • text (object) – the string to translate

  • timeout (float) – number of seconds to wait before closing the connection

  • url (object) – Url of the service
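
Several constructor arguments come in pairs, such as text and textCol or subscriptionKey and subscriptionKeyCol. The sketch below assumes the usual reading of that naming: the plain argument supplies one value for every row, while the Col variant names a DataFrame column that supplies a value per row. The helper value_or_col and the sample values are hypothetical and not part of the library.

```python
def value_or_col(name, value=None, col=None):
    """Return a one-entry kwargs dict for either `name` or `name + "Col"`.

    Exactly one of value and col must be given.
    """
    if (value is None) == (col is None):
        raise ValueError(f"give exactly one of value or col for {name!r}")
    if value is not None:
        return {name: value}
    return {name + "Col": col}


# Hypothetical configuration: one key and region for all rows,
# but the input text read from a column named "sentence".
detect_kwargs = {}
detect_kwargs.update(value_or_col("subscriptionKey", value="<your-key>"))
detect_kwargs.update(value_or_col("subscriptionRegion", value="eastus"))
detect_kwargs.update(value_or_col("text", col="sentence"))
```

The resulting dictionary uses only keyword names that appear in the Detect signature, so it could be unpacked into the constructor or into setParams.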

concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

static getJavaPackage()[source]

Returns package name String.

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getSubscriptionRegion()[source]
Returns

the API region to use

Return type

subscriptionRegion

getText()[source]
Returns

the string to translate

Return type

text

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setLinkedService(value)[source]
setLocation(value)[source]
setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(concurrency=1, concurrentTimeout=None, errorCol='Detect_68bac379051c_error', handler=None, outputCol='Detect_68bac379051c_output', subscriptionKey=None, subscriptionKeyCol=None, subscriptionRegion=None, subscriptionRegionCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionRegion(value)[source]
Parameters

subscriptionRegion – the API region to use

setSubscriptionRegionCol(value)[source]
Parameters

subscriptionRegion – the API region to use

setText(value)[source]
Parameters

text – the string to translate

setTextCol(value)[source]
Parameters

text – the string to translate

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='the API key to use')
subscriptionRegion = Param(parent='undefined', name='subscriptionRegion', doc='the API region to use')
text = Param(parent='undefined', name='text', doc='the string to translate')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.DetectAnomalies module

class synapse.ml.cognitive.DetectAnomalies.DetectAnomalies(java_obj=None, concurrency=1, concurrentTimeout=None, customInterval=None, customIntervalCol=None, errorCol='DetectAnomalies_e575b2b0b1ed_error', granularity=None, granularityCol=None, handler=None, maxAnomalyRatio=None, maxAnomalyRatioCol=None, outputCol='DetectAnomalies_e575b2b0b1ed_output', period=None, periodCol=None, sensitivity=None, sensitivityCol=None, series=None, seriesCol=None, subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • customInterval (object) – Custom Interval is used to set a non-standard time interval. For example, if the series has a 5-minute interval, the request can be set as granularity=minutely, customInterval=5.

  • errorCol (object) – column to hold http errors

  • granularity (object) – Can only be one of yearly, monthly, weekly, daily, hourly or minutely. Granularity is used to verify whether the input series is valid.

  • handler (object) – Which strategy to use when handling requests

  • maxAnomalyRatio (object) – Optional argument, advanced model parameter, max anomaly ratio in a time series.

  • outputCol (object) – The name of the output column

  • period (object) – Optional argument, periodic value of a time series. If the value is null or not present, the API will determine the period automatically.

  • sensitivity (object) – Optional argument, advanced model parameter, between 0-99. The lower the value, the larger the margin value, which means fewer anomalies will be accepted.

  • series (object) – Time series data points. Points should be sorted by timestamp in ascending order to match the anomaly detection result. If the data is not sorted correctly or there are duplicated timestamps, the API will not work and an error message will be returned.

  • subscriptionKey (object) – the API key to use

  • timeout (float) – number of seconds to wait before closing the connection

  • url (object) – Url of the service
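
The constraints on series, granularity and sensitivity can be checked before any request is made. The following is a minimal sketch using only the standard library. The helper names are hypothetical, and the representation of a data point as a dictionary with timestamp and value keys is an assumption; the description above only says that points carry a timestamp.

```python
from datetime import datetime

# Allowed values copied from the granularity description above.
GRANULARITIES = ("yearly", "monthly", "weekly", "daily", "hourly", "minutely")


def prepare_series(points):
    """Sort (iso_timestamp, value) pairs ascending and reject duplicates."""
    parsed = sorted((datetime.fromisoformat(ts), float(v)) for ts, v in points)
    for (earlier, _), (later, _) in zip(parsed, parsed[1:]):
        if earlier == later:
            raise ValueError(f"duplicated timestamp: {earlier.isoformat()}")
    # Assumed point layout: one dict per point with timestamp and value.
    return [{"timestamp": ts.isoformat(), "value": v} for ts, v in parsed]


def check_granularity(granularity):
    """Return granularity if it is one of the documented values."""
    if granularity not in GRANULARITIES:
        raise ValueError(f"granularity must be one of {GRANULARITIES}")
    return granularity


def check_sensitivity(sensitivity):
    """Return sensitivity if it lies in the documented 0-99 range."""
    if not 0 <= sensitivity <= 99:
        raise ValueError("sensitivity must be between 0 and 99")
    return sensitivity


# Hypothetical 5-minute series given out of order, for illustration only.
series = prepare_series([
    ("2021-01-01T00:10:00", 3.0),
    ("2021-01-01T00:00:00", 1.0),
    ("2021-01-01T00:05:00", 2.0),
])
granularity = check_granularity("minutely")
custom_interval = 5  # matches the 5-minute spacing of the sample series
sensitivity = check_sensitivity(95)
```

Sorting locally and failing early on a duplicated timestamp avoids a round trip that, according to the series description, would only return an error message.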

concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
customInterval = Param(parent='undefined', name='customInterval', doc=' Custom Interval is used to set non-standard time interval, for example, if the series is 5 minutes,  request can be set as granularity=minutely, customInterval=5.     ')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getCustomInterval()[source]
Returns

Custom Interval is used to set a non-standard time interval. For example, if the series has a 5-minute interval, the request can be set as granularity=minutely, customInterval=5.

Return type

customInterval

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getGranularity()[source]
Returns

Can only be one of yearly, monthly, weekly, daily, hourly or minutely. Granularity is used to verify whether the input series is valid.

Return type

granularity

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

static getJavaPackage()[source]

Returns package name String.

getMaxAnomalyRatio()[source]
Returns

Optional argument, advanced model parameter, max anomaly ratio in a time series.

Return type

maxAnomalyRatio

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getPeriod()[source]
Returns

Optional argument, periodic value of a time series. If the value is null or not present, the API will determine the period automatically.

Return type

period

getSensitivity()[source]
Returns

Optional argument, advanced model parameter, between 0-99. The lower the value, the larger the margin value, which means fewer anomalies will be accepted.

Return type

sensitivity

getSeries()[source]
Returns

Time series data points. Points should be sorted by timestamp in ascending order to match the anomaly detection result. If the data is not sorted correctly or there are duplicated timestamps, the API will not work and an error message will be returned.

Return type

series

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

granularity = Param(parent='undefined', name='granularity', doc=' Can only be one of yearly, monthly, weekly, daily, hourly or minutely. Granularity is used for verify whether input series is valid.     ')
handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
maxAnomalyRatio = Param(parent='undefined', name='maxAnomalyRatio', doc=' Optional argument, advanced model parameter, max anomaly ratio in a time series.     ')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
period = Param(parent='undefined', name='period', doc=' Optional argument, periodic value of a time series. If the value is null or does not present, the API will determine the period automatically.     ')
classmethod read()[source]

Returns an MLReader instance for this class.

sensitivity = Param(parent='undefined', name='sensitivity', doc=' Optional argument, advanced model parameter, between 0-99, the lower the value is, the larger the margin value will be which means less anomalies will be accepted     ')
series = Param(parent='undefined', name='series', doc=' Time series data points. Points should be sorted by timestamp in ascending order to match the anomaly detection result. If the data is not sorted correctly or there is duplicated timestamp, the API will not work. In such case, an error message will be returned.     ')
setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setCustomInterval(value)[source]
Parameters

customInterval – Custom Interval is used to set a non-standard time interval. For example, if the series has a 5-minute interval, the request can be set as granularity=minutely, customInterval=5.

setCustomIntervalCol(value)[source]
Parameters

customInterval – Custom Interval is used to set a non-standard time interval. For example, if the series has a 5-minute interval, the request can be set as granularity=minutely, customInterval=5.

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setGranularity(value)[source]
Parameters

granularity – Can only be one of yearly, monthly, weekly, daily, hourly or minutely. Granularity is used to verify whether the input series is valid.

setGranularityCol(value)[source]
Parameters

granularity – Can only be one of yearly, monthly, weekly, daily, hourly or minutely. Granularity is used to verify whether the input series is valid.

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setLinkedService(value)[source]
setLocation(value)[source]
setMaxAnomalyRatio(value)[source]
Parameters

maxAnomalyRatio – Optional argument, advanced model parameter, max anomaly ratio in a time series.

setMaxAnomalyRatioCol(value)[source]
Parameters

maxAnomalyRatio – Optional argument, advanced model parameter, max anomaly ratio in a time series.

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(concurrency=1, concurrentTimeout=None, customInterval=None, customIntervalCol=None, errorCol='DetectAnomalies_e575b2b0b1ed_error', granularity=None, granularityCol=None, handler=None, maxAnomalyRatio=None, maxAnomalyRatioCol=None, outputCol='DetectAnomalies_e575b2b0b1ed_output', period=None, periodCol=None, sensitivity=None, sensitivityCol=None, series=None, seriesCol=None, subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setPeriod(value)[source]
Parameters

period – Optional argument, periodic value of a time series. If the value is null or not present, the API will determine the period automatically.

setPeriodCol(value)[source]
Parameters

period – Optional argument, periodic value of a time series. If the value is null or not present, the API will determine the period automatically.

setSensitivity(value)[source]
Parameters

sensitivity – Optional argument, advanced model parameter, between 0-99. The lower the value, the larger the margin value, which means fewer anomalies will be accepted.

setSensitivityCol(value)[source]
Parameters

sensitivity – Optional argument, advanced model parameter, between 0-99. The lower the value, the larger the margin value, which means fewer anomalies will be accepted.

setSeries(value)[source]
Parameters

series – Time series data points. Points should be sorted by timestamp in ascending order to match the anomaly detection result. If the data is not sorted correctly or there are duplicated timestamps, the API will not work and an error message will be returned.

setSeriesCol(value)[source]
Parameters

series – Time series data points. Points should be sorted by timestamp in ascending order to match the anomaly detection result. If the data is not sorted correctly or there are duplicated timestamps, the API will not work and an error message will be returned.

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – the API key to use

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='the API key to use')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.DetectFace module

class synapse.ml.cognitive.DetectFace.DetectFace(java_obj=None, concurrency=1, concurrentTimeout=None, errorCol='DetectFace_6081040d1d68_error', handler=None, imageUrl=None, imageUrlCol=None, outputCol='DetectFace_6081040d1d68_output', returnFaceAttributes=None, returnFaceAttributesCol=None, returnFaceId=None, returnFaceIdCol=None, returnFaceLandmarks=None, returnFaceLandmarksCol=None, subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (object) – column to hold http errors

  • handler (object) – Which strategy to use when handling requests

  • imageUrl (object) – the url of the image to use

  • outputCol (object) – The name of the output column

  • returnFaceAttributes (object) – Analyze and return one or more specified face attributes. Supported face attributes include: age, gender, headPose, smile, facialHair, glasses, emotion, hair, makeup, occlusion, accessories, blur, exposure and noise. Face attribute analysis has additional computational and time cost.

  • returnFaceId (object) – Return faceIds of the detected faces or not. The default value is true.

  • returnFaceLandmarks (object) – Return face landmarks of the detected faces or not. The default value is false.

  • subscriptionKey (object) – the API key to use

  • timeout (float) – number of seconds to wait before closing the connection

  • url (object) – Url of the service
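
Because face attribute analysis adds cost, it can help to request only the attributes that are needed and to check their spelling up front. The sketch below validates a requested list against the attribute names given in the returnFaceAttributes description. The helper select_face_attributes is hypothetical, and whether the parameter expects a Python list of strings is an assumption, since its documented type is object.

```python
# Attribute names copied from the returnFaceAttributes description above.
SUPPORTED_FACE_ATTRIBUTES = (
    "age", "gender", "headPose", "smile", "facialHair", "glasses",
    "emotion", "hair", "makeup", "occlusion", "accessories", "blur",
    "exposure", "noise",
)


def select_face_attributes(requested):
    """Validate attribute names and drop repeats, keeping request order."""
    unknown = [a for a in requested if a not in SUPPORTED_FACE_ATTRIBUTES]
    if unknown:
        raise ValueError(f"unsupported face attributes: {unknown}")
    # dict.fromkeys keeps the first occurrence of each name, in order.
    return list(dict.fromkeys(requested))


# Hypothetical selection, for illustration only.
face_attributes = select_face_attributes(["age", "emotion", "glasses", "age"])
```

The names are case-sensitive in this sketch, so a request for headpose instead of headPose is rejected locally rather than sent to the service.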

concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

getImageUrl()[source]
Returns

the url of the image to use

Return type

imageUrl

static getJavaPackage()[source]

Returns package name String.

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getReturnFaceAttributes()[source]
Returns

Analyze and return one or more specified face attributes. Supported face attributes include: age, gender, headPose, smile, facialHair, glasses, emotion, hair, makeup, occlusion, accessories, blur, exposure and noise. Face attribute analysis has additional computational and time cost.

Return type

returnFaceAttributes

getReturnFaceId()[source]
Returns

Return faceIds of the detected faces or not. The default value is true.

Return type

returnFaceId

getReturnFaceLandmarks()[source]
Returns

Return face landmarks of the detected faces or not. The default value is false.

Return type

returnFaceLandmarks

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
imageUrl = Param(parent='undefined', name='imageUrl', doc='the url of the image to use')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

returnFaceAttributes = Param(parent='undefined', name='returnFaceAttributes', doc='Analyze and return the one or more specified face attributes Supported face attributes include: age, gender, headPose, smile, facialHair, glasses, emotion, hair, makeup, occlusion, accessories, blur, exposure and noise. Face attribute analysis has additional computational and time cost.')
returnFaceId = Param(parent='undefined', name='returnFaceId', doc='Return faceIds of the detected faces or not. The default value is true')
returnFaceLandmarks = Param(parent='undefined', name='returnFaceLandmarks', doc='Return face landmarks of the detected faces or not. The default value is false.')
setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setImageUrl(value)[source]
Parameters

imageUrl – the url of the image to use

setImageUrlCol(value)[source]
Parameters

imageUrl – the url of the image to use

setLinkedService(value)[source]
setLocation(value)[source]
setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(concurrency=1, concurrentTimeout=None, errorCol='DetectFace_6081040d1d68_error', handler=None, imageUrl=None, imageUrlCol=None, outputCol='DetectFace_6081040d1d68_output', returnFaceAttributes=None, returnFaceAttributesCol=None, returnFaceId=None, returnFaceIdCol=None, returnFaceLandmarks=None, returnFaceLandmarksCol=None, subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setReturnFaceAttributes(value)[source]
Parameters

returnFaceAttributes – Analyze and return one or more specified face attributes. Supported face attributes include: age, gender, headPose, smile, facialHair, glasses, emotion, hair, makeup, occlusion, accessories, blur, exposure and noise. Face attribute analysis has additional computational and time cost.

setReturnFaceAttributesCol(value)[source]
Parameters

returnFaceAttributes – Analyze and return one or more specified face attributes. Supported face attributes include: age, gender, headPose, smile, facialHair, glasses, emotion, hair, makeup, occlusion, accessories, blur, exposure and noise. Face attribute analysis has additional computational and time cost.

setReturnFaceId(value)[source]
Parameters

returnFaceId – Whether to return faceIds of the detected faces. The default value is true.

setReturnFaceIdCol(value)[source]
Parameters

returnFaceId – Whether to return faceIds of the detected faces. The default value is true.

setReturnFaceLandmarks(value)[source]
Parameters

returnFaceLandmarks – Whether to return face landmarks of the detected faces. The default value is false.

setReturnFaceLandmarksCol(value)[source]
Parameters

returnFaceLandmarks – Whether to return face landmarks of the detected faces. The default value is false.

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – the API key to use

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='the API key to use')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.DetectLastAnomaly module

class synapse.ml.cognitive.DetectLastAnomaly.DetectLastAnomaly(java_obj=None, concurrency=1, concurrentTimeout=None, customInterval=None, customIntervalCol=None, errorCol='DetectLastAnomaly_542b6c111057_error', granularity=None, granularityCol=None, handler=None, maxAnomalyRatio=None, maxAnomalyRatioCol=None, outputCol='DetectLastAnomaly_542b6c111057_output', period=None, periodCol=None, sensitivity=None, sensitivityCol=None, series=None, seriesCol=None, subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • customInterval (object) – Sets a non-standard time interval. For example, if the series has a 5-minute interval, the request can be set as granularity=minutely, customInterval=5.

  • errorCol (object) – column to hold http errors

  • granularity (object) – Can only be one of yearly, monthly, weekly, daily, hourly or minutely. Granularity is used to verify whether the input series is valid.

  • handler (object) – Which strategy to use when handling requests

  • maxAnomalyRatio (object) – Optional argument, advanced model parameter, max anomaly ratio in a time series.

  • outputCol (object) – The name of the output column

  • period (object) – Optional argument, periodic value of a time series. If the value is null or not present, the API will determine the period automatically.

  • sensitivity (object) – Optional argument, advanced model parameter, between 0 and 99. The lower the value, the larger the margin value will be, which means fewer anomalies will be accepted.

  • series (object) – Time series data points. Points should be sorted by timestamp in ascending order to match the anomaly detection result. If the data is not sorted correctly or contains a duplicated timestamp, the API will not work and an error message will be returned.

  • subscriptionKey (object) – the API key to use

  • timeout (float) – number of seconds to wait before closing the connection

  • url (object) – Url of the service
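
Because the service rejects series that are unsorted or contain duplicated timestamps, it can help to check the data before sending it. The sketch below is plain Python and not part of the library: the helper check_series is hypothetical, and the point shape (a dict with "timestamp" and "value" keys holding an ISO 8601 string and a number) is an assumption.

```python
from datetime import datetime


def check_series(points):
    """Return a list of ordering problems found in a list of data points."""
    problems = []
    # Assumption: each point is {"timestamp": <ISO 8601 string>, "value": <number>}.
    stamps = [datetime.fromisoformat(p["timestamp"]) for p in points]
    for prev, cur in zip(stamps, stamps[1:]):
        if cur == prev:
            problems.append(f"duplicate timestamp {cur.isoformat()}")
        elif cur < prev:
            problems.append(f"{cur.isoformat()} is out of order")
    return problems


good = [
    {"timestamp": "2021-01-01T00:00:00", "value": 1.0},
    {"timestamp": "2021-01-01T00:05:00", "value": 2.0},
]
bad = [good[1], good[0], good[0]]

print(check_series(good))  # []
print(check_series(bad))
```

An empty list means the series is in strictly ascending timestamp order; anything else should be fixed (for example by sorting and de-duplicating) before calling the transformer.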

concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
customInterval = Param(parent='undefined', name='customInterval', doc=' Custom Interval is used to set non-standard time interval, for example, if the series is 5 minutes,  request can be set as granularity=minutely, customInterval=5.     ')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getCustomInterval()[source]
Returns

Sets a non-standard time interval. For example, if the series has a 5-minute interval, the request can be set as granularity=minutely, customInterval=5.

Return type

customInterval
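
To illustrate how granularity and customInterval combine, the sketch below derives the expected spacing between points and checks a series against it. It is plain Python with hypothetical helper names, and the check is only an illustration: the validation the service actually performs is not described here and may differ.

```python
from datetime import datetime, timedelta

# Fixed-length granularities only; monthly and yearly steps vary in length
# and are left out of this sketch.
BASE_STEP = {
    "minutely": timedelta(minutes=1),
    "hourly": timedelta(hours=1),
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
}


def expected_step(granularity, custom_interval=1):
    """Spacing between consecutive points implied by the two parameters."""
    return BASE_STEP[granularity] * custom_interval


def matches_interval(timestamps, granularity, custom_interval=1):
    """True if every consecutive pair of timestamps is exactly one step apart."""
    step = expected_step(granularity, custom_interval)
    parsed = [datetime.fromisoformat(t) for t in timestamps]
    return all(b - a == step for a, b in zip(parsed, parsed[1:]))


five_minute_series = [
    "2021-01-01T00:00:00",
    "2021-01-01T00:05:00",
    "2021-01-01T00:10:00",
]
print(matches_interval(five_minute_series, "minutely", custom_interval=5))  # True
print(matches_interval(five_minute_series, "minutely"))  # False
```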

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getGranularity()[source]
Returns

Can only be one of yearly, monthly, weekly, daily, hourly or minutely. Granularity is used to verify whether the input series is valid.

Return type

granularity

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

static getJavaPackage()[source]

Returns package name String.

getMaxAnomalyRatio()[source]
Returns

Optional argument, advanced model parameter, max anomaly ratio in a time series.

Return type

maxAnomalyRatio

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getPeriod()[source]
Returns

Optional argument, periodic value of a time series. If the value is null or not present, the API will determine the period automatically.

Return type

period

getSensitivity()[source]
Returns

Optional argument, advanced model parameter, between 0 and 99. The lower the value, the larger the margin value will be, which means fewer anomalies will be accepted.

Return type

sensitivity

getSeries()[source]
Returns

Time series data points. Points should be sorted by timestamp in ascending order to match the anomaly detection result. If the data is not sorted correctly or contains a duplicated timestamp, the API will not work and an error message will be returned.

Return type

series

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

granularity = Param(parent='undefined', name='granularity', doc=' Can only be one of yearly, monthly, weekly, daily, hourly or minutely. Granularity is used for verify whether input series is valid.     ')
handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
maxAnomalyRatio = Param(parent='undefined', name='maxAnomalyRatio', doc=' Optional argument, advanced model parameter, max anomaly ratio in a time series.     ')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
period = Param(parent='undefined', name='period', doc=' Optional argument, periodic value of a time series. If the value is null or does not present, the API will determine the period automatically.     ')
classmethod read()[source]

Returns an MLReader instance for this class.

sensitivity = Param(parent='undefined', name='sensitivity', doc=' Optional argument, advanced model parameter, between 0-99, the lower the value is, the larger the margin value will be which means less anomalies will be accepted     ')
series = Param(parent='undefined', name='series', doc=' Time series data points. Points should be sorted by timestamp in ascending order to match the anomaly detection result. If the data is not sorted correctly or there is duplicated timestamp, the API will not work. In such case, an error message will be returned.     ')
setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setCustomInterval(value)[source]
Parameters

customInterval – Sets a non-standard time interval. For example, if the series has a 5-minute interval, the request can be set as granularity=minutely, customInterval=5.

setCustomIntervalCol(value)[source]
Parameters

customInterval – Sets a non-standard time interval. For example, if the series has a 5-minute interval, the request can be set as granularity=minutely, customInterval=5.

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setGranularity(value)[source]
Parameters

granularity – Can only be one of yearly, monthly, weekly, daily, hourly or minutely. Granularity is used to verify whether the input series is valid.

setGranularityCol(value)[source]
Parameters

granularity – Can only be one of yearly, monthly, weekly, daily, hourly or minutely. Granularity is used to verify whether the input series is valid.

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setLinkedService(value)[source]
setLocation(value)[source]
setMaxAnomalyRatio(value)[source]
Parameters

maxAnomalyRatio – Optional argument, advanced model parameter, max anomaly ratio in a time series.

setMaxAnomalyRatioCol(value)[source]
Parameters

maxAnomalyRatio – Optional argument, advanced model parameter, max anomaly ratio in a time series.

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(concurrency=1, concurrentTimeout=None, customInterval=None, customIntervalCol=None, errorCol='DetectLastAnomaly_542b6c111057_error', granularity=None, granularityCol=None, handler=None, maxAnomalyRatio=None, maxAnomalyRatioCol=None, outputCol='DetectLastAnomaly_542b6c111057_output', period=None, periodCol=None, sensitivity=None, sensitivityCol=None, series=None, seriesCol=None, subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setPeriod(value)[source]
Parameters

period – Optional argument, periodic value of a time series. If the value is null or not present, the API will determine the period automatically.

setPeriodCol(value)[source]
Parameters

period – Optional argument, periodic value of a time series. If the value is null or not present, the API will determine the period automatically.

setSensitivity(value)[source]
Parameters

sensitivity – Optional argument, advanced model parameter, between 0 and 99. The lower the value, the larger the margin value will be, which means fewer anomalies will be accepted.

setSensitivityCol(value)[source]
Parameters

sensitivity – Optional argument, advanced model parameter, between 0 and 99. The lower the value, the larger the margin value will be, which means fewer anomalies will be accepted.

setSeries(value)[source]
Parameters

series – Time series data points. Points should be sorted by timestamp in ascending order to match the anomaly detection result. If the data is not sorted correctly or contains a duplicated timestamp, the API will not work and an error message will be returned.

setSeriesCol(value)[source]
Parameters

series – Time series data points. Points should be sorted by timestamp in ascending order to match the anomaly detection result. If the data is not sorted correctly or contains a duplicated timestamp, the API will not work and an error message will be returned.

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – the API key to use

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='the API key to use')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.DetectMultivariateAnomaly module

class synapse.ml.cognitive.DetectMultivariateAnomaly.DetectMultivariateAnomaly(java_obj=None, backoffs=[100, 500, 1000], concurrency=1, concurrentTimeout=None, connectionString=None, containerName=None, endTime=None, endpoint=None, errorCol='DetectMultivariateAnomaly_1ea38f78daa4_error', initialPollingDelay=300, inputCols=None, intermediateSaveDir=None, maxPollingRetries=1000, modelId=None, outputCol='DetectMultivariateAnomaly_1ea38f78daa4_output', pollingDelay=300, sasToken=None, startTime=None, storageKey=None, storageName=None, subscriptionKey=None, subscriptionKeyCol=None, suppressMaxRetriesExceededException=False, timeout=60.0, timestampCol='timestamp', url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaModel

Parameters
  • backoffs (list) – array of backoffs to use in the handler

  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • connectionString (object) – Connection String for your storage account used for uploading files.

  • containerName (object) – Container that will be used to upload files to.

  • endTime (object) – Required. End time of the data to be used for detection or for generating the multivariate anomaly detection model; should be a date-time.

  • endpoint (object) – Endpoint for your storage account used for uploading files.

  • errorCol (object) – column to hold http errors

  • initialPollingDelay (int) – number of milliseconds to wait before first poll for result

  • inputCols (list) – The names of the input columns

  • intermediateSaveDir (object) – Name of the directory in which to save the intermediate data produced while training.

  • maxPollingRetries (int) – number of times to poll

  • modelId (object) – Format - uuid. Model identifier.

  • outputCol (object) – The name of the output column

  • pollingDelay (int) – number of milliseconds to wait between polling

  • sasToken (object) – SAS Token for your storage account used for uploading files.

  • startTime (object) – Required. Start time of the data to be used for detection or for generating the multivariate anomaly detection model; should be a date-time.

  • storageKey (object) – Storage Key for your storage account used for uploading files.

  • storageName (object) – Storage Name for your storage account used for uploading files.

  • subscriptionKey (object) – the API key to use

  • suppressMaxRetriesExceededException (bool) – set true to suppress the maximum retries exception and report in the error column

  • timeout (float) – number of seconds to wait before closing the connection

  • timestampCol (object) – Timestamp column name

  • url (object) – Url of the service
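
The three polling parameters together bound how long the transformer waits for a result. The following is a rough estimate in plain Python, assuming one initial delay followed by one pollingDelay before each retry; the exact polling loop is not described here, so treat the figure as an approximation. The function name is hypothetical.

```python
def max_polling_wait_ms(initial_polling_delay=300, polling_delay=300,
                        max_polling_retries=1000):
    """Upper bound on time spent waiting between polls, in milliseconds.

    Assumes one initial delay, then `polling_delay` before each retry.
    Request latency is not included.
    """
    return initial_polling_delay + polling_delay * max_polling_retries


# With the default values from the signature above:
print(max_polling_wait_ms())  # 300300 ms, roughly five minutes
```

If jobs are expected to run longer than this, raising maxPollingRetries or pollingDelay extends the budget; suppressMaxRetriesExceededException controls whether exhausting it raises or is reported in the error column.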

backoffs = Param(parent='undefined', name='backoffs', doc='array of backoffs to use in the handler')
cleanUpIntermediateData()[source]
concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
connectionString = Param(parent='undefined', name='connectionString', doc='Connection String for your storage account used for uploading files.')
containerName = Param(parent='undefined', name='containerName', doc='Container that will be used to upload files to.')
endTime = Param(parent='undefined', name='endTime', doc='A required field, end time of data to be used for detection/generating multivariate anomaly detection model, should be date-time.')
endpoint = Param(parent='undefined', name='endpoint', doc='End Point for your storage account used for uploading files.')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getBackoffs()[source]
Returns

array of backoffs to use in the handler

Return type

backoffs
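
As an illustration of what an array of backoffs means for a request handler, the sketch below retries a call after each delay in turn. It is not the library's handler: the function name is hypothetical, the delays are assumed to be milliseconds (the unit is not stated above), and a real handler would retry only on retryable errors.

```python
import time


def call_with_backoffs(request, backoffs_ms=(100, 500, 1000), sleep=time.sleep):
    """Call `request` once, then retry after each backoff until it succeeds."""
    last_error = None
    # The leading 0 stands for the first attempt, which is made immediately.
    for delay_ms in (0, *backoffs_ms):
        if delay_ms:
            sleep(delay_ms / 1000.0)
        try:
            return request()
        except Exception as err:  # sketch only: retries on any exception
            last_error = err
    raise last_error


attempts = []


def flaky():
    """Fail twice, then succeed."""
    attempts.append(1)
    if len(attempts) < 3:
        raise RuntimeError("transient failure")
    return "ok"


waits = []  # record the delays instead of actually sleeping
result = call_with_backoffs(flaky, sleep=waits.append)
print(result, waits)  # ok [0.1, 0.5]
```

With the default [100, 500, 1000], this scheme makes at most four attempts before giving up.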

getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getConnectionString()[source]
Returns

Connection String for your storage account used for uploading files.

Return type

connectionString

getContainerName()[source]
Returns

Container that will be used to upload files to.

Return type

containerName

getEndTime()[source]
Returns

Required. End time of the data to be used for detection or for generating the multivariate anomaly detection model; should be a date-time.

Return type

endTime

getEndpoint()[source]
Returns

Endpoint for your storage account used for uploading files.

Return type

endpoint

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getInitialPollingDelay()[source]
Returns

number of milliseconds to wait before first poll for result

Return type

initialPollingDelay

getInputCols()[source]
Returns

The names of the input columns

Return type

inputCols

getIntermediateSaveDir()[source]
Returns

Name of the directory in which to save the intermediate data produced while training.

Return type

intermediateSaveDir

static getJavaPackage()[source]

Returns package name String.

getMaxPollingRetries()[source]
Returns

number of times to poll

Return type

maxPollingRetries

getModelId()[source]
Returns

Format - uuid. Model identifier.

Return type

modelId

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getPollingDelay()[source]
Returns

number of milliseconds to wait between polling

Return type

pollingDelay

getSasToken()[source]
Returns

SAS Token for your storage account used for uploading files.

Return type

sasToken

getStartTime()[source]
Returns

Required. Start time of the data to be used for detection or for generating the multivariate anomaly detection model; should be a date-time.

Return type

startTime
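
Since startTime and endTime are both required date-time values, a quick check before configuring the model can catch malformed or reversed ranges. This is a sketch in plain Python with a hypothetical helper; it assumes ISO 8601 strings and that the start must precede the end, neither of which is stated explicitly above.

```python
from datetime import datetime


def check_time_range(start_time, end_time):
    """Parse both bounds and make sure the range is not empty or reversed."""
    # Assumption: ISO 8601 strings. A trailing "Z" is rewritten so that
    # datetime.fromisoformat also accepts it on Python versions before 3.11.
    start = datetime.fromisoformat(start_time.replace("Z", "+00:00"))
    end = datetime.fromisoformat(end_time.replace("Z", "+00:00"))
    if start >= end:
        raise ValueError("startTime must be earlier than endTime")
    return start, end


start, end = check_time_range("2021-01-01T00:00:00Z", "2021-01-02T00:00:00Z")
print(end - start)  # 1 day, 0:00:00
```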

getStorageKey()[source]
Returns

Storage Key for your storage account used for uploading files.

Return type

storageKey

getStorageName()[source]
Returns

Storage Name for your storage account used for uploading files.

Return type

storageName

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getSuppressMaxRetriesExceededException()[source]
Returns

set true to suppress the maximum retries exception and report in the error column

Return type

suppressMaxRetriesExceededException

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getTimestampCol()[source]
Returns

Timestamp column name

Return type

timestampCol

getUrl()[source]
Returns

Url of the service

Return type

url

initialPollingDelay = Param(parent='undefined', name='initialPollingDelay', doc='number of milliseconds to wait before first poll for result')
inputCols = Param(parent='undefined', name='inputCols', doc='The names of the input columns')
intermediateSaveDir = Param(parent='undefined', name='intermediateSaveDir', doc='Directory name of which you want to save the intermediate data produced while training.')
maxPollingRetries = Param(parent='undefined', name='maxPollingRetries', doc='number of times to poll')
modelId = Param(parent='undefined', name='modelId', doc='Format - uuid. Model identifier.')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
pollingDelay = Param(parent='undefined', name='pollingDelay', doc='number of milliseconds to wait between polling')
classmethod read()[source]

Returns an MLReader instance for this class.

sasToken = Param(parent='undefined', name='sasToken', doc='SAS Token for your storage account used for uploading files.')
setBackoffs(value)[source]
Parameters

backoffs – array of backoffs to use in the handler

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setConnectionString(value)[source]
Parameters

connectionString – Connection String for your storage account used for uploading files.

setContainerName(value)[source]
Parameters

containerName – Container that will be used to upload files to.

setEndTime(value)[source]
Parameters

endTime – Required. End time of the data to be used for detection or for generating the multivariate anomaly detection model; should be a date-time.

setEndpoint(value)[source]
Parameters

endpoint – Endpoint for your storage account used for uploading files.

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setInitialPollingDelay(value)[source]
Parameters

initialPollingDelay – number of milliseconds to wait before first poll for result

setInputCols(value)[source]
Parameters

inputCols – The names of the input columns

setIntermediateSaveDir(value)[source]
Parameters

intermediateSaveDir – Name of the directory in which to save the intermediate data produced while training.

setLocation(value)[source]
setMaxPollingRetries(value)[source]
Parameters

maxPollingRetries – number of times to poll

setModelId(value)[source]
Parameters

modelId – Format - uuid. Model identifier.

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(backoffs=[100, 500, 1000], concurrency=1, concurrentTimeout=None, connectionString=None, containerName=None, endTime=None, endpoint=None, errorCol='DetectMultivariateAnomaly_1ea38f78daa4_error', initialPollingDelay=300, inputCols=None, intermediateSaveDir=None, maxPollingRetries=1000, modelId=None, outputCol='DetectMultivariateAnomaly_1ea38f78daa4_output', pollingDelay=300, sasToken=None, startTime=None, storageKey=None, storageName=None, subscriptionKey=None, subscriptionKeyCol=None, suppressMaxRetriesExceededException=False, timeout=60.0, timestampCol='timestamp', url=None)[source]

Set the (keyword only) parameters

setPollingDelay(value)[source]
Parameters

pollingDelay – number of milliseconds to wait between polling

setSasToken(value)[source]
Parameters

sasToken – SAS Token for your storage account used for uploading files.

setStartTime(value)[source]
Parameters

startTime – Required. Start time of the data to be used for detection or for generating the multivariate anomaly detection model; should be a date-time.

setStorageKey(value)[source]
Parameters

storageKey – Storage Key for your storage account used for uploading files.

setStorageName(value)[source]
Parameters

storageName – Storage Name for your storage account used for uploading files.

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – the API key to use

setSuppressMaxRetriesExceededException(value)[source]
Parameters

suppressMaxRetriesExceededException – set true to suppress the maximum retries exception and report in the error column

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setTimestampCol(value)[source]
Parameters

timestampCol – Timestamp column name

setUrl(value)[source]
Parameters

url – Url of the service

startTime = Param(parent='undefined', name='startTime', doc='A required field, start time of data to be used for detection/generating multivariate anomaly detection model, should be date-time.')
storageKey = Param(parent='undefined', name='storageKey', doc='Storage Key for your storage account used for uploading files.')
storageName = Param(parent='undefined', name='storageName', doc='Storage Name for your storage account used for uploading files.')
subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='the API key to use')
suppressMaxRetriesExceededException = Param(parent='undefined', name='suppressMaxRetriesExceededException', doc='set true to suppress the maxumimum retries exception and report in the error column')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
timestampCol = Param(parent='undefined', name='timestampCol', doc='Timestamp column name')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.DictionaryExamples module

class synapse.ml.cognitive.DictionaryExamples.DictionaryExamples(java_obj=None, concurrency=1, concurrentTimeout=None, errorCol='DictionaryExamples_968a795ecfa3_error', fromLanguage=None, fromLanguageCol=None, handler=None, outputCol='DictionaryExamples_968a795ecfa3_output', subscriptionKey=None, subscriptionKeyCol=None, subscriptionRegion=None, subscriptionRegionCol=None, textAndTranslation=None, textAndTranslationCol=None, timeout=60.0, toLanguage=None, toLanguageCol=None, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (object) – column to hold http errors

  • fromLanguage (object) – Specifies the language of the input text. The source language must be one of the supported languages included in the dictionary scope.

  • handler (object) – Which strategy to use when handling requests

  • outputCol (object) – The name of the output column

  • subscriptionKey (object) – the API key to use

  • subscriptionRegion (object) – the API region to use

  • textAndTranslation (object) – A string specifying the translated text previously returned by the Dictionary lookup operation.

  • timeout (float) – number of seconds to wait before closing the connection

  • toLanguage (object) – Specifies the language of the output text. The target language must be one of the supported languages included in the dictionary scope.

  • url (object) – Url of the service
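
The constructor signature pairs several parameters with a Col variant (for example fromLanguage and fromLanguageCol). The sketch below assumes each pair is meant to be used one at a time, either as a constant value or as the name of a DataFrame column, and builds the keyword arguments accordingly. The helper service_param and the column name are hypothetical.

```python
def service_param(name, value=None, column=None):
    """Return the single keyword argument for a parameter set by value or by column."""
    # Assumption: exactly one of the two forms should be supplied.
    if (value is None) == (column is None):
        raise ValueError(f"set exactly one of {name} or {name}Col")
    return {name: value} if column is None else {f"{name}Col": column}


kwargs = {
    **service_param("fromLanguage", value="en"),
    **service_param("toLanguage", value="es"),
    # Per-row input read from a (hypothetical) DataFrame column.
    **service_param("textAndTranslation", column="textAndTranslation"),
}
print(kwargs)

# Hypothetical usage in a Spark session with SynapseML installed:
#   from synapse.ml.cognitive.DictionaryExamples import DictionaryExamples
#   examples = DictionaryExamples(**kwargs)
```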

concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
fromLanguage = Param(parent='undefined', name='fromLanguage', doc='Specifies the language of the input text. The source language must be one of the supported languages included in the dictionary scope.')
getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getFromLanguage()[source]
Returns

Specifies the language of the input text. The source language must be one of the supported languages included in the dictionary scope.

Return type

fromLanguage

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

static getJavaPackage()[source]

Returns package name String.

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getSubscriptionKey()[source]