synapse.ml.cognitive package

Submodules

synapse.ml.cognitive.AddDocuments module

class synapse.ml.cognitive.AddDocuments.AddDocuments(java_obj=None, actionCol='@search.action', batchSize=100, concurrency=1, concurrentTimeout=None, errorCol='AddDocuments_23d171c8e20c_error', handler=None, indexName=None, outputCol='AddDocuments_23d171c8e20c_output', serviceName=None, subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • actionCol (str) – Name of the column that holds each document’s action (default ‘@search.action’). Actions such as an upload and a delete can be combined in the same batch:
      upload: Similar to an ‘upsert’: the document is inserted if it is new and updated/replaced if it exists. All fields are replaced in the update case.
      merge: Updates an existing document with the specified fields. If the document doesn’t exist, the merge fails. Any field you specify in a merge replaces the existing field in the document, including fields of type Collection(Edm.String). For example, if the document contains a field ‘tags’ with value [‘budget’] and you execute a merge with value [‘economy’, ‘pool’] for ‘tags’, the final value of ‘tags’ is [‘economy’, ‘pool’], not [‘budget’, ‘economy’, ‘pool’].
      mergeOrUpload: Behaves like merge if a document with the given key already exists in the index; otherwise it behaves like upload with a new document.
      delete: Removes the specified document from the index. Any field you specify in a delete operation, other than the key field, is ignored. To remove an individual field from a document, use merge instead and set the field explicitly to null.

  • batchSize (int) – The maximum size of the buffer

  • concurrency (int) – Maximum number of concurrent calls

  • concurrentTimeout (float) – Maximum number of seconds to wait on futures if concurrency >= 1

  • errorCol (str) – Column to hold HTTP errors

  • handler (object) – Which strategy to use when handling requests

  • indexName (str) – Name of the search index to write documents to

  • outputCol (str) – The name of the output column

  • serviceName (str) – Name of the Azure Cognitive Search service that hosts the index

  • subscriptionKey (object) – The API key to use

  • timeout (float) – Number of seconds to wait before closing the connection

  • url (str) – URL of the service
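
To make the four actions concrete, here is a small in-memory simulation of the semantics described above. It is a sketch for illustration only: the dictionary-based index, the `apply_action` helper, and the `id` key field are hypothetical and are not part of SynapseML or the Azure Cognitive Search client; the real service applies these actions server-side.

```python
from typing import Any, Dict

# Hypothetical in-memory "index": maps a document key to its fields.
Index = Dict[str, Dict[str, Any]]


def apply_action(index: Index, doc: Dict[str, Any], key_field: str = "id") -> None:
    """Apply one document's '@search.action' to the index (illustration only)."""
    action = doc["@search.action"]
    key = doc[key_field]
    # Everything except the action marker is the document payload.
    fields = {k: v for k, v in doc.items() if k != "@search.action"}

    if action == "upload":
        # Upsert: insert if new, otherwise replace every field.
        index[key] = fields
    elif action == "merge":
        if key not in index:
            raise KeyError(f"merge failed: document {key!r} does not exist")
        # Each specified field replaces the stored value wholesale,
        # including collections; None stands in for an explicit null.
        index[key].update(fields)
    elif action == "mergeOrUpload":
        if key in index:
            index[key].update(fields)
        else:
            index[key] = fields
    elif action == "delete":
        # Only the key matters; other fields are ignored. This sketch
        # treats deleting a missing key as a no-op (an assumption).
        index.pop(key, None)
    else:
        raise ValueError(f"unknown action: {action!r}")


index: Index = {}
apply_action(index, {"@search.action": "upload", "id": "1", "tags": ["budget"]})
apply_action(index, {"@search.action": "merge", "id": "1", "tags": ["economy", "pool"]})
print(index["1"]["tags"])  # ['economy', 'pool']
```

The merge step reproduces the ‘tags’ example: the collection is replaced, not appended to.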

actionCol = Param(parent='undefined', name='actionCol', doc=" You can combine actions, such as an upload and a delete, in the same batch.  upload: An upload action is similar to an 'upsert' where the document will be inserted if it is new and updated/replaced if it exists. Note that all fields are replaced in the update case.  merge: Merge updates an existing document with the specified fields. If the document doesn't exist, the merge will fail. Any field you specify in a merge will replace the existing field in the document. This includes fields of type Collection(Edm.String). For example, if the document contains a field 'tags' with value ['budget'] and you execute a merge with value ['economy', 'pool'] for 'tags', the final value of the 'tags' field will be ['economy', 'pool'].  It will not be ['budget', 'economy', 'pool'].  mergeOrUpload: This action behaves like merge if a document  with the given key already exists in the index.  If the document does not exist, it behaves like upload with a new document.  delete: Delete removes the specified document from the index.  Note that any field you specify in a delete operation,  other than the key field, will be ignored. If you want to   remove an individual field from a document, use merge   instead and simply set the field explicitly to null.     ")
batchSize = Param(parent='undefined', name='batchSize', doc='The max size of the buffer')
concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getActionCol()
Returns

You can combine actions, such as an upload and a delete, in the same batch. upload: An upload action is similar to an ‘upsert’ where the document will be inserted if it is new and updated/replaced if it exists. Note that all fields are replaced in the update case. merge: Merge updates an existing document with the specified fields. If the document doesn’t exist, the merge will fail. Any field you specify in a merge will replace the existing field in the document. This includes fields of type Collection(Edm.String). For example, if the document contains a field ‘tags’ with value [‘budget’] and you execute a merge with value [‘economy’, ‘pool’] for ‘tags’, the final value of the ‘tags’ field will be [‘economy’, ‘pool’]. It will not be [‘budget’, ‘economy’, ‘pool’]. mergeOrUpload: This action behaves like merge if a document with the given key already exists in the index. If the document does not exist, it behaves like upload with a new document. delete: Delete removes the specified document from the index. Note that any field you specify in a delete operation, other than the key field, will be ignored. If you want to remove an individual field from a document, use merge instead and simply set the field explicitly to null.

Return type

str

getBatchSize()
Returns

The maximum size of the buffer

Return type

int

getConcurrency()
Returns

Maximum number of concurrent calls

Return type

int

getConcurrentTimeout()
Returns

Maximum number of seconds to wait on futures if concurrency >= 1

Return type

float

getErrorCol()
Returns

Column to hold HTTP errors

Return type

str

getHandler()
Returns

Which strategy to use when handling requests

Return type

object

getIndexName()
Returns

Name of the search index to write documents to

Return type

str

static getJavaPackage()

Returns the Java package name as a string.

getOutputCol()
Returns

The name of the output column

Return type

str

getServiceName()
Returns

Name of the Azure Cognitive Search service that hosts the index

Return type

str

getSubscriptionKey()
Returns

The API key to use

Return type

object

getTimeout()
Returns

Number of seconds to wait before closing the connection

Return type

float

getUrl()
Returns

URL of the service

Return type

str

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
indexName = Param(parent='undefined', name='indexName', doc='')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()

Returns an MLReader instance for this class.

serviceName = Param(parent='undefined', name='serviceName', doc='')
setActionCol(value)
Parameters

actionCol – You can combine actions, such as an upload and a delete, in the same batch. upload: An upload action is similar to an ‘upsert’ where the document will be inserted if it is new and updated/replaced if it exists. Note that all fields are replaced in the update case. merge: Merge updates an existing document with the specified fields. If the document doesn’t exist, the merge will fail. Any field you specify in a merge will replace the existing field in the document. This includes fields of type Collection(Edm.String). For example, if the document contains a field ‘tags’ with value [‘budget’] and you execute a merge with value [‘economy’, ‘pool’] for ‘tags’, the final value of the ‘tags’ field will be [‘economy’, ‘pool’]. It will not be [‘budget’, ‘economy’, ‘pool’]. mergeOrUpload: This action behaves like merge if a document with the given key already exists in the index. If the document does not exist, it behaves like upload with a new document. delete: Delete removes the specified document from the index. Note that any field you specify in a delete operation, other than the key field, will be ignored. If you want to remove an individual field from a document, use merge instead and simply set the field explicitly to null.

setBatchSize(value)
Parameters

batchSize – The maximum size of the buffer

setConcurrency(value)
Parameters

concurrency – Maximum number of concurrent calls

setConcurrentTimeout(value)
Parameters

concurrentTimeout – Maximum number of seconds to wait on futures if concurrency >= 1
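
concurrency and concurrentTimeout together bound how many requests are in flight and how long the caller waits for them. The sketch below mimics that idea with Python’s standard concurrent.futures module; `call_all` and `fake_send` are hypothetical stand-ins for the HTTP client, and this is not how SynapseML implements its client (which runs on the JVM).

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Callable, List, Optional, Sequence


def call_all(
    send_request: Callable[[str], str],
    payloads: Sequence[str],
    concurrency: int = 1,
    concurrent_timeout: Optional[float] = None,
) -> List[Optional[str]]:
    """Run send_request over payloads with at most `concurrency` calls in flight.

    Calls that have not finished within `concurrent_timeout` seconds are
    reported as None; a timeout of None waits for every call.
    """
    pool = ThreadPoolExecutor(max_workers=concurrency)
    try:
        futures = [pool.submit(send_request, p) for p in payloads]
        done, _ = wait(futures, timeout=concurrent_timeout)
        # Exceptions raised inside a finished call propagate from result().
        return [f.result() if f in done else None for f in futures]
    finally:
        # Return without waiting for stragglers; cancel calls not yet started
        # (cancel_futures requires Python 3.9+).
        pool.shutdown(wait=False, cancel_futures=True)


def fake_send(payload: str) -> str:
    """Hypothetical request: a "slow" payload outlasts a short timeout."""
    if payload == "slow":
        time.sleep(0.5)
    return payload.upper()


print(call_all(fake_send, ["a", "b"], concurrency=2, concurrent_timeout=5.0))  # ['A', 'B']
```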

setErrorCol(value)
Parameters

errorCol – Column to hold HTTP errors

setHandler(value)
Parameters

handler – Which strategy to use when handling requests

setIndexName(value)
Parameters

indexName – Name of the search index to write documents to

setOutputCol(value)
Parameters

outputCol – The name of the output column

setParams(actionCol='@search.action', batchSize=100, concurrency=1, concurrentTimeout=None, errorCol='AddDocuments_23d171c8e20c_error', handler=None, indexName=None, outputCol='AddDocuments_23d171c8e20c_output', serviceName=None, subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)

Sets the keyword-only parameters.

setServiceName(value)
Parameters

serviceName – Name of the Azure Cognitive Search service that hosts the index

setSubscriptionKey(value)
Parameters

subscriptionKey – The API key to use

setSubscriptionKeyCol(value)
Parameters

subscriptionKeyCol – Name of the column holding the API key to use

setTimeout(value)
Parameters

timeout – Number of seconds to wait before closing the connection

setUrl(value)
Parameters

url – URL of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.AnalyzeBusinessCards module

class synapse.ml.cognitive.AnalyzeBusinessCards.AnalyzeBusinessCards(java_obj=None, backoffs=[100, 500, 1000], concurrency=1, concurrentTimeout=None, errorCol='AnalyzeBusinessCards_78867d030dc8_error', imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, includeTextDetails=None, includeTextDetailsCol=None, initialPollingDelay=300, locale=None, localeCol=None, maxPollingRetries=1000, modelVersion=None, modelVersionCol=None, outputCol='AnalyzeBusinessCards_78867d030dc8_output', pages=None, pagesCol=None, pollingDelay=300, subscriptionKey=None, subscriptionKeyCol=None, suppressMaxRetriesExceededException=False, timeout=60.0, url=None)

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • backoffs (list) – Array of backoffs to use in the handler

  • concurrency (int) – Maximum number of concurrent calls

  • concurrentTimeout (float) – Maximum number of seconds to wait on futures if concurrency >= 1

  • errorCol (str) – Column to hold HTTP errors

  • imageBytes (object) – Bytestream of the image to use

  • imageUrl (object) – The URL of the image to use

  • includeTextDetails (object) – Whether to include text lines and element references in the result

  • initialPollingDelay (int) – Number of milliseconds to wait before the first poll for the result

  • locale (object) – Locale of the business card. Supported locales: en-AU, en-CA, en-GB, en-IN, en-US.

  • maxPollingRetries (int) – Number of times to poll

  • modelVersion (object) – Which model version to use for scoring. If no model version is specified, the API should default to the latest non-preview version.

  • outputCol (str) – The name of the output column

  • pages (object) – Page selection, used only for multi-page PDF and TIFF documents. Accepted inputs include single pages (e.g. ‘1, 2’ -> pages 1 and 2 are processed), finite ranges (e.g. ‘2-5’ -> pages 2 to 5 are processed) and open-ended ranges (e.g. ‘5-’ -> all pages from page 5 onward are processed; ‘-10’ -> pages 1 to 10 are processed). These forms can be mixed, and ranges may overlap (e.g. ‘-5, 1, 3, 5-10’ -> pages 1 to 10 are processed). The service accepts the request if it can process at least one page of the document (e.g. ‘5-100’ on a 5-page document is valid, and page 5 is processed). If no page range is provided, the entire document is processed.

  • pollingDelay (int) – Number of milliseconds to wait between polls

  • subscriptionKey (object) – The API key to use

  • suppressMaxRetriesExceededException (bool) – Set to true to suppress the maximum retries exception and report it in the error column

  • timeout (float) – Number of seconds to wait before closing the connection

  • url (str) – URL of the service
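
The page-selection rules for pages can be checked with a small parser. This is a hypothetical helper (`select_pages` is not part of SynapseML; the service does the real parsing), written to follow the rules listed above: single pages, finite and open-ended ranges, mixed and overlapping items, clamping to the document length, and rejection when no page can be processed.

```python
from typing import List, Optional


def select_pages(pages: Optional[str], page_count: int) -> List[int]:
    """Return the 1-based pages selected by a `pages` string (illustration only)."""
    if pages is None or not pages.strip():
        # No page range: the whole document is processed.
        return list(range(1, page_count + 1))

    selected = set()
    for item in pages.split(","):
        item = item.strip()
        if not item:
            continue
        if "-" in item:
            start_text, end_text = item.split("-", 1)
            # '-10' starts at page 1; '5-' runs to the last page.
            start = int(start_text) if start_text.strip() else 1
            end = int(end_text) if end_text.strip() else page_count
        else:
            start = end = int(item)
        # Clamp to the pages that actually exist; overlaps merge in the set.
        selected.update(range(max(start, 1), min(end, page_count) + 1))

    if not selected:
        # At least one processable page is required.
        raise ValueError(f"no page in {pages!r} exists in a {page_count}-page document")
    return sorted(selected)


print(select_pages("-5, 1, 3, 5-10", 10))  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(select_pages("5-100", 5))  # [5]
```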

backoffs = Param(parent='undefined', name='backoffs', doc='array of backoffs to use in the handler')
concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getBackoffs()
Returns

Array of backoffs to use in the handler

Return type

list

getConcurrency()
Returns

Maximum number of concurrent calls

Return type

int

getConcurrentTimeout()
Returns

Maximum number of seconds to wait on futures if concurrency >= 1

Return type

float

getErrorCol()
Returns

Column to hold HTTP errors

Return type

str

getImageBytes()
Returns

Bytestream of the image to use

Return type

object

getImageUrl()
Returns

The URL of the image to use

Return type

object

getIncludeTextDetails()
Returns

Whether to include text lines and element references in the result

Return type

object

getInitialPollingDelay()
Returns

Number of milliseconds to wait before the first poll for the result

Return type

int

static getJavaPackage()

Returns the Java package name as a string.

getLocale()
Returns

Locale of the business card. Supported locales: en-AU, en-CA, en-GB, en-IN, en-US.

Return type

object

getMaxPollingRetries()
Returns

Number of times to poll

Return type

int

getModelVersion()
Returns

Which model version to use for scoring. If no model version is specified, the API should default to the latest non-preview version.

Return type

object

getOutputCol()
Returns

The name of the output column

Return type

str

getPages()
Returns

Page selection, used only for multi-page PDF and TIFF documents. Accepted inputs include single pages (e.g. ‘1, 2’ -> pages 1 and 2 are processed), finite ranges (e.g. ‘2-5’ -> pages 2 to 5 are processed) and open-ended ranges (e.g. ‘5-’ -> all pages from page 5 onward are processed; ‘-10’ -> pages 1 to 10 are processed). These forms can be mixed, and ranges may overlap (e.g. ‘-5, 1, 3, 5-10’ -> pages 1 to 10 are processed). The service accepts the request if it can process at least one page of the document (e.g. ‘5-100’ on a 5-page document is valid, and page 5 is processed). If no page range is provided, the entire document is processed.

Return type

object

getPollingDelay()
Returns

Number of milliseconds to wait between polls

Return type

int

getSubscriptionKey()
Returns

The API key to use

Return type

object

getSuppressMaxRetriesExceededException()
Returns

Set to true to suppress the maximum retries exception and report it in the error column

Return type

bool

getTimeout()
Returns

Number of seconds to wait before closing the connection

Return type

float

getUrl()
Returns

URL of the service

Return type

str

imageBytes = Param(parent='undefined', name='imageBytes', doc='ServiceParam: bytestream of the image to use')
imageUrl = Param(parent='undefined', name='imageUrl', doc='ServiceParam: the url of the image to use')
includeTextDetails = Param(parent='undefined', name='includeTextDetails', doc='ServiceParam: Include text lines and element references in the result.')
initialPollingDelay = Param(parent='undefined', name='initialPollingDelay', doc='number of milliseconds to wait before first poll for result')
locale = Param(parent='undefined', name='locale', doc='ServiceParam: Locale of the receipt. Supported locales: en-AU, en-CA, en-GB, en-IN, en-US.')
maxPollingRetries = Param(parent='undefined', name='maxPollingRetries', doc='number of times to poll')
modelVersion = Param(parent='undefined', name='modelVersion', doc='ServiceParam: This value indicates which model will be used for scoring. If a model-version is not specified, the API should default to the latest, non-preview version.')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
pages = Param(parent='undefined', name='pages', doc="ServiceParam: The page selection only leveraged for multi-page PDF and TIFF documents. Accepted input include single pages (e.g.'1, 2' -> pages 1 and 2 will be processed), finite (e.g. '2-5' -> pages 2 to 5 will be processed) and open-ended ranges (e.g. '5-' -> all the pages from page 5 will be processed; e.g. '-10' -> pages 1 to 10 will be processed). All of these can be mixed together and ranges are allowed to overlap (eg. '-5, 1, 3, 5-10' - pages 1 to 10 will be processed). The service will accept the request if it can process at least one page of the document (e.g. using '5-100' on a 5 page document is a valid input where page 5 will be processed). If no page range is provided, the entire document will be processed.")
pollingDelay = Param(parent='undefined', name='pollingDelay', doc='number of milliseconds to wait between polling')
classmethod read()

Returns an MLReader instance for this class.

setBackoffs(value)
Parameters

backoffs – Array of backoffs to use in the handler
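
One way to picture backoffs is as a retry schedule. The sketch below assumes that each entry is a wait in milliseconds (consistent with the default [100, 500, 1000], but not stated above) applied before successive retries of a transient failure; `send_with_backoffs` and the set of retryable status codes are hypothetical, not documented SynapseML behavior.

```python
import time
from typing import Callable, Sequence, Tuple

# Assumed set of transient HTTP status codes worth retrying.
RETRYABLE_STATUSES = {429, 500, 502, 503, 504}


def send_with_backoffs(
    send: Callable[[], Tuple[int, str]],
    backoffs: Sequence[int] = (100, 500, 1000),
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call `send`; on a retryable status, wait backoffs[i] ms and retry.

    At most len(backoffs) retries are made; the last response is returned
    whether or not it succeeded, so the caller can record it as an error.
    """
    status, body = send()
    for delay_ms in backoffs:
        if status not in RETRYABLE_STATUSES:
            break
        sleep(delay_ms / 1000.0)  # assumed unit: milliseconds
        status, body = send()
    return status, body


responses = iter([(429, "throttled"), (503, "unavailable"), (200, "ok")])
waits = []
print(send_with_backoffs(lambda: next(responses), sleep=waits.append))  # (200, 'ok')
print(waits)  # [0.1, 0.5]
```

Injecting `sleep` keeps the schedule observable without actually waiting.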

setConcurrency(value)
Parameters

concurrency – Maximum number of concurrent calls

setConcurrentTimeout(value)
Parameters

concurrentTimeout – Maximum number of seconds to wait on futures if concurrency >= 1

setErrorCol(value)
Parameters

errorCol – Column to hold HTTP errors

setImageBytes(value)
Parameters

imageBytes – Bytestream of the image to use

setImageBytesCol(value)
Parameters

imageBytesCol – Name of the column holding the bytestream of the image to use

setImageUrl(value)
Parameters

imageUrl – The URL of the image to use

setImageUrlCol(value)
Parameters

imageUrlCol – Name of the column holding the URL of the image to use
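
Each ServiceParam (imageBytes, imageUrl, and so on) can be set either to one fixed value (setImageUrl) or to a column (setImageUrlCol), so that each row supplies its own value. The hypothetical `ServiceParamSketch` below illustrates that lookup on plain dictionaries standing in for rows. It is not SynapseML’s implementation, and the rule that setting one form clears the other is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class ServiceParamSketch:
    """Holds either a fixed value or a column name, never both (assumed)."""

    value: Any = None
    col: Optional[str] = None

    def set_value(self, value: Any) -> "ServiceParamSketch":
        # Like setImageUrl: one value shared by every row.
        self.value, self.col = value, None
        return self

    def set_col(self, col: str) -> "ServiceParamSketch":
        # Like setImageUrlCol: read the value from this column per row.
        self.value, self.col = None, col
        return self

    def resolve(self, row: Dict[str, Any]) -> Any:
        return row.get(self.col) if self.col is not None else self.value


rows = [{"url": "https://example.com/a.jpg"}, {"url": "https://example.com/b.jpg"}]
image_url = ServiceParamSketch().set_value("https://example.com/card.jpg")
print([image_url.resolve(r) for r in rows])  # same fixed URL for both rows
image_url.set_col("url")
print([image_url.resolve(r) for r in rows])  # per-row URLs from the "url" column
```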

setIncludeTextDetails(value)
Parameters

includeTextDetails – Whether to include text lines and element references in the result

setIncludeTextDetailsCol(value)
Parameters

includeTextDetailsCol – Name of the column holding whether to include text lines and element references in the result

setInitialPollingDelay(value)
Parameters

initialPollingDelay – Number of milliseconds to wait before the first poll for the result

setLinkedService(value)
setLocale(value)
Parameters

locale – Locale of the business card. Supported locales: en-AU, en-CA, en-GB, en-IN, en-US.

setLocaleCol(value)
Parameters

localeCol – Name of the column holding the locale of the business card

setLocation(value)
setMaxPollingRetries(value)
Parameters

maxPollingRetries – Number of times to poll

setModelVersion(value)
Parameters

modelVersion – Which model version to use for scoring. If no model version is specified, the API should default to the latest non-preview version.

setModelVersionCol(value)
Parameters

modelVersionCol – Name of the column holding the model version to use for scoring

setOutputCol(value)
Parameters

outputCol – The name of the output column

setPages(value)
Parameters

pages – Page selection, used only for multi-page PDF and TIFF documents. Accepted inputs include single pages (e.g. ‘1, 2’ -> pages 1 and 2 are processed), finite ranges (e.g. ‘2-5’ -> pages 2 to 5 are processed) and open-ended ranges (e.g. ‘5-’ -> all pages from page 5 onward are processed; ‘-10’ -> pages 1 to 10 are processed). These forms can be mixed, and ranges may overlap (e.g. ‘-5, 1, 3, 5-10’ -> pages 1 to 10 are processed). The service accepts the request if it can process at least one page of the document (e.g. ‘5-100’ on a 5-page document is valid, and page 5 is processed). If no page range is provided, the entire document is processed.

setPagesCol(value)
Parameters

pagesCol – Name of the column holding the page selection (same syntax as pages)

setParams(backoffs=[100, 500, 1000], concurrency=1, concurrentTimeout=None, errorCol='AnalyzeBusinessCards_78867d030dc8_error', imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, includeTextDetails=None, includeTextDetailsCol=None, initialPollingDelay=300, locale=None, localeCol=None, maxPollingRetries=1000, modelVersion=None, modelVersionCol=None, outputCol='AnalyzeBusinessCards_78867d030dc8_output', pages=None, pagesCol=None, pollingDelay=300, subscriptionKey=None, subscriptionKeyCol=None, suppressMaxRetriesExceededException=False, timeout=60.0, url=None)

Sets the keyword-only parameters.

setPollingDelay(value)
Parameters

pollingDelay – Number of milliseconds to wait between polls

setSubscriptionKey(value)
Parameters

subscriptionKey – The API key to use

setSubscriptionKeyCol(value)
Parameters

subscriptionKeyCol – Name of the column holding the API key to use

setSuppressMaxRetriesExceededException(value)
Parameters

suppressMaxRetriesExceededException – Set to true to suppress the maximum retries exception and report it in the error column

setTimeout(value)
Parameters

timeout – Number of seconds to wait before closing the connection

setUrl(value)
Parameters

url – URL of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
suppressMaxRetriesExceededException = Param(parent='undefined', name='suppressMaxRetriesExceededException', doc='set true to suppress the maxumimum retries exception and report in the error column')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.AnalyzeCustomModel module

class synapse.ml.cognitive.AnalyzeCustomModel.AnalyzeCustomModel(java_obj=None, backoffs=[100, 500, 1000], concurrency=1, concurrentTimeout=None, errorCol='AnalyzeCustomModel_355269a761b5_error', imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, includeTextDetails=None, includeTextDetailsCol=None, initialPollingDelay=300, maxPollingRetries=1000, modelId=None, modelIdCol=None, modelVersion=None, modelVersionCol=None, outputCol='AnalyzeCustomModel_355269a761b5_output', pollingDelay=300, subscriptionKey=None, subscriptionKeyCol=None, suppressMaxRetriesExceededException=False, timeout=60.0, url=None)

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • backoffs (list) – Array of backoffs to use in the handler

  • concurrency (int) – Maximum number of concurrent calls

  • concurrentTimeout (float) – Maximum number of seconds to wait on futures if concurrency >= 1

  • errorCol (str) – Column to hold HTTP errors

  • imageBytes (object) – Bytestream of the image to use

  • imageUrl (object) – The URL of the image to use

  • includeTextDetails (object) – Whether to include text lines and element references in the result

  • initialPollingDelay (int) – Number of milliseconds to wait before the first poll for the result

  • maxPollingRetries (int) – Number of times to poll

  • modelId (object) – Model identifier.

  • modelVersion (object) – Which model version to use for scoring. If no model version is specified, the API should default to the latest non-preview version.

  • outputCol (str) – The name of the output column

  • pollingDelay (int) – Number of milliseconds to wait between polls

  • subscriptionKey (object) – The API key to use

  • suppressMaxRetriesExceededException (bool) – Set to true to suppress the maximum retries exception and report it in the error column

  • timeout (float) – Number of seconds to wait before closing the connection

  • url (str) – URL of the service
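
The polling parameters plausibly interact as in the sketch below: wait initialPollingDelay, poll up to maxPollingRetries times with pollingDelay between polls, then either raise or report the failure in the error column depending on suppressMaxRetriesExceededException. `poll_for_result`, the `get_status` callback, and the ‘succeeded’ status string are hypothetical, and SynapseML’s exact retry accounting may differ. Under this reading, the defaults in the signature (300 ms, 300 ms, 1000 polls) give a worst-case wait of 300 + 999 × 300 ms = 300 s, ignoring request latency.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


def poll_for_result(
    get_status: Callable[[], Tuple[str, Any]],
    initial_polling_delay: int = 300,
    polling_delay: int = 300,
    max_polling_retries: int = 1000,
    suppress_max_retries_exceeded_exception: bool = False,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Optional[Any]]:
    """Poll until get_status() reports 'succeeded' (delays in milliseconds).

    Returns {'output': ..., 'error': ...}, loosely mirroring outputCol/errorCol.
    """
    sleep(initial_polling_delay / 1000.0)
    for attempt in range(max_polling_retries):
        if attempt > 0:
            sleep(polling_delay / 1000.0)
        status, payload = get_status()
        if status == "succeeded":
            return {"output": payload, "error": None}
    message = f"result not ready after {max_polling_retries} polls"
    if suppress_max_retries_exceeded_exception:
        # Report the failure in the error column instead of raising.
        return {"output": None, "error": message}
    raise TimeoutError(message)


def worst_case_wait_ms(initial_polling_delay: int, polling_delay: int, max_polling_retries: int) -> int:
    """Total sleep time if every poll comes back not ready (ignores latency)."""
    return initial_polling_delay + max(max_polling_retries - 1, 0) * polling_delay


print(worst_case_wait_ms(300, 300, 1000))  # 300000 (ms), i.e. 300 s
```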

backoffs = Param(parent='undefined', name='backoffs', doc='array of backoffs to use in the handler')
concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getBackoffs()
Returns

Array of backoffs to use in the handler

Return type

list

getConcurrency()
Returns

Maximum number of concurrent calls

Return type

int

getConcurrentTimeout()
Returns

Maximum number of seconds to wait on futures if concurrency >= 1

Return type

float

getErrorCol()
Returns

Column to hold HTTP errors

Return type

str

getImageBytes()
Returns

Bytestream of the image to use

Return type

object

getImageUrl()
Returns

The URL of the image to use

Return type

object

getIncludeTextDetails()
Returns

Whether to include text lines and element references in the result

Return type

object

getInitialPollingDelay()
Returns

Number of milliseconds to wait before the first poll for the result

Return type

int

static getJavaPackage()

Returns the Java package name as a string.

getMaxPollingRetries()
Returns

Number of times to poll

Return type

int

getModelId()
Returns

Model identifier.

Return type

object

getModelVersion()
Returns

Which model version to use for scoring. If no model version is specified, the API should default to the latest non-preview version.

Return type

object

getOutputCol()
Returns

The name of the output column

Return type

str

getPollingDelay()
Returns

Number of milliseconds to wait between polls

Return type

int

getSubscriptionKey()
Returns

The API key to use

Return type

object

getSuppressMaxRetriesExceededException()
Returns

Set to true to suppress the maximum retries exception and report it in the error column

Return type

bool

getTimeout()
Returns

Number of seconds to wait before closing the connection

Return type

float

getUrl()
Returns

URL of the service

Return type

str

imageBytes = Param(parent='undefined', name='imageBytes', doc='ServiceParam: bytestream of the image to use')
imageUrl = Param(parent='undefined', name='imageUrl', doc='ServiceParam: the url of the image to use')
includeTextDetails = Param(parent='undefined', name='includeTextDetails', doc='ServiceParam: Include text lines and element references in the result.')
initialPollingDelay = Param(parent='undefined', name='initialPollingDelay', doc='number of milliseconds to wait before first poll for result')
maxPollingRetries = Param(parent='undefined', name='maxPollingRetries', doc='number of times to poll')
modelId = Param(parent='undefined', name='modelId', doc='ServiceParam: Model identifier.')
modelVersion = Param(parent='undefined', name='modelVersion', doc='ServiceParam: This value indicates which model will be used for scoring. If a model-version is not specified, the API should default to the latest, non-preview version.')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
pollingDelay = Param(parent='undefined', name='pollingDelay', doc='number of milliseconds to wait between polling')
classmethod read()

Returns an MLReader instance for this class.

setBackoffs(value)
Parameters

backoffs – Array of backoffs to use in the handler

setConcurrency(value)
Parameters

concurrency – Maximum number of concurrent calls

setConcurrentTimeout(value)
Parameters

concurrentTimeout – Maximum number of seconds to wait on futures if concurrency >= 1

setErrorCol(value)
Parameters

errorCol – Column to hold HTTP errors

setImageBytes(value)
Parameters

imageBytes – Bytestream of the image to use

setImageBytesCol(value)
Parameters

imageBytesCol – Name of the column holding the bytestream of the image to use

setImageUrl(value)
Parameters

imageUrl – The URL of the image to use

setImageUrlCol(value)
Parameters

imageUrlCol – Name of the column holding the URL of the image to use

setIncludeTextDetails(value)
Parameters

includeTextDetails – Whether to include text lines and element references in the result

setIncludeTextDetailsCol(value)
Parameters

includeTextDetailsCol – Name of the column holding whether to include text lines and element references in the result

setInitialPollingDelay(value)
Parameters

initialPollingDelay – Number of milliseconds to wait before the first poll for the result

setLinkedService(value)
setLocation(value)
setMaxPollingRetries(value)
Parameters

maxPollingRetries – Number of times to poll

setModelId(value)
Parameters

modelId – Model identifier.

setModelIdCol(value)
Parameters

modelIdCol – Name of the column holding the model identifier

setModelVersion(value)
Parameters

modelVersion – Which model version to use for scoring. If no model version is specified, the API should default to the latest non-preview version.

setModelVersionCol(value)
Parameters

modelVersionCol – Name of the column holding the model version to use for scoring

setOutputCol(value)
Parameters

outputCol – The name of the output column

setParams(backoffs=[100, 500, 1000], concurrency=1, concurrentTimeout=None, errorCol='AnalyzeCustomModel_355269a761b5_error', imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, includeTextDetails=None, includeTextDetailsCol=None, initialPollingDelay=300, maxPollingRetries=1000, modelId=None, modelIdCol=None, modelVersion=None, modelVersionCol=None, outputCol='AnalyzeCustomModel_355269a761b5_output', pollingDelay=300, subscriptionKey=None, subscriptionKeyCol=None, suppressMaxRetriesExceededException=False, timeout=60.0, url=None)

Sets the keyword-only parameters.

setPollingDelay(value)
Parameters

pollingDelay – Number of milliseconds to wait between polls

setSubscriptionKey(value)
Parameters

subscriptionKey – The API key to use

setSubscriptionKeyCol(value)
Parameters

subscriptionKeyCol – Name of the column holding the API key to use

setSuppressMaxRetriesExceededException(value)
Parameters

suppressMaxRetriesExceededException – Set to true to suppress the maximum retries exception and report it in the error column

setTimeout(value)
Parameters

timeout – Number of seconds to wait before closing the connection

setUrl(value)
Parameters

url – URL of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
suppressMaxRetriesExceededException = Param(parent='undefined', name='suppressMaxRetriesExceededException', doc='set true to suppress the maxumimum retries exception and report in the error column')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.AnalyzeDocument module

class synapse.ml.cognitive.AnalyzeDocument.AnalyzeDocument(java_obj=None, apiVersion=None, apiVersionCol=None, backoffs=[100, 500, 1000], concurrency=1, concurrentTimeout=None, errorCol='AnalyzeDocument_668b4d7ee574_error', imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, initialPollingDelay=300, locale=None, localeCol=None, maxPollingRetries=1000, outputCol='AnalyzeDocument_668b4d7ee574_output', pages=None, pagesCol=None, pollingDelay=300, prebuiltModelId=None, prebuiltModelIdCol=None, stringIndexType=None, stringIndexTypeCol=None, subscriptionKey=None, subscriptionKeyCol=None, suppressMaxRetriesExceededException=False, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • apiVersion (object) – version of the api

  • backoffs (list) – array of backoffs to use in the handler

  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (str) – column to hold http errors

  • imageBytes (object) – bytestream of the image to use

  • imageUrl (object) – the url of the image to use

  • initialPollingDelay (int) – number of milliseconds to wait before first poll for result

  • locale (object) – Locale of the receipt. Supported locales: en-AU, en-CA, en-GB, en-IN, en-US.

  • maxPollingRetries (int) – number of times to poll

  • outputCol (str) – The name of the output column

  • pages (object) – Page selection, used only for multi-page PDF and TIFF documents. Accepted inputs include single pages (e.g. ‘1, 2’ -> pages 1 and 2 will be processed), finite ranges (e.g. ‘2-5’ -> pages 2 to 5 will be processed) and open-ended ranges (e.g. ‘5-’ -> all pages from page 5 onward will be processed; e.g. ‘-10’ -> pages 1 to 10 will be processed). These forms can be mixed, and ranges may overlap (e.g. ‘-5, 1, 3, 5-10’ -> pages 1 to 10 will be processed). The service accepts the request if it can process at least one page of the document (e.g. ‘5-100’ on a 5-page document is valid input, and page 5 will be processed). If no page range is provided, the entire document is processed.

  • pollingDelay (int) – number of milliseconds to wait between polling

  • prebuiltModelId (object) – Prebuilt model identifier for Form Recognizer V3.0. Supported values: prebuilt-read, prebuilt-layout, prebuilt-document, prebuilt-businessCard, prebuilt-idDocument, prebuilt-invoice, prebuilt-receipt, or your custom model ID

  • stringIndexType (object) – Method used to compute string offset and length.

  • subscriptionKey (object) – the API key to use

  • suppressMaxRetriesExceededException (bool) – set to true to suppress the exception raised when the maximum number of retries is exceeded and report it in the error column instead

  • timeout (float) – number of seconds to wait before closing the connection

  • url (str) – Url of the service
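The page-selection grammar described for `pages` can be made concrete with a small pure-Python sketch. `resolve_pages` is a hypothetical helper, not part of SynapseML or the Form Recognizer API; it only mirrors the examples given in the parameter description, and the service itself remains the authority on which strings it accepts.

```python
from typing import List, Set


def resolve_pages(spec: str, page_count: int) -> List[int]:
    """Expand a page-selection string into the 1-based pages that would be
    processed, following the examples in the `pages` description."""
    if not spec.strip():
        return list(range(1, page_count + 1))  # no selection: whole document
    selected: Set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start = int(start_text) if start_text else 1     # '-10' starts at page 1
            end = int(end_text) if end_text else page_count  # '5-' runs to the last page
        else:
            start = end = int(part)
        # Pages beyond the end of the document are ignored, not rejected.
        selected.update(range(start, min(end, page_count) + 1))
    if not selected:
        # The service only accepts a request if at least one page can be processed.
        raise ValueError("no page of the document can be processed")
    return sorted(selected)


print(resolve_pages("-5, 1, 3, 5-10", 20))  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(resolve_pages("5-100", 5))            # [5]
```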

apiVersion = Param(parent='undefined', name='apiVersion', doc='ServiceParam: version of the api')
backoffs = Param(parent='undefined', name='backoffs', doc='array of backoffs to use in the handler')
concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getApiVersion()[source]
Returns

version of the api

Return type

apiVersion

getBackoffs()[source]
Returns

array of backoffs to use in the handler

Return type

backoffs

getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getImageBytes()[source]
Returns

bytestream of the image to use

Return type

imageBytes

getImageUrl()[source]
Returns

the url of the image to use

Return type

imageUrl

getInitialPollingDelay()[source]
Returns

number of milliseconds to wait before first poll for result

Return type

initialPollingDelay

static getJavaPackage()[source]

Returns package name String.

getLocale()[source]
Returns

Locale of the receipt. Supported locales: en-AU, en-CA, en-GB, en-IN, en-US.

Return type

locale

getMaxPollingRetries()[source]
Returns

number of times to poll

Return type

maxPollingRetries

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getPages()[source]
Returns

Page selection, used only for multi-page PDF and TIFF documents. Accepted inputs include single pages (e.g. ‘1, 2’ -> pages 1 and 2 will be processed), finite ranges (e.g. ‘2-5’ -> pages 2 to 5 will be processed) and open-ended ranges (e.g. ‘5-’ -> all pages from page 5 onward will be processed; e.g. ‘-10’ -> pages 1 to 10 will be processed). These forms can be mixed, and ranges may overlap (e.g. ‘-5, 1, 3, 5-10’ -> pages 1 to 10 will be processed). The service accepts the request if it can process at least one page of the document (e.g. ‘5-100’ on a 5-page document is valid input, and page 5 will be processed). If no page range is provided, the entire document is processed.

Return type

pages

getPollingDelay()[source]
Returns

number of milliseconds to wait between polling

Return type

pollingDelay

getPrebuiltModelId()[source]
Returns

Prebuilt model identifier for Form Recognizer V3.0. Supported values: prebuilt-read, prebuilt-layout, prebuilt-document, prebuilt-businessCard, prebuilt-idDocument, prebuilt-invoice, prebuilt-receipt, or your custom model ID

Return type

prebuiltModelId

getStringIndexType()[source]
Returns

Method used to compute string offset and length.

Return type

stringIndexType

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getSuppressMaxRetriesExceededException()[source]
Returns

set to true to suppress the exception raised when the maximum number of retries is exceeded and report it in the error column instead

Return type

suppressMaxRetriesExceededException

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

imageBytes = Param(parent='undefined', name='imageBytes', doc='ServiceParam: bytestream of the image to use')
imageUrl = Param(parent='undefined', name='imageUrl', doc='ServiceParam: the url of the image to use')
initialPollingDelay = Param(parent='undefined', name='initialPollingDelay', doc='number of milliseconds to wait before first poll for result')
locale = Param(parent='undefined', name='locale', doc='ServiceParam: Locale of the receipt. Supported locales: en-AU, en-CA, en-GB, en-IN, en-US.')
maxPollingRetries = Param(parent='undefined', name='maxPollingRetries', doc='number of times to poll')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
pages = Param(parent='undefined', name='pages', doc="ServiceParam: The page selection only leveraged for multi-page PDF and TIFF documents. Accepted input include single pages (e.g.'1, 2' -> pages 1 and 2 will be processed), finite (e.g. '2-5' -> pages 2 to 5 will be processed) and open-ended ranges (e.g. '5-' -> all the pages from page 5 will be processed; e.g. '-10' -> pages 1 to 10 will be processed). All of these can be mixed together and ranges are allowed to overlap (eg. '-5, 1, 3, 5-10' - pages 1 to 10 will be processed). The service will accept the request if it can process at least one page of the document (e.g. using '5-100' on a 5 page document is a valid input where page 5 will be processed). If no page range is provided, the entire document will be processed.")
pollingDelay = Param(parent='undefined', name='pollingDelay', doc='number of milliseconds to wait between polling')
prebuiltModelId = Param(parent='undefined', name='prebuiltModelId', doc='ServiceParam: Prebuilt Model identifier for Form Recognizer V3.0, supported modelId: prebuilt-read, prebuilt-layout,prebuilt-document, prebuilt-businessCard, prebuilt-idDocument, prebuilt-invoice, prebuilt-receipt,or your custom modelId')
classmethod read()[source]

Returns an MLReader instance for this class.

setApiVersion(value)[source]
Parameters

apiVersion – version of the api

setApiVersionCol(value)[source]
Parameters

apiVersion – name of the column containing the API version to use

setBackoffs(value)[source]
Parameters

backoffs – array of backoffs to use in the handler
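The backoffs array drives retries of individual HTTP calls. The sketch below assumes each entry is a delay in milliseconds before one more attempt (which is how the default [100, 500, 1000] reads) and that only throttling (429) and server errors (5xx) are retried. Both are assumptions, and `send_with_backoffs` is a hypothetical helper rather than the SynapseML handler.

```python
import time
from typing import Callable, List, Sequence


def send_with_backoffs(
    send: Callable[[], int],
    backoffs_ms: Sequence[int] = (100, 500, 1000),
    retryable: Callable[[int], bool] = lambda status: status == 429 or status >= 500,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send() until it returns a non-retryable status code or the
    backoff schedule is exhausted; return the last status code seen."""
    status = send()
    for delay_ms in backoffs_ms:
        if not retryable(status):
            break
        sleep(delay_ms / 1000.0)  # wait the next backoff before retrying
        status = send()
    return status


# A fake endpoint that is unavailable, then throttled, then succeeds.
statuses = iter([503, 429, 200])
waits: List[float] = []
print(send_with_backoffs(lambda: next(statuses), sleep=waits.append))  # 200
print(waits)  # [0.1, 0.5]
```

Under this reading, a schedule of n backoffs allows at most n + 1 attempts per request.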

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setImageBytes(value)[source]
Parameters

imageBytes – bytestream of the image to use

setImageBytesCol(value)[source]
Parameters

imageBytes – name of the column containing the image bytestream to use

setImageUrl(value)[source]
Parameters

imageUrl – the url of the image to use

setImageUrlCol(value)[source]
Parameters

imageUrl – name of the column containing the URL of the image to use

setInitialPollingDelay(value)[source]
Parameters

initialPollingDelay – number of milliseconds to wait before first poll for result

setLocale(value)[source]
Parameters

locale – Locale of the receipt. Supported locales: en-AU, en-CA, en-GB, en-IN, en-US.

setLocaleCol(value)[source]
Parameters

locale – name of the column containing the locale of the receipt. Supported locales: en-AU, en-CA, en-GB, en-IN, en-US.

setLocation(value)[source]
setMaxPollingRetries(value)[source]
Parameters

maxPollingRetries – number of times to poll

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setPages(value)[source]
Parameters

pages – Page selection, used only for multi-page PDF and TIFF documents. Accepted inputs include single pages (e.g. ‘1, 2’ -> pages 1 and 2 will be processed), finite ranges (e.g. ‘2-5’ -> pages 2 to 5 will be processed) and open-ended ranges (e.g. ‘5-’ -> all pages from page 5 onward will be processed; e.g. ‘-10’ -> pages 1 to 10 will be processed). These forms can be mixed, and ranges may overlap (e.g. ‘-5, 1, 3, 5-10’ -> pages 1 to 10 will be processed). The service accepts the request if it can process at least one page of the document (e.g. ‘5-100’ on a 5-page document is valid input, and page 5 will be processed). If no page range is provided, the entire document is processed.

setPagesCol(value)[source]
Parameters

pages – name of the column containing the page selection for each row. Page selection is used only for multi-page PDF and TIFF documents; see setPages for the accepted syntax. If no page range is provided, the entire document is processed.

setParams(apiVersion=None, apiVersionCol=None, backoffs=[100, 500, 1000], concurrency=1, concurrentTimeout=None, errorCol='AnalyzeDocument_668b4d7ee574_error', imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, initialPollingDelay=300, locale=None, localeCol=None, maxPollingRetries=1000, outputCol='AnalyzeDocument_668b4d7ee574_output', pages=None, pagesCol=None, pollingDelay=300, prebuiltModelId=None, prebuiltModelIdCol=None, stringIndexType=None, stringIndexTypeCol=None, subscriptionKey=None, subscriptionKeyCol=None, suppressMaxRetriesExceededException=False, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setPollingDelay(value)[source]
Parameters

pollingDelay – number of milliseconds to wait between polling

setPrebuiltModelId(value)[source]
Parameters

prebuiltModelId – Prebuilt model identifier for Form Recognizer V3.0. Supported values: prebuilt-read, prebuilt-layout, prebuilt-document, prebuilt-businessCard, prebuilt-idDocument, prebuilt-invoice, prebuilt-receipt, or your custom model ID

setPrebuiltModelIdCol(value)[source]
Parameters

prebuiltModelId – name of the column containing the model identifier for each row: one of the Form Recognizer V3.0 prebuilt model IDs listed under setPrebuiltModelId, or your custom model ID
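Because a misspelled prebuilt model ID only fails once a request reaches the service, it can help to check IDs on the client first. The sketch below copies the prebuilt IDs listed in this reference; the set may lag behind what the service currently offers, and `describe_model_id` is a hypothetical helper, not part of SynapseML.

```python
# Prebuilt IDs as listed in this reference; the service may support more.
PREBUILT_MODEL_IDS = frozenset({
    "prebuilt-read", "prebuilt-layout", "prebuilt-document",
    "prebuilt-businessCard", "prebuilt-idDocument",
    "prebuilt-invoice", "prebuilt-receipt",
})


def describe_model_id(model_id: str) -> str:
    """Classify a model identifier against the documented prebuilt IDs."""
    if model_id in PREBUILT_MODEL_IDS:
        return "prebuilt"
    if model_id.startswith("prebuilt-"):
        # Looks prebuilt but is not in the list: a typo or an unlisted model.
        return "unknown prebuilt"
    return "custom"


print(describe_model_id("prebuilt-invoice"))  # prebuilt
print(describe_model_id("prebuilt-reciept"))  # unknown prebuilt
print(describe_model_id("my-model-01"))       # custom
```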

setStringIndexType(value)[source]
Parameters

stringIndexType – Method used to compute string offset and length.

setStringIndexTypeCol(value)[source]
Parameters

stringIndexType – name of the column containing the method used to compute string offsets and lengths
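String offsets differ depending on how characters are counted, which is what stringIndexType controls. Form Recognizer V3.0 accepts values such as `textElements`, `unicodeCodePoint` and `utf16CodeUnit`; check the service's REST reference for the authoritative list. The sketch below shows why the choice matters for text containing an emoji. `offset_in` is a hypothetical helper, and it covers only the two code-point based modes, because grapheme clusters (`textElements`) need a library outside the Python standard library.

```python
def offset_in(text: str, sub: str, index_type: str) -> int:
    """Offset of `sub` in `text` under two of the string index types."""
    index = text.index(sub)  # Python str indexes by Unicode code point
    if index_type == "unicodeCodePoint":
        return index
    if index_type == "utf16CodeUnit":
        # Characters outside the Basic Multilingual Plane take two UTF-16 units.
        return len(text[:index].encode("utf-16-le")) // 2
    raise ValueError(f"not covered by this sketch: {index_type}")


text = "\U0001F600 total: 42"
print(offset_in(text, "total", "unicodeCodePoint"))  # 2
print(offset_in(text, "total", "utf16CodeUnit"))     # 3
```

A client that slices strings with UTF-16 semantics (for example JavaScript or the JVM) would typically request `utf16CodeUnit`, while Python code matches `unicodeCodePoint`.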

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – name of the column containing the API key to use for each row

setSuppressMaxRetriesExceededException(value)[source]
Parameters

suppressMaxRetriesExceededException – set to true to suppress the exception raised when the maximum number of retries is exceeded and report it in the error column instead

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

stringIndexType = Param(parent='undefined', name='stringIndexType', doc='ServiceParam: Method used to compute string offset and length.')
subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
suppressMaxRetriesExceededException = Param(parent='undefined', name='suppressMaxRetriesExceededException', doc='set true to suppress the maxumimum retries exception and report in the error column')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.AnalyzeIDDocuments module

class synapse.ml.cognitive.AnalyzeIDDocuments.AnalyzeIDDocuments(java_obj=None, backoffs=[100, 500, 1000], concurrency=1, concurrentTimeout=None, errorCol='AnalyzeIDDocuments_d43966693cb6_error', imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, includeTextDetails=None, includeTextDetailsCol=None, initialPollingDelay=300, maxPollingRetries=1000, modelVersion=None, modelVersionCol=None, outputCol='AnalyzeIDDocuments_d43966693cb6_output', pages=None, pagesCol=None, pollingDelay=300, subscriptionKey=None, subscriptionKeyCol=None, suppressMaxRetriesExceededException=False, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • backoffs (list) – array of backoffs to use in the handler

  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (str) – column to hold http errors

  • imageBytes (object) – bytestream of the image to use

  • imageUrl (object) – the url of the image to use

  • includeTextDetails (object) – Include text lines and element references in the result.

  • initialPollingDelay (int) – number of milliseconds to wait before first poll for result

  • maxPollingRetries (int) – number of times to poll

  • modelVersion (object) – This value indicates which model will be used for scoring. If a model-version is not specified, the API should default to the latest, non-preview version.

  • outputCol (str) – The name of the output column

  • pages (object) – Page selection, used only for multi-page PDF and TIFF documents. Accepted inputs include single pages (e.g. ‘1, 2’ -> pages 1 and 2 will be processed), finite ranges (e.g. ‘2-5’ -> pages 2 to 5 will be processed) and open-ended ranges (e.g. ‘5-’ -> all pages from page 5 onward will be processed; e.g. ‘-10’ -> pages 1 to 10 will be processed). These forms can be mixed, and ranges may overlap (e.g. ‘-5, 1, 3, 5-10’ -> pages 1 to 10 will be processed). The service accepts the request if it can process at least one page of the document (e.g. ‘5-100’ on a 5-page document is valid input, and page 5 will be processed). If no page range is provided, the entire document is processed.

  • pollingDelay (int) – number of milliseconds to wait between polling

  • subscriptionKey (object) – the API key to use

  • suppressMaxRetriesExceededException (bool) – set to true to suppress the exception raised when the maximum number of retries is exceeded and report it in the error column instead

  • timeout (float) – number of seconds to wait before closing the connection

  • url (str) – Url of the service
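With the defaults in the signature above (initialPollingDelay=300, pollingDelay=300, maxPollingRetries=1000), it helps to know roughly how long one row can wait for its result before the retry budget runs out. The sketch below assumes one initial delay followed by one polling delay between consecutive polls, and it ignores the time the requests themselves take. `worst_case_wait_ms` is a hypothetical helper. Note that both polling delays are in milliseconds, whereas `timeout` is in seconds and applies to each connection.

```python
def worst_case_wait_ms(initial_polling_delay: int = 300,
                       polling_delay: int = 300,
                       max_polling_retries: int = 1000) -> int:
    """Upper bound on the time spent sleeping while polling for one result,
    assuming one initial delay and one polling delay between polls."""
    if max_polling_retries < 1:
        return initial_polling_delay
    return initial_polling_delay + polling_delay * (max_polling_retries - 1)


print(worst_case_wait_ms() / 1000)  # 300.0 seconds with the defaults
print(worst_case_wait_ms(polling_delay=1000, max_polling_retries=60) / 1000)  # 59.3
```

Lowering maxPollingRetries bounds how long a stuck document can hold up a partition; raising pollingDelay reduces request volume at the cost of noticing completion later.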

backoffs = Param(parent='undefined', name='backoffs', doc='array of backoffs to use in the handler')
concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getBackoffs()[source]
Returns

array of backoffs to use in the handler

Return type

backoffs

getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getImageBytes()[source]
Returns

bytestream of the image to use

Return type

imageBytes

getImageUrl()[source]
Returns

the url of the image to use

Return type

imageUrl

getIncludeTextDetails()[source]
Returns

Include text lines and element references in the result.

Return type

includeTextDetails

getInitialPollingDelay()[source]
Returns

number of milliseconds to wait before first poll for result

Return type

initialPollingDelay

static getJavaPackage()[source]

Returns package name String.

getMaxPollingRetries()[source]
Returns

number of times to poll

Return type

maxPollingRetries

getModelVersion()[source]
Returns

This value indicates which model will be used for scoring. If a model-version is not specified, the API should default to the latest, non-preview version.

Return type

modelVersion

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getPages()[source]
Returns

Page selection, used only for multi-page PDF and TIFF documents. Accepted inputs include single pages (e.g. ‘1, 2’ -> pages 1 and 2 will be processed), finite ranges (e.g. ‘2-5’ -> pages 2 to 5 will be processed) and open-ended ranges (e.g. ‘5-’ -> all pages from page 5 onward will be processed; e.g. ‘-10’ -> pages 1 to 10 will be processed). These forms can be mixed, and ranges may overlap (e.g. ‘-5, 1, 3, 5-10’ -> pages 1 to 10 will be processed). The service accepts the request if it can process at least one page of the document (e.g. ‘5-100’ on a 5-page document is valid input, and page 5 will be processed). If no page range is provided, the entire document is processed.

Return type

pages

getPollingDelay()[source]
Returns

number of milliseconds to wait between polling

Return type

pollingDelay

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getSuppressMaxRetriesExceededException()[source]
Returns

set to true to suppress the exception raised when the maximum number of retries is exceeded and report it in the error column instead

Return type

suppressMaxRetriesExceededException

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

imageBytes = Param(parent='undefined', name='imageBytes', doc='ServiceParam: bytestream of the image to use')
imageUrl = Param(parent='undefined', name='imageUrl', doc='ServiceParam: the url of the image to use')
includeTextDetails = Param(parent='undefined', name='includeTextDetails', doc='ServiceParam: Include text lines and element references in the result.')
initialPollingDelay = Param(parent='undefined', name='initialPollingDelay', doc='number of milliseconds to wait before first poll for result')
maxPollingRetries = Param(parent='undefined', name='maxPollingRetries', doc='number of times to poll')
modelVersion = Param(parent='undefined', name='modelVersion', doc='ServiceParam: This value indicates which model will be used for scoring. If a model-version is not specified, the API should default to the latest, non-preview version.')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
pages = Param(parent='undefined', name='pages', doc="ServiceParam: The page selection only leveraged for multi-page PDF and TIFF documents. Accepted input include single pages (e.g.'1, 2' -> pages 1 and 2 will be processed), finite (e.g. '2-5' -> pages 2 to 5 will be processed) and open-ended ranges (e.g. '5-' -> all the pages from page 5 will be processed; e.g. '-10' -> pages 1 to 10 will be processed). All of these can be mixed together and ranges are allowed to overlap (eg. '-5, 1, 3, 5-10' - pages 1 to 10 will be processed). The service will accept the request if it can process at least one page of the document (e.g. using '5-100' on a 5 page document is a valid input where page 5 will be processed). If no page range is provided, the entire document will be processed.")
pollingDelay = Param(parent='undefined', name='pollingDelay', doc='number of milliseconds to wait between polling')
classmethod read()[source]

Returns an MLReader instance for this class.

setBackoffs(value)[source]
Parameters

backoffs – array of backoffs to use in the handler

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setImageBytes(value)[source]
Parameters

imageBytes – bytestream of the image to use

setImageBytesCol(value)[source]
Parameters

imageBytes – name of the column containing the image bytestream to use

setImageUrl(value)[source]
Parameters

imageUrl – the url of the image to use

setImageUrlCol(value)[source]
Parameters

imageUrl – name of the column containing the URL of the image to use

setIncludeTextDetails(value)[source]
Parameters

includeTextDetails – Include text lines and element references in the result.

setIncludeTextDetailsCol(value)[source]
Parameters

includeTextDetails – name of the column indicating whether to include text lines and element references in the result

setInitialPollingDelay(value)[source]
Parameters

initialPollingDelay – number of milliseconds to wait before first poll for result

setLinkedService(value)[source]
setLocation(value)[source]
setMaxPollingRetries(value)[source]
Parameters

maxPollingRetries – number of times to poll

setModelVersion(value)[source]
Parameters

modelVersion – This value indicates which model will be used for scoring. If a model-version is not specified, the API should default to the latest, non-preview version.

setModelVersionCol(value)[source]
Parameters

modelVersion – name of the column containing the model version to use for scoring. If no model version is specified, the API should default to the latest non-preview version.

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setPages(value)[source]
Parameters

pages – Page selection, used only for multi-page PDF and TIFF documents. Accepted inputs include single pages (e.g. ‘1, 2’ -> pages 1 and 2 will be processed), finite ranges (e.g. ‘2-5’ -> pages 2 to 5 will be processed) and open-ended ranges (e.g. ‘5-’ -> all pages from page 5 onward will be processed; e.g. ‘-10’ -> pages 1 to 10 will be processed). These forms can be mixed, and ranges may overlap (e.g. ‘-5, 1, 3, 5-10’ -> pages 1 to 10 will be processed). The service accepts the request if it can process at least one page of the document (e.g. ‘5-100’ on a 5-page document is valid input, and page 5 will be processed). If no page range is provided, the entire document is processed.

setPagesCol(value)[source]
Parameters

pages – name of the column containing the page selection for each row. Page selection is used only for multi-page PDF and TIFF documents; see setPages for the accepted syntax. If no page range is provided, the entire document is processed.

setParams(backoffs=[100, 500, 1000], concurrency=1, concurrentTimeout=None, errorCol='AnalyzeIDDocuments_d43966693cb6_error', imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, includeTextDetails=None, includeTextDetailsCol=None, initialPollingDelay=300, maxPollingRetries=1000, modelVersion=None, modelVersionCol=None, outputCol='AnalyzeIDDocuments_d43966693cb6_output', pages=None, pagesCol=None, pollingDelay=300, subscriptionKey=None, subscriptionKeyCol=None, suppressMaxRetriesExceededException=False, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setPollingDelay(value)[source]
Parameters

pollingDelay – number of milliseconds to wait between polling

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – name of the column containing the API key to use for each row

setSuppressMaxRetriesExceededException(value)[source]
Parameters

suppressMaxRetriesExceededException – set to true to suppress the exception raised when the maximum number of retries is exceeded and report it in the error column instead

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
suppressMaxRetriesExceededException = Param(parent='undefined', name='suppressMaxRetriesExceededException', doc='set true to suppress the maxumimum retries exception and report in the error column')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.AnalyzeImage module

class synapse.ml.cognitive.AnalyzeImage.AnalyzeImage(java_obj=None, concurrency=1, concurrentTimeout=None, details=None, detailsCol=None, errorCol='AnalyzeImage_0f35ae1afc62_error', handler=None, imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, language=None, languageCol=None, outputCol='AnalyzeImage_0f35ae1afc62_output', subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None, visualFeatures=None, visualFeaturesCol=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • details (object) – which domain-specific details to return (for example, Celebrities or Landmarks)

  • errorCol (str) – column to hold http errors

  • handler (object) – Which strategy to use when handling requests

  • imageBytes (object) – bytestream of the image to use

  • imageUrl (object) – the url of the image to use

  • language (object) – the language of the response (en if none given)

  • outputCol (str) – The name of the output column

  • subscriptionKey (object) – the API key to use

  • timeout (float) – number of seconds to wait before closing the connection

  • url (str) – Url of the service

  • visualFeatures (object) – what visual feature types to return
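The concurrency and concurrentTimeout parameters can be modelled with a standard-library thread pool: at most `concurrency` calls are in flight at once, and the caller gives up waiting after `concurrentTimeout` seconds. This is a sketch of the two settings only; how SynapseML schedules requests internally is not shown here and may differ, and `call_concurrently` is a hypothetical helper.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def call_concurrently(
    call: Callable[[T], R],
    requests: Sequence[T],
    concurrency: int = 1,
    concurrent_timeout: Optional[float] = None,
) -> List[R]:
    """Run at most `concurrency` calls at a time; raise TimeoutError if the
    results are not all available within `concurrent_timeout` seconds of
    starting (None means wait indefinitely)."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # map() yields results in input order, regardless of completion order.
        return list(pool.map(call, requests, timeout=concurrent_timeout))


print(call_concurrently(len, ["a", "bb", "ccc"], concurrency=2))  # [1, 2, 3]
```

Raising concurrency hides network latency when each call mostly waits on the service, but it also raises the request rate and therefore the chance of being throttled.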

concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
details = Param(parent='undefined', name='details', doc='ServiceParam: what visual feature types to return')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getDetails()[source]
Returns

which domain-specific details to return (for example, Celebrities or Landmarks)

Return type

details

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

getImageBytes()[source]
Returns

bytestream of the image to use

Return type

imageBytes

getImageUrl()[source]
Returns

the url of the image to use

Return type

imageUrl

static getJavaPackage()[source]

Returns package name String.

getLanguage()[source]
Returns

the language of the response (en if none given)

Return type

language

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

getVisualFeatures()[source]
Returns

what visual feature types to return

Return type

visualFeatures

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
imageBytes = Param(parent='undefined', name='imageBytes', doc='ServiceParam: bytestream of the image to use')
imageUrl = Param(parent='undefined', name='imageUrl', doc='ServiceParam: the url of the image to use')
language = Param(parent='undefined', name='language', doc='ServiceParam: the language of the response (en if none given)')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setDetails(value)[source]
Parameters

details – which domain-specific details to return (for example, Celebrities or Landmarks)

setDetailsCol(value)[source]
Parameters

details – name of the column containing which domain-specific details to return (for example, Celebrities or Landmarks)

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setImageBytes(value)[source]
Parameters

imageBytes – bytestream of the image to use

setImageBytesCol(value)[source]
Parameters

imageBytes – bytestream of the image to use

setImageUrl(value)[source]
Parameters

imageUrl – the url of the image to use

setImageUrlCol(value)[source]
Parameters

imageUrl – the url of the image to use

setLanguage(value)[source]
Parameters

language – the language of the response (en if none given)

setLanguageCol(value)[source]
Parameters

language – the language of the response (en if none given)
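Each `setX` / `setXCol` pair sets the same service parameter in one of two ways. It can hold a single literal value used for every row, or it can hold the name of a column whose per-row value is used. The minimal model below treats a row as a plain dict and lets the most recent setter win. `resolve_service_param` is a hypothetical helper, not part of SynapseML. The `'en'` fallback stands in for the documented default ("en if none given").

```python
from typing import Any, Mapping, Optional, Tuple

# A service parameter is modelled as ("value", v), as set by setLanguage(v),
# or ("col", name), as set by setLanguageCol(name), or None when unset.
ServiceParam = Optional[Tuple[str, Any]]


def resolve_service_param(row: Mapping[str, Any], param: ServiceParam,
                          default: Any = None) -> Any:
    """Return the value the parameter takes for one row (hypothetical model)."""
    if param is None:
        return default
    kind, payload = param
    if kind == "value":
        resolved = payload            # same literal for every row
    elif kind == "col":
        resolved = row.get(payload)   # read this row's column value
    else:
        raise ValueError(f"unknown parameter kind: {kind!r}")
    return default if resolved is None else resolved
```

For example, `setLanguage("fr")` fixes French for every row, while `setLanguageCol("lang")` reads each row's `lang` column.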

setLinkedService(value)[source]
setLocation(value)[source]
setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(concurrency=1, concurrentTimeout=None, details=None, detailsCol=None, errorCol='AnalyzeImage_0f35ae1afc62_error', handler=None, imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, language=None, languageCol=None, outputCol='AnalyzeImage_0f35ae1afc62_output', subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None, visualFeatures=None, visualFeaturesCol=None)[source]

Set the keyword-only parameters.

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – the API key to use

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

setVisualFeatures(value)[source]
Parameters

visualFeatures – what visual feature types to return

setVisualFeaturesCol(value)[source]
Parameters

visualFeatures – what visual feature types to return
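`visualFeatures` is usually a list of feature names. The sketch below validates such a list and encodes it, assuming the service receives the names as one comma-separated `visualFeatures` query value, as the Computer Vision REST API does. The `KNOWN_FEATURES` set is an assumption taken from the Computer Vision v3.x "Analyze Image" operation and may not match the API version behind your `url`. `visual_features_query` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumed feature names (Computer Vision v3.x "Analyze Image"); verify for your API version.
KNOWN_FEATURES = {"Categories", "Tags", "Description", "Faces", "ImageType",
                  "Color", "Adult", "Objects", "Brands"}


def visual_features_query(features, language=None):
    """Build an (assumed) query string for the requested visual features."""
    unknown = [f for f in features if f not in KNOWN_FEATURES]
    if unknown:
        raise ValueError(f"unknown visual features: {unknown}")
    params = {"visualFeatures": ",".join(features)}
    if language is not None:
        params["language"] = language
    return urlencode(params)  # the comma is percent-encoded as %2C
```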

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')
visualFeatures = Param(parent='undefined', name='visualFeatures', doc='ServiceParam: what visual feature types to return')

synapse.ml.cognitive.AnalyzeInvoices module

class synapse.ml.cognitive.AnalyzeInvoices.AnalyzeInvoices(java_obj=None, backoffs=[100, 500, 1000], concurrency=1, concurrentTimeout=None, errorCol='AnalyzeInvoices_43612dccb408_error', imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, includeTextDetails=None, includeTextDetailsCol=None, initialPollingDelay=300, locale=None, localeCol=None, maxPollingRetries=1000, modelVersion=None, modelVersionCol=None, outputCol='AnalyzeInvoices_43612dccb408_output', pages=None, pagesCol=None, pollingDelay=300, subscriptionKey=None, subscriptionKeyCol=None, suppressMaxRetriesExceededException=False, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • backoffs (list) – array of backoffs to use in the handler

  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (str) – column to hold http errors

  • imageBytes (object) – bytestream of the image to use

  • imageUrl (object) – the url of the image to use

  • includeTextDetails (object) – Include text lines and element references in the result.

  • initialPollingDelay (int) – number of milliseconds to wait before first poll for result

  • locale (object) – Locale of the receipt. Supported locales: en-AU, en-CA, en-GB, en-IN, en-US.

  • maxPollingRetries (int) – number of times to poll

  • modelVersion (object) – This value indicates which model will be used for scoring. If a model-version is not specified, the API should default to the latest, non-preview version.

  • outputCol (str) – The name of the output column

  • pages (object) – Page selection; applies only to multi-page PDF and TIFF documents. Accepted inputs include single pages (e.g. ‘1, 2’ -> pages 1 and 2 are processed), finite ranges (e.g. ‘2-5’ -> pages 2 to 5 are processed), and open-ended ranges (e.g. ‘5-’ -> all pages from page 5 onward are processed; ‘-10’ -> pages 1 to 10 are processed). These forms can be mixed, and ranges may overlap (e.g. ‘-5, 1, 3, 5-10’ -> pages 1 to 10 are processed). The service accepts the request if it can process at least one page of the document (e.g. ‘5-100’ on a 5-page document is valid input, and only page 5 is processed). If no page range is provided, the entire document is processed.

  • pollingDelay (int) – number of milliseconds to wait between polling

  • subscriptionKey (object) – the API key to use

  • suppressMaxRetriesExceededException (bool) – set to true to suppress the maximum-retries-exceeded exception and report it in the error column instead

  • timeout (float) – number of seconds to wait before closing the connection

  • url (str) – Url of the service
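Invoice analysis is asynchronous: the transformer submits the document and then polls for the result. The sketch below shows one plausible way that `initialPollingDelay`, `pollingDelay`, `maxPollingRetries`, and `suppressMaxRetriesExceededException` fit together. Several details are assumptions: `get_status`, the status strings (`"running"`, `"succeeded"`, `"failed"`), the `(result, error)` return pair, and `MaxRetriesExceeded` are hypothetical. The real implementation may differ.

```python
import time
from typing import Callable, Optional, Tuple


class MaxRetriesExceeded(Exception):
    """Hypothetical error raised when the result is still not ready."""


def poll_for_result(get_status: Callable[[], Tuple[str, Optional[dict]]],
                    initial_polling_delay_ms: int = 300,
                    polling_delay_ms: int = 300,
                    max_polling_retries: int = 1000,
                    suppress_max_retries_exceeded: bool = False,
                    sleep: Callable[[float], None] = time.sleep):
    """Poll until the job finishes; return (result, error_message)."""
    sleep(initial_polling_delay_ms / 1000)  # wait before the first poll
    for attempt in range(max_polling_retries):
        status, body = get_status()
        if status == "succeeded":
            return body, None
        if status == "failed":
            return None, f"analysis failed: {body}"
        if attempt < max_polling_retries - 1:
            sleep(polling_delay_ms / 1000)  # wait between polls
    message = f"result not ready after {max_polling_retries} polls"
    if suppress_max_retries_exceeded:
        return None, message  # reported in the error column instead of raised
    raise MaxRetriesExceeded(message)
```

Injecting `sleep` keeps the sketch testable without real delays. With the defaults, a job that is still running after 1000 polls has taken roughly 300 ms + 999 × 300 ms ≈ 5 minutes of waiting.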

backoffs = Param(parent='undefined', name='backoffs', doc='array of backoffs to use in the handler')
concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getBackoffs()[source]
Returns

array of backoffs to use in the handler

Return type

backoffs

getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getImageBytes()[source]
Returns

bytestream of the image to use

Return type

imageBytes

getImageUrl()[source]
Returns

the url of the image to use

Return type

imageUrl

getIncludeTextDetails()[source]
Returns

Include text lines and element references in the result.

Return type

includeTextDetails

getInitialPollingDelay()[source]
Returns

number of milliseconds to wait before first poll for result

Return type

initialPollingDelay

static getJavaPackage()[source]

Returns package name String.

getLocale()[source]
Returns

Locale of the receipt. Supported locales: en-AU, en-CA, en-GB, en-IN, en-US.

Return type

locale

getMaxPollingRetries()[source]
Returns

number of times to poll

Return type

maxPollingRetries

getModelVersion()[source]
Returns

This value indicates which model will be used for scoring. If a model-version is not specified, the API should default to the latest, non-preview version.

Return type

modelVersion

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getPages()[source]
Returns

Page selection; applies only to multi-page PDF and TIFF documents. Accepted inputs include single pages (e.g. ‘1, 2’ -> pages 1 and 2 are processed), finite ranges (e.g. ‘2-5’ -> pages 2 to 5 are processed), and open-ended ranges (e.g. ‘5-’ -> all pages from page 5 onward are processed; ‘-10’ -> pages 1 to 10 are processed). These forms can be mixed, and ranges may overlap (e.g. ‘-5, 1, 3, 5-10’ -> pages 1 to 10 are processed). The service accepts the request if it can process at least one page of the document (e.g. ‘5-100’ on a 5-page document is valid input, and only page 5 is processed). If no page range is provided, the entire document is processed.

Return type

pages
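To make the `pages` syntax concrete, the parser below expands a selection string into the page numbers it covers for a document of known length. It follows the rules described above: comma-separated single pages, closed ranges such as ‘2-5’, open-ended ranges such as ‘5-’ and ‘-10’, overlaps allowed, pages past the end ignored, and no selection meaning every page. `expand_pages` is a hypothetical helper; the service performs its own parsing.

```python
from typing import List, Optional


def expand_pages(selection: Optional[str], page_count: int) -> List[int]:
    """Return the sorted 1-based page numbers that `selection` selects."""
    all_pages = range(1, page_count + 1)
    if selection is None or not selection.strip():
        return list(all_pages)  # no selection: the whole document
    chosen = set()
    for part in selection.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start = int(start_text) if start_text.strip() else 1          # '-10'
            end = int(end_text) if end_text.strip() else page_count       # '5-'
        else:
            start = end = int(part)                                       # '3'
        if start > end:
            raise ValueError(f"invalid range: {part!r}")
        chosen.update(p for p in all_pages if start <= p <= end)  # overlaps merge
    if not chosen:
        raise ValueError("selection covers no page of the document")
    return sorted(chosen)
```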

getPollingDelay()[source]
Returns

number of milliseconds to wait between polling

Return type

pollingDelay

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getSuppressMaxRetriesExceededException()[source]
Returns

set to true to suppress the maximum-retries-exceeded exception and report it in the error column instead

Return type

suppressMaxRetriesExceededException

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

imageBytes = Param(parent='undefined', name='imageBytes', doc='ServiceParam: bytestream of the image to use')
imageUrl = Param(parent='undefined', name='imageUrl', doc='ServiceParam: the url of the image to use')
includeTextDetails = Param(parent='undefined', name='includeTextDetails', doc='ServiceParam: Include text lines and element references in the result.')
initialPollingDelay = Param(parent='undefined', name='initialPollingDelay', doc='number of milliseconds to wait before first poll for result')
locale = Param(parent='undefined', name='locale', doc='ServiceParam: Locale of the receipt. Supported locales: en-AU, en-CA, en-GB, en-IN, en-US.')
maxPollingRetries = Param(parent='undefined', name='maxPollingRetries', doc='number of times to poll')
modelVersion = Param(parent='undefined', name='modelVersion', doc='ServiceParam: This value indicates which model will be used for scoring. If a model-version is not specified, the API should default to the latest, non-preview version.')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
pages = Param(parent='undefined', name='pages', doc="ServiceParam: The page selection only leveraged for multi-page PDF and TIFF documents. Accepted input include single pages (e.g.'1, 2' -> pages 1 and 2 will be processed), finite (e.g. '2-5' -> pages 2 to 5 will be processed) and open-ended ranges (e.g. '5-' -> all the pages from page 5 will be processed; e.g. '-10' -> pages 1 to 10 will be processed). All of these can be mixed together and ranges are allowed to overlap (eg. '-5, 1, 3, 5-10' - pages 1 to 10 will be processed). The service will accept the request if it can process at least one page of the document (e.g. using '5-100' on a 5 page document is a valid input where page 5 will be processed). If no page range is provided, the entire document will be processed.")
pollingDelay = Param(parent='undefined', name='pollingDelay', doc='number of milliseconds to wait between polling')
classmethod read()[source]

Returns an MLReader instance for this class.

setBackoffs(value)[source]
Parameters

backoffs – array of backoffs to use in the handler
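One plausible reading of `backoffs` (an assumption, not confirmed by this reference) is that the values are waits in milliseconds and the handler retries a failed request once per entry, sleeping for that entry's duration first. The sketch below applies that policy with the default `[100, 500, 1000]`. `send` and the `RETRYABLE` status-code set are hypothetical.

```python
import time
from typing import Callable, Sequence

RETRYABLE = {429, 500, 502, 503, 504}  # assumed set of transient HTTP status codes


def send_with_backoffs(send: Callable[[], int],
                       backoffs_ms: Sequence[int] = (100, 500, 1000),
                       sleep: Callable[[float], None] = time.sleep) -> int:
    """Call `send` (returns an HTTP status code), retrying once per backoff."""
    status = send()
    for delay_ms in backoffs_ms:
        if status not in RETRYABLE:
            break  # success or a non-retryable error: stop retrying
        sleep(delay_ms / 1000)
        status = send()
    return status  # last status seen, retryable or not
```

Under this reading, a list of length n allows at most n + 1 attempts in total.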

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setImageBytes(value)[source]
Parameters

imageBytes – bytestream of the image to use

setImageBytesCol(value)[source]
Parameters

imageBytes – bytestream of the image to use

setImageUrl(value)[source]
Parameters

imageUrl – the url of the image to use

setImageUrlCol(value)[source]
Parameters

imageUrl – the url of the image to use

setIncludeTextDetails(value)[source]
Parameters

includeTextDetails – Include text lines and element references in the result.

setIncludeTextDetailsCol(value)[source]
Parameters

includeTextDetails – Include text lines and element references in the result.

setInitialPollingDelay(value)[source]
Parameters

initialPollingDelay – number of milliseconds to wait before first poll for result

setLinkedService(value)[source]
setLocale(value)[source]
Parameters

locale – Locale of the receipt. Supported locales: en-AU, en-CA, en-GB, en-IN, en-US.

setLocaleCol(value)[source]
Parameters

locale – Locale of the receipt. Supported locales: en-AU, en-CA, en-GB, en-IN, en-US.

setLocation(value)[source]
setMaxPollingRetries(value)[source]
Parameters

maxPollingRetries – number of times to poll

setModelVersion(value)[source]
Parameters

modelVersion – This value indicates which model will be used for scoring. If a model-version is not specified, the API should default to the latest, non-preview version.

setModelVersionCol(value)[source]
Parameters

modelVersion – This value indicates which model will be used for scoring. If a model-version is not specified, the API should default to the latest, non-preview version.

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setPages(value)[source]
Parameters

pages – Page selection; applies only to multi-page PDF and TIFF documents. Accepted inputs include single pages (e.g. ‘1, 2’ -> pages 1 and 2 are processed), finite ranges (e.g. ‘2-5’ -> pages 2 to 5 are processed), and open-ended ranges (e.g. ‘5-’ -> all pages from page 5 onward are processed; ‘-10’ -> pages 1 to 10 are processed). These forms can be mixed, and ranges may overlap (e.g. ‘-5, 1, 3, 5-10’ -> pages 1 to 10 are processed). The service accepts the request if it can process at least one page of the document (e.g. ‘5-100’ on a 5-page document is valid input, and only page 5 is processed). If no page range is provided, the entire document is processed.

setPagesCol(value)[source]
Parameters

pages – Page selection; applies only to multi-page PDF and TIFF documents. Accepted inputs include single pages (e.g. ‘1, 2’ -> pages 1 and 2 are processed), finite ranges (e.g. ‘2-5’ -> pages 2 to 5 are processed), and open-ended ranges (e.g. ‘5-’ -> all pages from page 5 onward are processed; ‘-10’ -> pages 1 to 10 are processed). These forms can be mixed, and ranges may overlap (e.g. ‘-5, 1, 3, 5-10’ -> pages 1 to 10 are processed). The service accepts the request if it can process at least one page of the document (e.g. ‘5-100’ on a 5-page document is valid input, and only page 5 is processed). If no page range is provided, the entire document is processed.

setParams(backoffs=[100, 500, 1000], concurrency=1, concurrentTimeout=None, errorCol='AnalyzeInvoices_43612dccb408_error', imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, includeTextDetails=None, includeTextDetailsCol=None, initialPollingDelay=300, locale=None, localeCol=None, maxPollingRetries=1000, modelVersion=None, modelVersionCol=None, outputCol='AnalyzeInvoices_43612dccb408_output', pages=None, pagesCol=None, pollingDelay=300, subscriptionKey=None, subscriptionKeyCol=None, suppressMaxRetriesExceededException=False, timeout=60.0, url=None)[source]

Set the keyword-only parameters.

setPollingDelay(value)[source]
Parameters

pollingDelay – number of milliseconds to wait between polling

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – the API key to use

setSuppressMaxRetriesExceededException(value)[source]
Parameters

suppressMaxRetriesExceededException – set to true to suppress the maximum-retries-exceeded exception and report it in the error column instead

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
suppressMaxRetriesExceededException = Param(parent='undefined', name='suppressMaxRetriesExceededException', doc='set true to suppress the maximum retries exception and report in the error column')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.AnalyzeLayout module

class synapse.ml.cognitive.AnalyzeLayout.AnalyzeLayout(java_obj=None, backoffs=[100, 500, 1000], concurrency=1, concurrentTimeout=None, errorCol='AnalyzeLayout_42cda16b2446_error', imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, initialPollingDelay=300, language=None, languageCol=None, maxPollingRetries=1000, modelVersion=None, modelVersionCol=None, outputCol='AnalyzeLayout_42cda16b2446_output', pages=None, pagesCol=None, pollingDelay=300, readingOrder=None, readingOrderCol=None, subscriptionKey=None, subscriptionKeyCol=None, suppressMaxRetriesExceededException=False, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • backoffs (list) – array of backoffs to use in the handler

  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (str) – column to hold http errors

  • imageBytes (object) – bytestream of the image to use

  • imageUrl (object) – the url of the image to use

  • initialPollingDelay (int) – number of milliseconds to wait before first poll for result

  • language (object) – The BCP-47 language code of the text in the document. Layout supports automatic language identification and multi-language documents, so only provide a language code if you want to force the document to be processed as that specific language.

  • maxPollingRetries (int) – number of times to poll

  • modelVersion (object) – This value indicates which model will be used for scoring. If a model-version is not specified, the API should default to the latest, non-preview version.

  • outputCol (str) – The name of the output column

  • pages (object) – Page selection; applies only to multi-page PDF and TIFF documents. Accepted inputs include single pages (e.g. ‘1, 2’ -> pages 1 and 2 are processed), finite ranges (e.g. ‘2-5’ -> pages 2 to 5 are processed), and open-ended ranges (e.g. ‘5-’ -> all pages from page 5 onward are processed; ‘-10’ -> pages 1 to 10 are processed). These forms can be mixed, and ranges may overlap (e.g. ‘-5, 1, 3, 5-10’ -> pages 1 to 10 are processed). The service accepts the request if it can process at least one page of the document (e.g. ‘5-100’ on a 5-page document is valid input, and only page 5 is processed). If no page range is provided, the entire document is processed.

  • pollingDelay (int) – number of milliseconds to wait between polling

  • readingOrder (object) – Optional parameter specifying which reading-order algorithm to apply when ordering the extracted text elements. Can be either ‘basic’ or ‘natural’. Defaults to ‘basic’ if not specified.

  • subscriptionKey (object) – the API key to use

  • suppressMaxRetriesExceededException (bool) – set to true to suppress the maximum-retries-exceeded exception and report it in the error column instead

  • timeout (float) – number of seconds to wait before closing the connection

  • url (str) – Url of the service
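The sketch below assembles a query string for the layout request and checks `readingOrder` against its two documented values. It assumes that `language`, `pages`, and `readingOrder` are sent as URL query parameters on the Form Recognizer v2.1 layout `analyze` request; verify this against the API version your `url` points to. Parameters left as `None` are omitted, so the service defaults apply (basic reading order, automatic language identification, all pages). `layout_query` is a hypothetical helper.

```python
from typing import Optional
from urllib.parse import urlencode

READING_ORDERS = {"basic", "natural"}  # the two documented values


def layout_query(language: Optional[str] = None,
                 pages: Optional[str] = None,
                 reading_order: Optional[str] = None) -> str:
    """Build an (assumed) query string; unset parameters are left out."""
    if reading_order is not None and reading_order not in READING_ORDERS:
        raise ValueError(f"readingOrder must be 'basic' or 'natural', got {reading_order!r}")
    params = {"language": language, "pages": pages, "readingOrder": reading_order}
    return urlencode({k: v for k, v in params.items() if v is not None})
```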

backoffs = Param(parent='undefined', name='backoffs', doc='array of backoffs to use in the handler')
concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getBackoffs()[source]
Returns

array of backoffs to use in the handler

Return type

backoffs

getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getImageBytes()[source]
Returns

bytestream of the image to use

Return type

imageBytes

getImageUrl()[source]
Returns

the url of the image to use

Return type

imageUrl

getInitialPollingDelay()[source]
Returns

number of milliseconds to wait before first poll for result

Return type

initialPollingDelay

static getJavaPackage()[source]

Returns package name String.

getLanguage()[source]
Returns

The BCP-47 language code of the text in the document. Layout supports automatic language identification and multi-language documents, so only provide a language code if you want to force the document to be processed as that specific language.

Return type

language

getMaxPollingRetries()[source]
Returns

number of times to poll

Return type

maxPollingRetries

getModelVersion()[source]
Returns

This value indicates which model will be used for scoring. If a model-version is not specified, the API should default to the latest, non-preview version.

Return type

modelVersion

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getPages()[source]
Returns

Page selection; applies only to multi-page PDF and TIFF documents. Accepted inputs include single pages (e.g. ‘1, 2’ -> pages 1 and 2 are processed), finite ranges (e.g. ‘2-5’ -> pages 2 to 5 are processed), and open-ended ranges (e.g. ‘5-’ -> all pages from page 5 onward are processed; ‘-10’ -> pages 1 to 10 are processed). These forms can be mixed, and ranges may overlap (e.g. ‘-5, 1, 3, 5-10’ -> pages 1 to 10 are processed). The service accepts the request if it can process at least one page of the document (e.g. ‘5-100’ on a 5-page document is valid input, and only page 5 is processed). If no page range is provided, the entire document is processed.

Return type

pages

getPollingDelay()[source]
Returns

number of milliseconds to wait between polling

Return type

pollingDelay

getReadingOrder()[source]
Returns

Optional parameter specifying which reading-order algorithm to apply when ordering the extracted text elements. Can be either ‘basic’ or ‘natural’. Defaults to ‘basic’ if not specified.

Return type

readingOrder

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getSuppressMaxRetriesExceededException()[source]
Returns

set to true to suppress the maximum-retries-exceeded exception and report it in the error column instead

Return type

suppressMaxRetriesExceededException

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

imageBytes = Param(parent='undefined', name='imageBytes', doc='ServiceParam: bytestream of the image to use')
imageUrl = Param(parent='undefined', name='imageUrl', doc='ServiceParam: the url of the image to use')
initialPollingDelay = Param(parent='undefined', name='initialPollingDelay', doc='number of milliseconds to wait before first poll for result')
language = Param(parent='undefined', name='language', doc='ServiceParam: The BCP-47 language code of the text in the document. Layout supports auto language identification and multilanguage documents, so only provide a language code if you would like to force the document to be processed as that specific language.')
maxPollingRetries = Param(parent='undefined', name='maxPollingRetries', doc='number of times to poll')
modelVersion = Param(parent='undefined', name='modelVersion', doc='ServiceParam: This value indicates which model will be used for scoring. If a model-version is not specified, the API should default to the latest, non-preview version.')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
pages = Param(parent='undefined', name='pages', doc="ServiceParam: The page selection only leveraged for multi-page PDF and TIFF documents. Accepted input include single pages (e.g.'1, 2' -> pages 1 and 2 will be processed), finite (e.g. '2-5' -> pages 2 to 5 will be processed) and open-ended ranges (e.g. '5-' -> all the pages from page 5 will be processed; e.g. '-10' -> pages 1 to 10 will be processed). All of these can be mixed together and ranges are allowed to overlap (eg. '-5, 1, 3, 5-10' - pages 1 to 10 will be processed). The service will accept the request if it can process at least one page of the document (e.g. using '5-100' on a 5 page document is a valid input where page 5 will be processed). If no page range is provided, the entire document will be processed.")
pollingDelay = Param(parent='undefined', name='pollingDelay', doc='number of milliseconds to wait between polling')
classmethod read()[source]

Returns an MLReader instance for this class.

readingOrder = Param(parent='undefined', name='readingOrder', doc="ServiceParam: Optional parameter to specify which reading order algorithm should be applied when ordering the extracted text elements. Can be either 'basic' or 'natural'. Will default to basic if not specified")
setBackoffs(value)[source]
Parameters

backoffs – array of backoffs to use in the handler

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setImageBytes(value)[source]
Parameters

imageBytes – bytestream of the image to use

setImageBytesCol(value)[source]
Parameters

imageBytes – bytestream of the image to use

setImageUrl(value)[source]
Parameters

imageUrl – the url of the image to use

setImageUrlCol(value)[source]
Parameters

imageUrl – the url of the image to use

setInitialPollingDelay(value)[source]
Parameters

initialPollingDelay – number of milliseconds to wait before first poll for result

setLanguage(value)[source]
Parameters

language – The BCP-47 language code of the text in the document. Layout supports automatic language identification and multi-language documents, so only provide a language code if you want to force the document to be processed as that specific language.

setLanguageCol(value)[source]
Parameters

language – The BCP-47 language code of the text in the document. Layout supports automatic language identification and multi-language documents, so only provide a language code if you want to force the document to be processed as that specific language.

setLinkedService(value)[source]
setLocation(value)[source]
setMaxPollingRetries(value)[source]
Parameters

maxPollingRetries – number of times to poll

setModelVersion(value)[source]
Parameters

modelVersion – This value indicates which model will be used for scoring. If a model-version is not specified, the API should default to the latest, non-preview version.

setModelVersionCol(value)[source]
Parameters

modelVersion – This value indicates which model will be used for scoring. If a model-version is not specified, the API should default to the latest, non-preview version.

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setPages(value)[source]
Parameters

pages – Page selection; applies only to multi-page PDF and TIFF documents. Accepted inputs include single pages (e.g. ‘1, 2’ -> pages 1 and 2 are processed), finite ranges (e.g. ‘2-5’ -> pages 2 to 5 are processed), and open-ended ranges (e.g. ‘5-’ -> all pages from page 5 onward are processed; ‘-10’ -> pages 1 to 10 are processed). These forms can be mixed, and ranges may overlap (e.g. ‘-5, 1, 3, 5-10’ -> pages 1 to 10 are processed). The service accepts the request if it can process at least one page of the document (e.g. ‘5-100’ on a 5-page document is valid input, and only page 5 is processed). If no page range is provided, the entire document is processed.

setPagesCol(value)[source]
Parameters

pages – Page selection; applies only to multi-page PDF and TIFF documents. Accepted inputs include single pages (e.g. ‘1, 2’ -> pages 1 and 2 are processed), finite ranges (e.g. ‘2-5’ -> pages 2 to 5 are processed), and open-ended ranges (e.g. ‘5-’ -> all pages from page 5 onward are processed; ‘-10’ -> pages 1 to 10 are processed). These forms can be mixed, and ranges may overlap (e.g. ‘-5, 1, 3, 5-10’ -> pages 1 to 10 are processed). The service accepts the request if it can process at least one page of the document (e.g. ‘5-100’ on a 5-page document is valid input, and only page 5 is processed). If no page range is provided, the entire document is processed.

setParams(backoffs=[100, 500, 1000], concurrency=1, concurrentTimeout=None, errorCol='AnalyzeLayout_42cda16b2446_error', imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, initialPollingDelay=300, language=None, languageCol=None, maxPollingRetries=1000, modelVersion=None, modelVersionCol=None, outputCol='AnalyzeLayout_42cda16b2446_output', pages=None, pagesCol=None, pollingDelay=300, readingOrder=None, readingOrderCol=None, subscriptionKey=None, subscriptionKeyCol=None, suppressMaxRetriesExceededException=False, timeout=60.0, url=None)[source]

Set the keyword-only parameters.

setPollingDelay(value)[source]
Parameters

pollingDelay – number of milliseconds to wait between polling

setReadingOrder(value)[source]
Parameters

readingOrder – Optional parameter specifying which reading-order algorithm to apply when ordering the extracted text elements. Can be either ‘basic’ or ‘natural’. Defaults to ‘basic’ if not specified.

setReadingOrderCol(value)[source]
Parameters

readingOrder – Optional parameter specifying which reading-order algorithm to apply when ordering the extracted text elements. Can be either ‘basic’ or ‘natural’. Defaults to ‘basic’ if not specified.

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – the API key to use

setSuppressMaxRetriesExceededException(value)[source]
Parameters

suppressMaxRetriesExceededException – set to true to suppress the maximum-retries-exceeded exception and report it in the error column instead

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
suppressMaxRetriesExceededException = Param(parent='undefined', name='suppressMaxRetriesExceededException', doc='set true to suppress the maximum retries exception and report in the error column')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.AnalyzeReceipts module

class synapse.ml.cognitive.AnalyzeReceipts.AnalyzeReceipts(java_obj=None, backoffs=[100, 500, 1000], concurrency=1, concurrentTimeout=None, errorCol='AnalyzeReceipts_cb89a48db0fa_error', imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, includeTextDetails=None, includeTextDetailsCol=None, initialPollingDelay=300, locale=None, localeCol=None, maxPollingRetries=1000, modelVersion=None, modelVersionCol=None, outputCol='AnalyzeReceipts_cb89a48db0fa_output', pages=None, pagesCol=None, pollingDelay=300, subscriptionKey=None, subscriptionKeyCol=None, suppressMaxRetriesExceededException=False, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • backoffs (list) – array of backoffs to use in the handler

  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (str) – column to hold http errors

  • imageBytes (object) – bytestream of the image to use

  • imageUrl (object) – the url of the image to use

  • includeTextDetails (object) – Include text lines and element references in the result.

  • initialPollingDelay (int) – number of milliseconds to wait before first poll for result

  • locale (object) – Locale of the receipt. Supported locales: en-AU, en-CA, en-GB, en-IN, en-US.

  • maxPollingRetries (int) – number of times to poll

  • modelVersion (object) – This value indicates which model will be used for scoring. If a model-version is not specified, the API should default to the latest, non-preview version.

  • outputCol (str) – The name of the output column

  • pages (object) – The page selection, used only for multi-page PDF and TIFF documents. Accepted inputs include single pages (e.g. ‘1, 2’ -> pages 1 and 2 are processed), finite ranges (e.g. ‘2-5’ -> pages 2 to 5 are processed), and open-ended ranges (e.g. ‘5-’ -> all pages from page 5 onward are processed; ‘-10’ -> pages 1 to 10 are processed). These forms can be mixed, and ranges may overlap (e.g. ‘-5, 1, 3, 5-10’ -> pages 1 to 10 are processed). The service accepts the request if it can process at least one page of the document (e.g. ‘5-100’ on a 5-page document is valid, and page 5 is processed). If no page range is provided, the entire document is processed.

  • pollingDelay (int) – number of milliseconds to wait between polls

  • subscriptionKey (object) – the API key to use

  • suppressMaxRetriesExceededException (bool) – set to true to suppress the maximum-retries-exceeded exception and report the failure in the error column instead

  • timeout (float) – number of seconds to wait before closing the connection

  • url (str) – Url of the service
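The three polling parameters together bound how long a single receipt can wait for its result. The sketch below estimates that worst case, assuming one wait of initialPollingDelay before the first poll and then pollingDelay between consecutive polls, for at most maxPollingRetries polls. worst_case_polling_ms is a hypothetical helper, not part of the library, and the exact polling schedule SynapseML uses is an assumption.

```python
def worst_case_polling_ms(initial_polling_delay=300, polling_delay=300, max_polling_retries=1000):
    """Estimate the longest time spent waiting between polls for one request.

    Hypothetical helper. Assumes one wait of initial_polling_delay before the
    first poll, then polling_delay between consecutive polls, for at most
    max_polling_retries polls. Network round-trip time is not included.
    """
    if min(initial_polling_delay, polling_delay, max_polling_retries) < 0:
        raise ValueError("delays and retry count must be non-negative")
    if max_polling_retries == 0:
        return 0  # no polls, so no waiting
    # N polls are separated by N - 1 gaps of polling_delay.
    return initial_polling_delay + (max_polling_retries - 1) * polling_delay


# With the defaults from the signature above: 300000 ms, i.e. about 5 minutes.
print(worst_case_polling_ms() / 1000, "seconds")
```

Under these assumptions, lowering maxPollingRetries is the most direct way to cap how long a stuck request can hold up a partition.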

backoffs = Param(parent='undefined', name='backoffs', doc='array of backoffs to use in the handler')
concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getBackoffs()[source]
Returns

array of backoffs to use in the handler

Return type

backoffs
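The backoffs parameter (default [100, 500, 1000]) lists the delays used by the request handler. A plausible reading, assumed here rather than confirmed by this page, is that the values are milliseconds and each one precedes one retry of a failed call. The sketch below implements that retry pattern; call_with_backoffs is a hypothetical name and is not the SynapseML handler itself.

```python
import time


def call_with_backoffs(request, backoffs_ms=(100, 500, 1000), sleep=time.sleep):
    """Call request(); on an exception, wait the next backoff and retry.

    Hypothetical sketch: the first attempt runs immediately, and each entry in
    backoffs_ms (milliseconds) is waited before one further attempt. The last
    error is re-raised when every attempt fails.
    """
    last_error = None
    for delay_ms in (None, *backoffs_ms):
        if delay_ms is not None:
            sleep(delay_ms / 1000.0)  # time.sleep takes seconds
        try:
            return request()
        except Exception as err:  # a real handler would inspect HTTP status codes
            last_error = err
    raise last_error
```

Passing sleep as a parameter keeps the function testable without real delays.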

getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getImageBytes()[source]
Returns

bytestream of the image to use

Return type

imageBytes

getImageUrl()[source]
Returns

the url of the image to use

Return type

imageUrl

getIncludeTextDetails()[source]
Returns

Include text lines and element references in the result.

Return type

includeTextDetails

getInitialPollingDelay()[source]
Returns

number of milliseconds to wait before first poll for result

Return type

initialPollingDelay

static getJavaPackage()[source]

Returns package name String.

getLocale()[source]
Returns

Locale of the receipt. Supported locales: en-AU, en-CA, en-GB, en-IN, en-US.

Return type

locale

getMaxPollingRetries()[source]
Returns

number of times to poll

Return type

maxPollingRetries

getModelVersion()[source]
Returns

This value indicates which model will be used for scoring. If a model-version is not specified, the API should default to the latest, non-preview version.

Return type

modelVersion

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getPages()[source]
Returns

The page selection, used only for multi-page PDF and TIFF documents. Accepted inputs include single pages (e.g. ‘1, 2’ -> pages 1 and 2 are processed), finite ranges (e.g. ‘2-5’ -> pages 2 to 5 are processed), and open-ended ranges (e.g. ‘5-’ -> all pages from page 5 onward are processed; ‘-10’ -> pages 1 to 10 are processed). These forms can be mixed, and ranges may overlap (e.g. ‘-5, 1, 3, 5-10’ -> pages 1 to 10 are processed). The service accepts the request if it can process at least one page of the document (e.g. ‘5-100’ on a 5-page document is valid, and page 5 is processed). If no page range is provided, the entire document is processed.

Return type

pages
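The page-selection rules above can be checked locally before a request is sent. The sketch below re-implements them in plain Python: comma-separated single pages, closed ranges, open-ended ranges on either side, overlaps allowed, pages past the end ignored, and the whole document when no selection is given. select_pages is a hypothetical helper; the service performs the real selection.

```python
def select_pages(spec, page_count):
    """Return the sorted 1-based page numbers that a `pages` string selects.

    Hypothetical client-side re-implementation of the documented rules.
    Raises ValueError when no page of the document is selected.
    """
    all_pages = set(range(1, page_count + 1))
    if spec is None or not spec.strip():
        return sorted(all_pages)  # no range given: the whole document
    selected = set()
    for token in spec.split(","):
        token = token.strip()
        if "-" in token:
            start, _, end = token.partition("-")
            lo = int(start) if start.strip() else 1           # '-10' starts at page 1
            hi = int(end) if end.strip() else page_count      # '5-' runs to the last page
        else:
            lo = hi = int(token)
        selected.update(range(lo, hi + 1))  # overlapping ranges simply merge
    result = sorted(selected & all_pages)   # pages past the end are ignored
    if not result:
        raise ValueError("selection contains no page of the document")
    return result


print(select_pages("-5, 1, 3, 5-10", 20))  # pages 1 to 10
```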

getPollingDelay()[source]
Returns

number of milliseconds to wait between polls

Return type

pollingDelay

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getSuppressMaxRetriesExceededException()[source]
Returns

set to true to suppress the maximum-retries-exceeded exception and report the failure in the error column instead

Return type

suppressMaxRetriesExceededException

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

imageBytes = Param(parent='undefined', name='imageBytes', doc='ServiceParam: bytestream of the image to use')
imageUrl = Param(parent='undefined', name='imageUrl', doc='ServiceParam: the url of the image to use')
includeTextDetails = Param(parent='undefined', name='includeTextDetails', doc='ServiceParam: Include text lines and element references in the result.')
initialPollingDelay = Param(parent='undefined', name='initialPollingDelay', doc='number of milliseconds to wait before first poll for result')
locale = Param(parent='undefined', name='locale', doc='ServiceParam: Locale of the receipt. Supported locales: en-AU, en-CA, en-GB, en-IN, en-US.')
maxPollingRetries = Param(parent='undefined', name='maxPollingRetries', doc='number of times to poll')
modelVersion = Param(parent='undefined', name='modelVersion', doc='ServiceParam: This value indicates which model will be used for scoring. If a model-version is not specified, the API should default to the latest, non-preview version.')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
pages = Param(parent='undefined', name='pages', doc="ServiceParam: The page selection only leveraged for multi-page PDF and TIFF documents. Accepted input include single pages (e.g.'1, 2' -> pages 1 and 2 will be processed), finite (e.g. '2-5' -> pages 2 to 5 will be processed) and open-ended ranges (e.g. '5-' -> all the pages from page 5 will be processed; e.g. '-10' -> pages 1 to 10 will be processed). All of these can be mixed together and ranges are allowed to overlap (eg. '-5, 1, 3, 5-10' - pages 1 to 10 will be processed). The service will accept the request if it can process at least one page of the document (e.g. using '5-100' on a 5 page document is a valid input where page 5 will be processed). If no page range is provided, the entire document will be processed.")
pollingDelay = Param(parent='undefined', name='pollingDelay', doc='number of milliseconds to wait between polling')
classmethod read()[source]

Returns an MLReader instance for this class.

setBackoffs(value)[source]
Parameters

backoffs – array of backoffs to use in the handler

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setImageBytes(value)[source]
Parameters

imageBytes – bytestream of the image to use

setImageBytesCol(value)[source]
Parameters

imageBytes – name of the column that holds, for each row, the bytestream of the image to use

setImageUrl(value)[source]
Parameters

imageUrl – the url of the image to use

setImageUrlCol(value)[source]
Parameters

imageUrl – name of the column that holds, for each row, the url of the image to use
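Service parameters such as imageBytes and imageUrl come with two setters: setImageUrl(value) fixes one value for every row, while setImageUrlCol(value) names a column that supplies the value row by row. The sketch below models that either/or behaviour in plain Python. ServiceParamSketch is a hypothetical class, and the rule that setting one form replaces the other is an assumption about how the two setters interact.

```python
class ServiceParamSketch:
    """Hypothetical model of a service parameter: a constant or a column name."""

    def __init__(self):
        self._value = None
        self._col = None

    def set_value(self, value):
        # Like setImageUrl(value): one constant for every row.
        self._value, self._col = value, None
        return self

    def set_col(self, col):
        # Like setImageUrlCol(value): read the value from this column per row.
        self._value, self._col = None, col
        return self

    def resolve(self, row):
        """Return the value this parameter takes for one row (a dict)."""
        if self._col is not None:
            return row[self._col]
        return self._value


rows = [{"url": "https://example.com/a.jpg"}, {"url": "https://example.com/b.jpg"}]
param = ServiceParamSketch().set_col("url")
print([param.resolve(r) for r in rows])
```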

setIncludeTextDetails(value)[source]
Parameters

includeTextDetails – Include text lines and element references in the result.

setIncludeTextDetailsCol(value)[source]
Parameters

includeTextDetails – name of the column that holds, for each row, whether to include text lines and element references in the result

setInitialPollingDelay(value)[source]
Parameters

initialPollingDelay – number of milliseconds to wait before first poll for result

setLinkedService(value)[source]
setLocale(value)[source]
Parameters

locale – Locale of the receipt. Supported locales: en-AU, en-CA, en-GB, en-IN, en-US.

setLocaleCol(value)[source]
Parameters

locale – name of the column that holds, for each row, the locale of the receipt. Supported locales: en-AU, en-CA, en-GB, en-IN, en-US.

setLocation(value)[source]
setMaxPollingRetries(value)[source]
Parameters

maxPollingRetries – number of times to poll

setModelVersion(value)[source]
Parameters

modelVersion – This value indicates which model will be used for scoring. If a model-version is not specified, the API should default to the latest, non-preview version.

setModelVersionCol(value)[source]
Parameters

modelVersion – name of the column that holds, for each row, the model version to use for scoring. If no model version is specified, the API should default to the latest, non-preview version.

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setPages(value)[source]
Parameters

pages – The page selection, used only for multi-page PDF and TIFF documents. Accepted inputs include single pages (e.g. ‘1, 2’ -> pages 1 and 2 are processed), finite ranges (e.g. ‘2-5’ -> pages 2 to 5 are processed), and open-ended ranges (e.g. ‘5-’ -> all pages from page 5 onward are processed; ‘-10’ -> pages 1 to 10 are processed). These forms can be mixed, and ranges may overlap (e.g. ‘-5, 1, 3, 5-10’ -> pages 1 to 10 are processed). The service accepts the request if it can process at least one page of the document (e.g. ‘5-100’ on a 5-page document is valid, and page 5 is processed). If no page range is provided, the entire document is processed.

setPagesCol(value)[source]
Parameters

pages – name of the column that holds, for each row, the page selection for multi-page PDF and TIFF documents, using the same syntax as setPages. If no page range is provided, the entire document is processed.

setParams(backoffs=[100, 500, 1000], concurrency=1, concurrentTimeout=None, errorCol='AnalyzeReceipts_cb89a48db0fa_error', imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, includeTextDetails=None, includeTextDetailsCol=None, initialPollingDelay=300, locale=None, localeCol=None, maxPollingRetries=1000, modelVersion=None, modelVersionCol=None, outputCol='AnalyzeReceipts_cb89a48db0fa_output', pages=None, pagesCol=None, pollingDelay=300, subscriptionKey=None, subscriptionKeyCol=None, suppressMaxRetriesExceededException=False, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setPollingDelay(value)[source]
Parameters

pollingDelay – number of milliseconds to wait between polls

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – name of the column that holds, for each row, the API key to use

setSuppressMaxRetriesExceededException(value)[source]
Parameters

suppressMaxRetriesExceededException – set to true to suppress the maximum-retries-exceeded exception and report the failure in the error column instead
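With suppressMaxRetriesExceededException set, running out of polls becomes a per-row failure instead of an exception that aborts the job. The sketch below shows that contract with an (output, error) pair standing in for outputCol and errorCol. poll_until_done and MaxRetriesExceeded are hypothetical names, the delays between polls are omitted for brevity, and the wording of the error message is invented for illustration.

```python
class MaxRetriesExceeded(Exception):
    """Hypothetical exception raised when polling gives up."""


def poll_until_done(poll, max_retries, suppress=False):
    """Call poll() until it returns something other than None.

    Returns (output, error), mirroring outputCol and errorCol:
    (result, None) on success, (None, message) when retries run out and
    suppress is True; otherwise MaxRetriesExceeded is raised.
    """
    for _ in range(max_retries):
        result = poll()
        if result is not None:
            return result, None
    message = f"no result after {max_retries} polls"
    if suppress:
        return None, message  # the row keeps going, with its error recorded
    raise MaxRetriesExceeded(message)
```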

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
suppressMaxRetriesExceededException = Param(parent='undefined', name='suppressMaxRetriesExceededException', doc='set true to suppress the maxumimum retries exception and report in the error column')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.AzureSearchWriter module

synapse.ml.cognitive.AzureSearchWriter.streamToAzureSearch(df, **options)[source]
synapse.ml.cognitive.AzureSearchWriter.writeToAzureSearch(df, **options)[source]

synapse.ml.cognitive.BingImageSearch module

class synapse.ml.cognitive.BingImageSearch.BingImageSearch(java_obj=None, aspect=None, aspectCol=None, color=None, colorCol=None, concurrency=1, concurrentTimeout=None, count=None, countCol=None, errorCol='BingImageSearch_a5631379bdbb_error', freshness=None, freshnessCol=None, handler=None, height=None, heightCol=None, imageContent=None, imageContentCol=None, imageType=None, imageTypeCol=None, license=None, licenseCol=None, maxFileSize=None, maxFileSizeCol=None, maxHeight=None, maxHeightCol=None, maxWidth=None, maxWidthCol=None, minFileSize=None, minFileSizeCol=None, minHeight=None, minHeightCol=None, minWidth=None, minWidthCol=None, mkt=None, mktCol=None, offset=None, offsetCol=None, outputCol='BingImageSearch_a5631379bdbb_output', q=None, qCol=None, size=None, sizeCol=None, subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url='https://api.bing.microsoft.com/v7.0/images/search', width=None, widthCol=None)[source]

Bases: synapse.ml.cognitive._BingImageSearch._BingImageSearch

static downloadFromUrls(pathCol, bytesCol, concurrency, timeout)[source]
static getUrlTransformer(imageCol, urlCol)[source]
setMarket(value)[source]
setMarketCol(value)[source]
setQuery(value)[source]
setQueryCol(value)[source]
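BingImageSearch sends its parameters to the default url shown in the signature. The sketch below builds the request it would make for a single query using the standard Bing Image Search v7 query parameters q, count, offset and mkt and the Ocp-Apim-Subscription-Key header. It sends nothing; bing_image_search_request is a hypothetical helper, and the mapping of setQuery and setMarket onto q and mkt is an assumption based on the parameter names.

```python
from urllib.parse import urlencode

# Default `url` from the BingImageSearch signature above.
BING_IMAGES_URL = "https://api.bing.microsoft.com/v7.0/images/search"


def bing_image_search_request(q, count=None, offset=None, mkt=None, subscription_key="<key>"):
    """Build (url, headers) for one image search call without sending it."""
    params = {"q": q}
    if count is not None:
        params["count"] = count    # results per page
    if offset is not None:
        params["offset"] = offset  # results to skip, for paging
    if mkt is not None:
        params["mkt"] = mkt        # market, e.g. en-US
    url = BING_IMAGES_URL + "?" + urlencode(params)
    headers = {"Ocp-Apim-Subscription-Key": subscription_key}
    return url, headers


url, headers = bing_image_search_request("sailing ships", count=10, offset=20, mkt="en-US")
print(url)
```

Paging through results amounts to repeating the call with offset advanced by count.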

synapse.ml.cognitive.BreakSentence module

class synapse.ml.cognitive.BreakSentence.BreakSentence(java_obj=None, concurrency=1, concurrentTimeout=None, errorCol='BreakSentence_eda0bcf1ca97_error', handler=None, language=None, languageCol=None, outputCol='BreakSentence_eda0bcf1ca97_output', script=None, scriptCol=None, subscriptionKey=None, subscriptionKeyCol=None, subscriptionRegion=None, subscriptionRegionCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (str) – column to hold http errors

  • handler (object) – Which strategy to use when handling requests

  • language (object) – Language tag identifying the language of the input text. If a code is not specified, automatic language detection will be applied.

  • outputCol (str) – The name of the output column

  • script (object) – Script tag identifying the script used by the input text. If a script is not specified, the default script of the language will be assumed.

  • subscriptionKey (object) – the API key to use

  • subscriptionRegion (object) – the API region to use

  • text (object) – the string to translate

  • timeout (float) – number of seconds to wait before closing the connection

  • url (str) – Url of the service
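BreakSentence wraps the Translator breaksentence operation, which returns sentence boundaries for the input text. The sketch below builds such a request against the Translator v3 global endpoint, passing language and script as optional query parameters and the key and region as headers. It sends nothing. break_sentence_request is a hypothetical helper, and the endpoint is an assumption, since the url parameter may point elsewhere.

```python
import json
from urllib.parse import urlencode

# Assumed global Translator endpoint; the transformer's `url` may differ.
TRANSLATOR_ENDPOINT = "https://api.cognitive.microsofttranslator.com"


def break_sentence_request(texts, key, region, language=None, script=None):
    """Build (url, headers, body) for one breaksentence call without sending it."""
    params = {"api-version": "3.0"}
    if language:
        params["language"] = language  # omit to let the service detect it
    if script:
        params["script"] = script      # omit to use the language's default script
    url = f"{TRANSLATOR_ENDPOINT}/breaksentence?{urlencode(params)}"
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Ocp-Apim-Subscription-Region": region,
        "Content-Type": "application/json",
    }
    body = json.dumps([{"Text": t} for t in texts])  # one object per input text
    return url, headers, body


print(break_sentence_request(["Hello. How are you?"], "<key>", "eastus", language="en")[0])
```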

concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout
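Together, concurrency and concurrentTimeout bound how many service calls are in flight at once and how long to wait on their futures. The sketch below shows the same idea with the standard library's ThreadPoolExecutor. It is an analogy rather than SynapseML's implementation, and call_all is a hypothetical name.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def call_all(requests, concurrency=1, concurrent_timeout=None):
    """Run zero-argument callables with at most `concurrency` running at once.

    Returns results in input order. If they do not all finish within
    `concurrent_timeout` seconds (None means wait indefinitely), calls that
    have not started are cancelled and TimeoutError is raised; calls already
    running still finish before the pool shuts down.
    """
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(request) for request in requests]
        _, not_done = wait(futures, timeout=concurrent_timeout)
        if not_done:
            for future in not_done:
                future.cancel()  # only affects calls that have not started yet
            raise TimeoutError(f"{len(not_done)} call(s) unfinished after {concurrent_timeout} s")
        return [future.result() for future in futures]


print(call_all([lambda i=i: i * i for i in range(5)], concurrency=2))
```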

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

static getJavaPackage()[source]

Returns package name String.

getLanguage()[source]
Returns

Language tag identifying the language of the input text. If a code is not specified, automatic language detection will be applied.

Return type

language

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getScript()[source]
Returns

Script tag identifying the script used by the input text. If a script is not specified, the default script of the language will be assumed.

Return type

script

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getSubscriptionRegion()[source]
Returns

the API region to use

Return type

subscriptionRegion

getText()[source]
Returns

the string to translate

Return type

text

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
language = Param(parent='undefined', name='language', doc='ServiceParam: Language tag identifying the language of the input text. If a code is not specified, automatic language detection will be applied.')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

script = Param(parent='undefined', name='script', doc='ServiceParam: Script tag identifying the script used by the input text. If a script is not specified, the default script of the language will be assumed.')
setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setLanguage(value)[source]
Parameters

language – Language tag identifying the language of the input text. If a code is not specified, automatic language detection will be applied.

setLanguageCol(value)[source]
Parameters

language – name of the column that holds, for each row, the language tag of the input text. If no language is specified, automatic language detection is applied.

setLinkedService(value)[source]
setLocation(value)[source]
setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(concurrency=1, concurrentTimeout=None, errorCol='BreakSentence_eda0bcf1ca97_error', handler=None, language=None, languageCol=None, outputCol='BreakSentence_eda0bcf1ca97_output', script=None, scriptCol=None, subscriptionKey=None, subscriptionKeyCol=None, subscriptionRegion=None, subscriptionRegionCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setScript(value)[source]
Parameters

script – Script tag identifying the script used by the input text. If a script is not specified, the default script of the language will be assumed.

setScriptCol(value)[source]
Parameters

script – name of the column that holds, for each row, the script tag of the input text. If no script is specified, the default script of the language is assumed.

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – name of the column that holds, for each row, the API key to use

setSubscriptionRegion(value)[source]
Parameters

subscriptionRegion – the API region to use

setSubscriptionRegionCol(value)[source]
Parameters

subscriptionRegion – name of the column that holds, for each row, the API region to use

setText(value)[source]
Parameters

text – the string to translate

setTextCol(value)[source]
Parameters

text – name of the column that holds, for each row, the string to translate

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
subscriptionRegion = Param(parent='undefined', name='subscriptionRegion', doc='ServiceParam: the API region to use')
text = Param(parent='undefined', name='text', doc='ServiceParam: the string to translate')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.ConversationTranscription module

class synapse.ml.cognitive.ConversationTranscription.ConversationTranscription(java_obj=None, audioDataCol=None, endpointId=None, extraFfmpegArgs=[], fileType=None, fileTypeCol=None, format=None, formatCol=None, language=None, languageCol=None, outputCol=None, participantsJson=None, participantsJsonCol=None, profanity=None, profanityCol=None, recordAudioData=False, recordedFileNameCol=None, streamIntermediateResults=True, subscriptionKey=None, subscriptionKeyCol=None, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • audioDataCol (str) – Column holding audio data, must be either ByteArrays or Strings representing file URIs

  • endpointId (str) – endpoint for custom speech models

  • extraFfmpegArgs (list) – extra arguments for ffmpeg output decoding

  • fileType (object) – The file type of the sound files, supported types: wav, ogg, mp3

  • format (object) – Specifies the result format. Accepted values are simple and detailed. Default is simple.

  • language (object) – Identifies the spoken language that is being recognized.

  • outputCol (str) – The name of the output column

  • participantsJson (object) – a JSON representation of a list of conversation participants (email, language, user)

  • profanity (object) – Specifies how to handle profanity in recognition results. Accepted values are masked, which replaces profanity with asterisks; removed, which removes all profanity from the result; and raw, which includes the profanity in the result. The default setting is masked.

  • recordAudioData (bool) – Whether to record audio data to a file location, for use only with m3u8 streams

  • recordedFileNameCol (str) – Column holding the file names to write audio data to if recordAudioData is set to true

  • streamIntermediateResults (bool) – Whether to return intermediate results immediately or group them in a sequence

  • subscriptionKey (object) – the API key to use

  • url (str) – Url of the service
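participantsJson takes a JSON list of conversation participants. The sketch below serializes participants into that shape using the three fields named in the description (email, language, user). participants_json is a hypothetical helper, and the exact field names and value formats the service expects are assumptions, so check them against the Speech service documentation before relying on them.

```python
import json


def participants_json(participants):
    """Serialize (email, language, user) triples into a JSON array string.

    Field names follow the parameter description; the precise schema the
    service accepts is an assumption.
    """
    return json.dumps(
        [{"email": email, "language": lang, "user": user} for email, lang, user in participants]
    )


print(participants_json([("alice@contoso.com", "en-US", "alice")]))
```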

audioDataCol = Param(parent='undefined', name='audioDataCol', doc='Column holding audio data, must be either ByteArrays or Strings representing file URIs')
endpointId = Param(parent='undefined', name='endpointId', doc='endpoint for custom speech models')
extraFfmpegArgs = Param(parent='undefined', name='extraFfmpegArgs', doc='extra arguments to for ffmpeg output decoding')
fileType = Param(parent='undefined', name='fileType', doc='ServiceParam: The file type of the sound files, supported types: wav, ogg, mp3')
format = Param(parent='undefined', name='format', doc='ServiceParam:  Specifies the result format. Accepted values are simple and detailed. Default is simple.     ')
getAudioDataCol()[source]
Returns

Column holding audio data, must be either ByteArrays or Strings representing file URIs

Return type

audioDataCol

getEndpointId()[source]
Returns

endpoint for custom speech models

Return type

endpointId

getExtraFfmpegArgs()[source]
Returns

extra arguments for ffmpeg output decoding

Return type

extraFfmpegArgs

getFileType()[source]
Returns

The file type of the sound files, supported types: wav, ogg, mp3

Return type

fileType

getFormat()[source]
Returns

Specifies the result format. Accepted values are simple and detailed. Default is simple.

Return type

format

static getJavaPackage()[source]

Returns package name String.

getLanguage()[source]
Returns

Identifies the spoken language that is being recognized.

Return type

language

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getParticipantsJson()[source]
Returns

a JSON representation of a list of conversation participants (email, language, user)

Return type

participantsJson

getProfanity()[source]
Returns

Specifies how to handle profanity in recognition results. Accepted values are masked, which replaces profanity with asterisks; removed, which removes all profanity from the result; and raw, which includes the profanity in the result. The default setting is masked.

Return type

profanity
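The three profanity modes differ only in what happens to a flagged word. The sketch below applies each mode to a string, given a caller-supplied set of flagged words. It illustrates the documented semantics only: the service's own profanity detection is not modeled, and apply_profanity_mode is a hypothetical helper.

```python
def apply_profanity_mode(text, flagged, mode="masked"):
    """Apply a profanity mode to text, given a set of lowercase flagged words.

    masked  -> each flagged word becomes asterisks of the same length
    removed -> flagged words are dropped
    raw     -> text is returned unchanged
    Word splitting on whitespace is a simplification of this sketch.
    """
    if mode == "raw":
        return text
    words = text.split()
    if mode == "masked":
        return " ".join("*" * len(w) if w.lower() in flagged else w for w in words)
    if mode == "removed":
        return " ".join(w for w in words if w.lower() not in flagged)
    raise ValueError(f"unknown profanity mode: {mode}")


print(apply_profanity_mode("well darn it", {"darn"}))
```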

getRecordAudioData()[source]
Returns

Whether to record audio data to a file location, for use only with m3u8 streams

Return type

recordAudioData

getRecordedFileNameCol()[source]
Returns

Column holding the file names to write audio data to if recordAudioData is set to true

Return type

recordedFileNameCol

getStreamIntermediateResults()[source]
Returns

Whether to return intermediate results immediately or group them in a sequence

Return type

streamIntermediateResults

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getUrl()[source]
Returns

Url of the service

Return type

url

language = Param(parent='undefined', name='language', doc='ServiceParam:  Identifies the spoken language that is being recognized.     ')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
participantsJson = Param(parent='undefined', name='participantsJson', doc='ServiceParam: a json representation of a list of conversation participants (email, language, user)')
profanity = Param(parent='undefined', name='profanity', doc='ServiceParam:  Specifies how to handle profanity in recognition results. Accepted values are masked, which replaces profanity with asterisks, removed, which remove all profanity from the result, or raw, which includes the profanity in the result. The default setting is masked.     ')
classmethod read()[source]

Returns an MLReader instance for this class.

recordAudioData = Param(parent='undefined', name='recordAudioData', doc='Whether to record audio data to a file location, for use only with m3u8 streams')
recordedFileNameCol = Param(parent='undefined', name='recordedFileNameCol', doc="Column holding file names to write audio data to if ``recordAudioData'' is set to true")
setAudioDataCol(value)[source]
Parameters

audioDataCol – Column holding audio data, must be either ByteArrays or Strings representing file URIs

setEndpointId(value)[source]
Parameters

endpointId – endpoint for custom speech models

setExtraFfmpegArgs(value)[source]
Parameters

extraFfmpegArgs – extra arguments for ffmpeg output decoding

setFileType(value)[source]
Parameters

fileType – The file type of the sound files, supported types: wav, ogg, mp3

setFileTypeCol(value)[source]
Parameters

fileType – name of the column that holds, for each row, the file type of the sound files; supported types: wav, ogg, mp3

setFormat(value)[source]
Parameters

format – Specifies the result format. Accepted values are simple and detailed. Default is simple.

setFormatCol(value)[source]
Parameters

format – name of the column that holds, for each row, the result format. Accepted values are simple and detailed. Default is simple.

setLanguage(value)[source]
Parameters

language – Identifies the spoken language that is being recognized.

setLanguageCol(value)[source]
Parameters

language – name of the column that holds, for each row, the spoken language to recognize

setLinkedService(value)[source]
setLocation(value)[source]
setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(audioDataCol=None, endpointId=None, extraFfmpegArgs=[], fileType=None, fileTypeCol=None, format=None, formatCol=None, language=None, languageCol=None, outputCol=None, participantsJson=None, participantsJsonCol=None, profanity=None, profanityCol=None, recordAudioData=False, recordedFileNameCol=None, streamIntermediateResults=True, subscriptionKey=None, subscriptionKeyCol=None, url=None)[source]

Set the (keyword only) parameters

setParticipantsJson(value)[source]
Parameters

participantsJson – a JSON representation of a list of conversation participants (email, language, user)

setParticipantsJsonCol(value)[source]
Parameters

participantsJson – name of the column that holds, for each row, a JSON representation of a list of conversation participants (email, language, user)

setProfanity(value)[source]
Parameters

profanity – Specifies how to handle profanity in recognition results. Accepted values are masked, which replaces profanity with asterisks; removed, which removes all profanity from the result; and raw, which includes the profanity in the result. The default setting is masked.

setProfanityCol(value)[source]
Parameters

profanity – name of the column that holds, for each row, how to handle profanity in recognition results: masked, which replaces profanity with asterisks; removed, which removes all profanity from the result; or raw, which includes the profanity in the result. The default setting is masked.

setRecordAudioData(value)[source]
Parameters

recordAudioData – Whether to record audio data to a file location, for use only with m3u8 streams

setRecordedFileNameCol(value)[source]
Parameters

recordedFileNameCol – Column holding the file names to write audio data to if recordAudioData is set to true

setStreamIntermediateResults(value)[source]
Parameters

streamIntermediateResults – Whether to return intermediate results immediately or group them in a sequence

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – name of the column that holds, for each row, the API key to use

setUrl(value)[source]
Parameters

url – Url of the service

streamIntermediateResults = Param(parent='undefined', name='streamIntermediateResults', doc='Whether or not to immediately return itermediate results, or group in a sequence')
subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.DescribeImage module

class synapse.ml.cognitive.DescribeImage.DescribeImage(java_obj=None, concurrency=1, concurrentTimeout=None, errorCol='DescribeImage_cac8d6bc9fcc_error', handler=None, imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, language=None, languageCol=None, maxCandidates=None, maxCandidatesCol=None, outputCol='DescribeImage_cac8d6bc9fcc_output', subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (str) – column to hold http errors

  • handler (object) – Which strategy to use when handling requests

  • imageBytes (object) – bytestream of the image to use

  • imageUrl (object) – the url of the image to use

  • language (object) – Language of image description

  • maxCandidates (object) – Maximum candidate descriptions to return

  • outputCol (str) – The name of the output column

  • subscriptionKey (object) – the API key to use

  • timeout (float) – number of seconds to wait before closing the connection

  • url (str) – Url of the service
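DescribeImage sends either an image URL or raw image bytes, plus the optional language and maxCandidates settings. The sketch below builds the corresponding Computer Vision describe request without sending it: a JSON body for a URL, or an application/octet-stream body for bytes. describe_image_request is a hypothetical helper, the v3.2 path is an assumed API version, and the endpoint in the usage line is a placeholder.

```python
import json
from urllib.parse import urlencode


def describe_image_request(endpoint, key, image_url=None, image_bytes=None,
                           language=None, max_candidates=None):
    """Build (url, headers, body) for one describe call without sending it."""
    if (image_url is None) == (image_bytes is None):
        raise ValueError("pass exactly one of image_url or image_bytes")
    params = {}
    if max_candidates is not None:
        params["maxCandidates"] = max_candidates
    if language is not None:
        params["language"] = language
    url = endpoint.rstrip("/") + "/vision/v3.2/describe"  # assumed API version
    if params:
        url += "?" + urlencode(params)
    headers = {"Ocp-Apim-Subscription-Key": key}
    if image_url is not None:
        headers["Content-Type"] = "application/json"
        body = json.dumps({"url": image_url}).encode("utf-8")
    else:
        headers["Content-Type"] = "application/octet-stream"
        body = image_bytes  # raw image bytes go straight into the body
    return url, headers, body


print(describe_image_request("https://example.cognitiveservices.azure.com/", "<key>",
                             image_url="https://example.com/cat.jpg", max_candidates=3)[0])
```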

concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

getImageBytes()[source]
Returns

bytestream of the image to use

Return type

imageBytes

getImageUrl()[source]
Returns

the url of the image to use

Return type

imageUrl

static getJavaPackage()[source]

Returns package name String.

getLanguage()[source]
Returns

Language of image description

Return type

language

getMaxCandidates()[source]
Returns

Maximum candidate descriptions to return

Return type

maxCandidates

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
imageBytes = Param(parent='undefined', name='imageBytes', doc='ServiceParam: bytestream of the image to use')
imageUrl = Param(parent='undefined', name='imageUrl', doc='ServiceParam: the url of the image to use')
language = Param(parent='undefined', name='language', doc='ServiceParam: Language of image description')
maxCandidates = Param(parent='undefined', name='maxCandidates', doc='ServiceParam: Maximum candidate descriptions to return')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – maximum number of seconds to wait on futures if concurrency >= 1

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setImageBytes(value)[source]
Parameters

imageBytes – bytestream of the image to use

setImageBytesCol(value)[source]
Parameters

imageBytes – bytestream of the image to use

setImageUrl(value)[source]
Parameters

imageUrl – the url of the image to use

setImageUrlCol(value)[source]
Parameters

imageUrl – the url of the image to use
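The setters come in pairs, such as setImageUrl/setImageUrlCol and setLanguage/setLanguageCol. This suggests that each service parameter can be supplied in one of two ways: as a single fixed value for every row, or as the name of a column that holds a per-row value. The helper below is hypothetical and not part of SynapseML. It builds the matching constructor keyword arguments for either case. It also rejects setting both forms at once, on the assumption that the two forms are alternatives.

```python
def service_param_kwargs(name, value=None, col=None):
    """Hypothetical helper: return constructor keyword arguments for a service
    parameter given either as one fixed value (`name`) or as the name of a
    column holding per-row values (`name` + "Col")."""
    if (value is None) == (col is None):
        raise ValueError(f"set exactly one of {name} or {name}Col")
    if col is not None:
        return {name + "Col": col}
    return {name: value}


print(service_param_kwargs("imageUrl", col="image_url"))
print(service_param_kwargs("language", value="en"))
```

With SynapseML installed, the result could be unpacked into the constructor. For example, `DescribeImage(subscriptionKey=key, url=endpoint, outputCol="description", **service_param_kwargs("imageUrl", col="image_url"))`, where `key`, `endpoint` and the `image_url` column are placeholders.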

setLanguage(value)[source]
Parameters

language – Language of image description

setLanguageCol(value)[source]
Parameters

language – Language of image description

setLinkedService(value)[source]
setLocation(value)[source]
setMaxCandidates(value)[source]
Parameters

maxCandidates – Maximum candidate descriptions to return

setMaxCandidatesCol(value)[source]
Parameters

maxCandidates – Maximum candidate descriptions to return

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(concurrency=1, concurrentTimeout=None, errorCol='DescribeImage_cac8d6bc9fcc_error', handler=None, imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, language=None, languageCol=None, maxCandidates=None, maxCandidatesCol=None, outputCol='DescribeImage_cac8d6bc9fcc_output', subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – the API key to use

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.Detect module

class synapse.ml.cognitive.Detect.Detect(java_obj=None, concurrency=1, concurrentTimeout=None, errorCol='Detect_852ec33b4f2f_error', handler=None, outputCol='Detect_852ec33b4f2f_output', subscriptionKey=None, subscriptionKeyCol=None, subscriptionRegion=None, subscriptionRegionCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – maximum number of seconds to wait on futures if concurrency >= 1

  • errorCol (str) – column to hold http errors

  • handler (object) – Which strategy to use when handling requests

  • outputCol (str) – The name of the output column

  • subscriptionKey (object) – the API key to use

  • subscriptionRegion (object) – the API region to use

  • text (object) – the string to translate

  • timeout (float) – number of seconds to wait before closing the connection

  • url (str) – Url of the service
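Assuming Detect wraps the Azure Translator language-detection operation, the sketch below shows how subscriptionKey, subscriptionRegion and text plausibly map onto a REST request. The key and region go into the Ocp-Apim-Subscription-Key and Ocp-Apim-Subscription-Region headers. The text goes into a JSON array of {"Text": ...} objects. build_detect_request is a hypothetical helper; SynapseML builds the request internally.

```python
import json


def build_detect_request(texts, subscription_key, subscription_region=None):
    """Hypothetical helper: assemble headers and a JSON body shaped like an
    Azure Translator language-detection request."""
    headers = {
        "Content-Type": "application/json",
        "Ocp-Apim-Subscription-Key": subscription_key,
    }
    if subscription_region is not None:
        # Regional (non-global) Translator resources also expect the region.
        headers["Ocp-Apim-Subscription-Region"] = subscription_region
    body = json.dumps([{"Text": t} for t in texts])
    return headers, body


headers, body = build_detect_request(["Hello, world!"], "<your-key>", "eastus")
print(body)
```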

concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

maximum number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

static getJavaPackage()[source]

Returns package name String.

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getSubscriptionRegion()[source]
Returns

the API region to use

Return type

subscriptionRegion

getText()[source]
Returns

the string to translate

Return type

text

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – maximum number of seconds to wait on futures if concurrency >= 1

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setLinkedService(value)[source]
setLocation(value)[source]
setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(concurrency=1, concurrentTimeout=None, errorCol='Detect_852ec33b4f2f_error', handler=None, outputCol='Detect_852ec33b4f2f_output', subscriptionKey=None, subscriptionKeyCol=None, subscriptionRegion=None, subscriptionRegionCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionRegion(value)[source]
Parameters

subscriptionRegion – the API region to use

setSubscriptionRegionCol(value)[source]
Parameters

subscriptionRegion – the API region to use

setText(value)[source]
Parameters

text – the string to translate

setTextCol(value)[source]
Parameters

text – the string to translate

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
subscriptionRegion = Param(parent='undefined', name='subscriptionRegion', doc='ServiceParam: the API region to use')
text = Param(parent='undefined', name='text', doc='ServiceParam: the string to translate')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.DetectAnomalies module

class synapse.ml.cognitive.DetectAnomalies.DetectAnomalies(java_obj=None, concurrency=1, concurrentTimeout=None, customInterval=None, customIntervalCol=None, errorCol='DetectAnomalies_bf669301e314_error', granularity=None, granularityCol=None, handler=None, imputeFixedValue=None, imputeFixedValueCol=None, imputeMode=None, imputeModeCol=None, maxAnomalyRatio=None, maxAnomalyRatioCol=None, outputCol='DetectAnomalies_bf669301e314_output', period=None, periodCol=None, sensitivity=None, sensitivityCol=None, series=None, seriesCol=None, subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – maximum number of seconds to wait on futures if concurrency >= 1

  • customInterval (object) – Custom interval used to set a non-standard time interval. For example, if the series is sampled every 5 minutes, the request can be set as granularity=minutely, customInterval=5.

  • errorCol (str) – column to hold http errors

  • granularity (object) – Can only be one of yearly, monthly, weekly, daily, hourly or minutely. Granularity is used to verify whether the input series is valid.

  • handler (object) – Which strategy to use when handling requests

  • imputeFixedValue (object) – Optional argument, fixed value to use when imputeMode is set to “fixed”

  • imputeMode (object) – Optional argument, impute mode of a time series. Possible values: auto, previous, linear, fixed, zero, notFill

  • maxAnomalyRatio (object) – Optional argument, advanced model parameter, max anomaly ratio in a time series.

  • outputCol (str) – The name of the output column

  • period (object) – Optional argument, periodic value of a time series. If the value is null or not present, the API will determine the period automatically.

  • sensitivity (object) – Optional argument, advanced model parameter, between 0 and 99. The lower the value, the larger the margin, which means fewer anomalies will be accepted.

  • series (object) – Time series data points. Points should be sorted by timestamp in ascending order to match the anomaly detection result. If the data is not sorted correctly or there are duplicated timestamps, the API will not work. In such a case, an error message will be returned.

  • subscriptionKey (object) – the API key to use

  • timeout (float) – number of seconds to wait before closing the connection

  • url (str) – Url of the service
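Per the series description above, the service rejects points that are out of order or share a timestamp. A client-side pre-check can catch this before any request is sent. The sketch below assumes each point is a (timestamp, value) pair with an ISO-8601 timestamp string. The actual struct layout SynapseML expects for series is not shown on this page, and validate_series is a hypothetical helper.

```python
from datetime import datetime


def validate_series(points):
    """Hypothetical pre-check: timestamps must be strictly ascending (sorted,
    no duplicates). Each point is assumed to be (ISO-8601 string, value);
    offsets like +00:00 parse fine, a trailing "Z" needs Python 3.11+."""
    stamps = [datetime.fromisoformat(ts) for ts, _ in points]
    for prev, cur in zip(stamps, stamps[1:]):
        if cur == prev:
            raise ValueError(f"duplicated timestamp: {cur.isoformat()}")
        if cur < prev:
            raise ValueError(f"not sorted ascending at: {cur.isoformat()}")
    return True


series = [("2021-01-01T00:00:00", 1.0), ("2021-01-02T00:00:00", 1.2)]
print(validate_series(series))
```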

concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
customInterval = Param(parent='undefined', name='customInterval', doc='ServiceParam:  Custom Interval is used to set non-standard time interval, for example, if the series is 5 minutes,  request can be set as granularity=minutely, customInterval=5.     ')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

maximum number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getCustomInterval()[source]
Returns

Custom interval used to set a non-standard time interval. For example, if the series is sampled every 5 minutes, the request can be set as granularity=minutely, customInterval=5.

Return type

customInterval

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getGranularity()[source]
Returns

Can only be one of yearly, monthly, weekly, daily, hourly or minutely. Granularity is used to verify whether the input series is valid.

Return type

granularity

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

getImputeFixedValue()[source]
Returns

Optional argument, fixed value to use when imputeMode is set to “fixed”

Return type

imputeFixedValue

getImputeMode()[source]
Returns

Optional argument, impute mode of a time series. Possible values: auto, previous, linear, fixed, zero, notFill

Return type

imputeMode

static getJavaPackage()[source]

Returns package name String.

getMaxAnomalyRatio()[source]
Returns

Optional argument, advanced model parameter, max anomaly ratio in a time series.

Return type

maxAnomalyRatio

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getPeriod()[source]
Returns

Optional argument, periodic value of a time series. If the value is null or not present, the API will determine the period automatically.

Return type

period

getSensitivity()[source]
Returns

Optional argument, advanced model parameter, between 0 and 99. The lower the value, the larger the margin, which means fewer anomalies will be accepted.

Return type

sensitivity

getSeries()[source]
Returns

Time series data points. Points should be sorted by timestamp in ascending order to match the anomaly detection result. If the data is not sorted correctly or there are duplicated timestamps, the API will not work. In such a case, an error message will be returned.

Return type

series

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

granularity = Param(parent='undefined', name='granularity', doc='ServiceParam:  Can only be one of yearly, monthly, weekly, daily, hourly or minutely. Granularity is used for verify whether input series is valid.     ')
handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
imputeFixedValue = Param(parent='undefined', name='imputeFixedValue', doc='ServiceParam:  Optional argument, fixed value to use when imputeMode is set to "fixed"     ')
imputeMode = Param(parent='undefined', name='imputeMode', doc='ServiceParam:  Optional argument, impute mode of a time series. Possible values: auto, previous, linear, fixed, zero, notFill     ')
maxAnomalyRatio = Param(parent='undefined', name='maxAnomalyRatio', doc='ServiceParam:  Optional argument, advanced model parameter, max anomaly ratio in a time series.     ')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
period = Param(parent='undefined', name='period', doc='ServiceParam:  Optional argument, periodic value of a time series. If the value is null or does not present, the API will determine the period automatically.     ')
classmethod read()[source]

Returns an MLReader instance for this class.

sensitivity = Param(parent='undefined', name='sensitivity', doc='ServiceParam:  Optional argument, advanced model parameter, between 0-99, the lower the value is, the larger the margin value will be which means less anomalies will be accepted     ')
series = Param(parent='undefined', name='series', doc='ServiceParam:  Time series data points. Points should be sorted by timestamp in ascending order to match the anomaly detection result. If the data is not sorted correctly or there is duplicated timestamp, the API will not work. In such case, an error message will be returned.     ')
setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – maximum number of seconds to wait on futures if concurrency >= 1

setCustomInterval(value)[source]
Parameters

customInterval – Custom interval used to set a non-standard time interval. For example, if the series is sampled every 5 minutes, the request can be set as granularity=minutely, customInterval=5.

setCustomIntervalCol(value)[source]
Parameters

customInterval – Custom interval used to set a non-standard time interval. For example, if the series is sampled every 5 minutes, the request can be set as granularity=minutely, customInterval=5.
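The granularity/customInterval example above generalizes: an evenly spaced series can be described by the largest unit that divides its spacing, plus a multiplier. The hypothetical helper below derives such a pair from ISO-8601 timestamps, so 5-minute spacing yields ("minutely", 5). It handles only the fixed-length units weekly, daily, hourly and minutely. Monthly and yearly steps vary in length, so they are deliberately left out of this sketch.

```python
from datetime import datetime, timedelta

# Fixed-length units only, largest first; monthly/yearly are not handled.
_UNITS = [
    ("weekly", timedelta(weeks=1)),
    ("daily", timedelta(days=1)),
    ("hourly", timedelta(hours=1)),
    ("minutely", timedelta(minutes=1)),
]


def infer_granularity(timestamps):
    """Hypothetical helper: return (granularity, customInterval) for an
    evenly spaced, strictly ascending series of ISO-8601 timestamps."""
    stamps = [datetime.fromisoformat(t) for t in timestamps]
    if len(stamps) < 2:
        raise ValueError("need at least two timestamps")
    steps = {b - a for a, b in zip(stamps, stamps[1:])}
    if len(steps) != 1:
        raise ValueError("series is not evenly spaced")
    step = steps.pop()
    if step <= timedelta(0):
        raise ValueError("timestamps must be strictly ascending")
    for name, unit in _UNITS:  # first (largest) unit dividing the step evenly
        if step % unit == timedelta(0):
            return name, step // unit
    raise ValueError(f"spacing {step} is not a whole number of minutes")


print(infer_granularity(["2021-01-01T00:00:00", "2021-01-01T00:05:00"]))
```

The returned pair could then be supplied as the granularity and customInterval parameters.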

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setGranularity(value)[source]
Parameters

granularity – Can only be one of yearly, monthly, weekly, daily, hourly or minutely. Granularity is used to verify whether the input series is valid.

setGranularityCol(value)[source]
Parameters

granularity – Can only be one of yearly, monthly, weekly, daily, hourly or minutely. Granularity is used to verify whether the input series is valid.

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setImputeFixedValue(value)[source]
Parameters

imputeFixedValue – Optional argument, fixed value to use when imputeMode is set to “fixed”

setImputeFixedValueCol(value)[source]
Parameters

imputeFixedValue – Optional argument, fixed value to use when imputeMode is set to “fixed”

setImputeMode(value)[source]
Parameters

imputeMode – Optional argument, impute mode of a time series. Possible values: auto, previous, linear, fixed, zero, notFill

setImputeModeCol(value)[source]
Parameters

imputeMode – Optional argument, impute mode of a time series. Possible values: auto, previous, linear, fixed, zero, notFill
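To make the documented imputeMode values concrete, the sketch below applies a plain-Python interpretation of five of them to a list in which None marks a missing point. This is a conceptual illustration, not the service's algorithm, and the service's exact gap-filling rules may differ. The auto mode is left to the service and is not modeled. impute is a hypothetical helper.

```python
def impute(values, mode, fixed_value=None):
    """Hypothetical, conceptual version of several imputeMode options.
    None marks a missing point; "auto" is decided by the service and is
    not modeled here."""
    out = list(values)
    if mode == "notFill":
        return out  # leave gaps as they are
    if mode == "zero":
        return [0.0 if v is None else v for v in out]
    if mode == "fixed":
        if fixed_value is None:
            raise ValueError('imputeFixedValue is required for "fixed"')
        return [fixed_value if v is None else v for v in out]
    if mode == "previous":
        for i in range(1, len(out)):  # carry the last value forward
            if out[i] is None:
                out[i] = out[i - 1]
        return out
    if mode == "linear":
        known = [i for i, v in enumerate(out) if v is not None]
        for a, b in zip(known, known[1:]):  # interpolate interior gaps only
            for i in range(a + 1, b):
                out[i] = out[a] + (out[b] - out[a]) * (i - a) / (b - a)
        return out
    raise ValueError(f"mode not modeled in this sketch: {mode}")


print(impute([1.0, None, 3.0], "linear"))
```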

setLinkedService(value)[source]
setLocation(value)[source]
setMaxAnomalyRatio(value)[source]
Parameters

maxAnomalyRatio – Optional argument, advanced model parameter, max anomaly ratio in a time series.

setMaxAnomalyRatioCol(value)[source]
Parameters

maxAnomalyRatio – Optional argument, advanced model parameter, max anomaly ratio in a time series.

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(concurrency=1, concurrentTimeout=None, customInterval=None, customIntervalCol=None, errorCol='DetectAnomalies_bf669301e314_error', granularity=None, granularityCol=None, handler=None, imputeFixedValue=None, imputeFixedValueCol=None, imputeMode=None, imputeModeCol=None, maxAnomalyRatio=None, maxAnomalyRatioCol=None, outputCol='DetectAnomalies_bf669301e314_output', period=None, periodCol=None, sensitivity=None, sensitivityCol=None, series=None, seriesCol=None, subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setPeriod(value)[source]
Parameters

period – Optional argument, periodic value of a time series. If the value is null or not present, the API will determine the period automatically.

setPeriodCol(value)[source]
Parameters

period – Optional argument, periodic value of a time series. If the value is null or not present, the API will determine the period automatically.

setSensitivity(value)[source]
Parameters

sensitivity – Optional argument, advanced model parameter, between 0 and 99. The lower the value, the larger the margin, which means fewer anomalies will be accepted.

setSensitivityCol(value)[source]
Parameters

sensitivity – Optional argument, advanced model parameter, between 0 and 99. The lower the value, the larger the margin, which means fewer anomalies will be accepted.
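Several of the advanced parameters come with constraints stated on this page. sensitivity must lie between 0 and 99. imputeMode is one of the six listed values. imputeFixedValue applies only when imputeMode is "fixed". The hypothetical helper below enforces those rules and drops unset options. Its keyword names deliberately match the constructor's, so the result could be unpacked into DetectAnomalies(...). No range check is applied to maxAnomalyRatio because this page states none.

```python
IMPUTE_MODES = {"auto", "previous", "linear", "fixed", "zero", "notFill"}


def optional_params(sensitivity=None, maxAnomalyRatio=None, period=None,
                    imputeMode=None, imputeFixedValue=None):
    """Hypothetical helper: collect optional DetectAnomalies arguments,
    dropping unset ones and checking the constraints documented here."""
    if sensitivity is not None and not 0 <= sensitivity <= 99:
        raise ValueError("sensitivity must be between 0 and 99")
    if imputeMode is not None and imputeMode not in IMPUTE_MODES:
        raise ValueError(f"unknown imputeMode: {imputeMode}")
    if imputeFixedValue is not None and imputeMode != "fixed":
        raise ValueError('imputeFixedValue is only used when imputeMode is "fixed"')
    params = {"sensitivity": sensitivity, "maxAnomalyRatio": maxAnomalyRatio,
              "period": period, "imputeMode": imputeMode,
              "imputeFixedValue": imputeFixedValue}
    return {k: v for k, v in params.items() if v is not None}


print(optional_params(sensitivity=95, imputeMode="linear"))
```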

setSeries(value)[source]
Parameters

series – Time series data points. Points should be sorted by timestamp in ascending order to match the anomaly detection result. If the data is not sorted correctly or there are duplicated timestamps, the API will not work. In such a case, an error message will be returned.

setSeriesCol(value)[source]
Parameters

series – Time series data points. Points should be sorted by timestamp in ascending order to match the anomaly detection result. If the data is not sorted correctly or there are duplicated timestamps, the API will not work. In such a case, an error message will be returned.

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – the API key to use

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.DetectFace module

class synapse.ml.cognitive.DetectFace.DetectFace(java_obj=None, concurrency=1, concurrentTimeout=None, errorCol='DetectFace_ec61c3846cd1_error', handler=None, imageUrl=None, imageUrlCol=None, outputCol='DetectFace_ec61c3846cd1_output', returnFaceAttributes=None, returnFaceAttributesCol=None, returnFaceId=None, returnFaceIdCol=None, returnFaceLandmarks=None, returnFaceLandmarksCol=None, subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – maximum number of seconds to wait on futures if concurrency >= 1

  • errorCol (str) – column to hold http errors

  • handler (object) – Which strategy to use when handling requests

  • imageUrl (object) – the url of the image to use

  • outputCol (str) – The name of the output column

  • returnFaceAttributes (object) – Analyze and return one or more specified face attributes. Supported face attributes include: age, gender, headPose, smile, facialHair, glasses, emotion, hair, makeup, occlusion, accessories, blur, exposure and noise. Face attribute analysis has additional computational and time cost.

  • returnFaceId (object) – Whether to return faceIds of the detected faces. The default value is true.

  • returnFaceLandmarks (object) – Whether to return face landmarks of the detected faces. The default value is false.

  • subscriptionKey (object) – the API key to use

  • timeout (float) – number of seconds to wait before closing the connection

  • url (str) – Url of the service
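Because face attribute analysis adds computational and time cost, it helps to request only the attributes actually needed and to catch misspelled names early. The sketch below assumes returnFaceAttributes takes a list of attribute names. It checks that list against the names documented above and removes duplicates. check_face_attributes is a hypothetical helper. The attributes the live Face service accepts may have changed since this page was generated, so treat the set as a snapshot of this documentation.

```python
# Snapshot of the attribute names listed in this documentation.
SUPPORTED_FACE_ATTRIBUTES = {
    "age", "gender", "headPose", "smile", "facialHair", "glasses", "emotion",
    "hair", "makeup", "occlusion", "accessories", "blur", "exposure", "noise",
}


def check_face_attributes(attrs):
    """Hypothetical helper: reject names not in the documented set and
    return the list de-duplicated, preserving the original order."""
    unknown = [a for a in attrs if a not in SUPPORTED_FACE_ATTRIBUTES]
    if unknown:
        raise ValueError(f"unsupported face attributes: {unknown}")
    return list(dict.fromkeys(attrs))


print(check_face_attributes(["headPose", "blur", "headPose"]))
```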

concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

maximum number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

getImageUrl()[source]
Returns

the url of the image to use

Return type

imageUrl

static getJavaPackage()[source]

Returns package name String.

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getReturnFaceAttributes()[source]
Returns

Analyze and return one or more specified face attributes. Supported face attributes include: age, gender, headPose, smile, facialHair, glasses, emotion, hair, makeup, occlusion, accessories, blur, exposure and noise. Face attribute analysis has additional computational and time cost.

Return type

returnFaceAttributes

getReturnFaceId()[source]
Returns

Whether to return faceIds of the detected faces. The default value is true.

Return type

returnFaceId

getReturnFaceLandmarks()[source]
Returns

Whether to return face landmarks of the detected faces. The default value is false.

Return type

returnFaceLandmarks

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
imageUrl = Param(parent='undefined', name='imageUrl', doc='ServiceParam: the url of the image to use')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

returnFaceAttributes = Param(parent='undefined', name='returnFaceAttributes', doc='ServiceParam: Analyze and return the one or more specified face attributes Supported face attributes include: age, gender, headPose, smile, facialHair, glasses, emotion, hair, makeup, occlusion, accessories, blur, exposure and noise. Face attribute analysis has additional computational and time cost.')
returnFaceId = Param(parent='undefined', name='returnFaceId', doc='ServiceParam: Return faceIds of the detected faces or not. The default value is true')
returnFaceLandmarks = Param(parent='undefined', name='returnFaceLandmarks', doc='ServiceParam: Return face landmarks of the detected faces or not. The default value is false.')
setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – maximum number of seconds to wait on futures if concurrency >= 1

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setImageUrl(value)[source]
Parameters

imageUrl – the url of the image to use

setImageUrlCol(value)[source]
Parameters

imageUrl – the url of the image to use

setLinkedService(value)[source]
setLocation(value)[source]
setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(concurrency=1, concurrentTimeout=None, errorCol='DetectFace_ec61c3846cd1_error', handler=None, imageUrl=None, imageUrlCol=None, outputCol='DetectFace_ec61c3846cd1_output', returnFaceAttributes=None, returnFaceAttributesCol=None, returnFaceId=None, returnFaceIdCol=None, returnFaceLandmarks=None, returnFaceLandmarksCol=None, subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setReturnFaceAttributes(value)[source]
Parameters

returnFaceAttributes – Analyze and return one or more specified face attributes. Supported face attributes include: age, gender, headPose, smile, facialHair, glasses, emotion, hair, makeup, occlusion, accessories, blur, exposure and noise. Face attribute analysis has additional computational and time cost.

setReturnFaceAttributesCol(value)[source]
Parameters

returnFaceAttributes – Analyze and return one or more specified face attributes. Supported face attributes include: age, gender, headPose, smile, facialHair, glasses, emotion, hair, makeup, occlusion, accessories, blur, exposure and noise. Face attribute analysis has additional computational and time cost.

setReturnFaceId(value)[source]
Parameters

returnFaceId – Whether to return faceIds of the detected faces. The default value is true.

setReturnFaceIdCol(value)[source]
Parameters

returnFaceId – Whether to return faceIds of the detected faces. The default value is true.

setReturnFaceLandmarks(value)[source]
Parameters

returnFaceLandmarks – Whether to return face landmarks of the detected faces. The default value is false.

setReturnFaceLandmarksCol(value)[source]
Parameters

returnFaceLandmarks – Whether to return face landmarks of the detected faces. The default value is false.
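Assuming DetectFace forwards these three options to the Face API detect operation, they correspond to query-string parameters of the same names. In the REST form, booleans are written in lowercase and attributes are joined into one comma-separated value. The hypothetical helper below builds that query string. Its defaults mirror the documented ones: faceIds on, landmarks off. SynapseML itself assembles the request internally.

```python
from urllib.parse import urlencode


def face_detect_query(return_face_id=True, return_face_landmarks=False,
                      return_face_attributes=()):
    """Hypothetical helper: build a Face `detect` query string from the
    three DetectFace options, using the documented defaults."""
    params = {
        "returnFaceId": str(return_face_id).lower(),
        "returnFaceLandmarks": str(return_face_landmarks).lower(),
    }
    if return_face_attributes:
        # Attributes travel as a single comma-separated value.
        params["returnFaceAttributes"] = ",".join(return_face_attributes)
    return urlencode(params, safe=",")


print(face_detect_query(return_face_attributes=["headPose", "blur"]))
```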

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – the API key to use

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.DetectLastAnomaly module

class synapse.ml.cognitive.DetectLastAnomaly.DetectLastAnomaly(java_obj=None, concurrency=1, concurrentTimeout=None, customInterval=None, customIntervalCol=None, errorCol='DetectLastAnomaly_49ae8cd4adb7_error', granularity=None, granularityCol=None, handler=None, imputeFixedValue=None, imputeFixedValueCol=None, imputeMode=None, imputeModeCol=None, maxAnomalyRatio=None, maxAnomalyRatioCol=None, outputCol='DetectLastAnomaly_49ae8cd4adb7_output', period=None, periodCol=None, sensitivity=None, sensitivityCol=None, series=None, seriesCol=None, subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – maximum number of seconds to wait on futures if concurrency >= 1

  • customInterval (object) – Custom interval used to set a non-standard time interval. For example, if the series is sampled every 5 minutes, the request can be set as granularity=minutely, customInterval=5.

  • errorCol (str) – column to hold http errors

  • granularity (object) – Can only be one of yearly, monthly, weekly, daily, hourly or minutely. Granularity is used to verify whether the input series is valid.

  • handler (object) – Which strategy to use when handling requests

  • imputeFixedValue (object) – Optional argument, fixed value to use when imputeMode is set to “fixed”

  • imputeMode (object) – Optional argument, impute mode of a time series. Possible values: auto, previous, linear, fixed, zero, notFill

  • maxAnomalyRatio (object) – Optional argument, advanced model parameter, max anomaly ratio in a time series.

  • outputCol (str) – The name of the output column

  • period (object) – Optional argument, periodic value of a time series. If the value is null or not present, the API will determine the period automatically.

  • sensitivity (object) – Optional argument, advanced model parameter, between 0 and 99. The lower the value, the larger the margin, which means fewer anomalies will be accepted.

  • series (object) – Time series data points. Points should be sorted by timestamp in ascending order to match the anomaly detection result. If the data is not sorted correctly or there are duplicated timestamps, the API will not work. In such a case, an error message will be returned.

  • subscriptionKey (object) – the API key to use

  • timeout (float) – number of seconds to wait before closing the connection

  • url (str) – Url of the service
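Judging by its name, and by the "last point" operation of Azure Anomaly Detector it presumably wraps, DetectLastAnomaly scores only the final point of each series. One common way to score many points this way is to send trailing windows, each ending at the point of interest. The hypothetical helper below generates such windows from a sorted list of points. It does not check the service's own minimum series length, which is not stated on this page.

```python
def trailing_windows(points, window):
    """Hypothetical helper: yield each run of `window` consecutive points
    ending at successive positions, so the point of interest is always the
    last element of the sub-series sent for scoring."""
    if window < 1:
        raise ValueError("window must be at least 1")
    for end in range(window, len(points) + 1):
        yield points[end - window:end]


for w in trailing_windows([10, 11, 12, 13, 14], window=3):
    print(w)
```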

concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
customInterval = Param(parent='undefined', name='customInterval', doc='ServiceParam:  Custom Interval is used to set non-standard time interval, for example, if the series is 5 minutes,  request can be set as granularity=minutely, customInterval=5.     ')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

maximum number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout
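concurrency and concurrentTimeout bound how many requests are in flight and how long to wait on their futures. The following is a pure-Python analogy built on concurrent.futures, not SynapseML's own (Scala) implementation; the helper call_concurrently is hypothetical:

```python
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Any, Callable, List, Optional, Sequence


def call_concurrently(fn: Callable[[Any], Any], inputs: Sequence[Any],
                      concurrency: int = 1,
                      concurrent_timeout: Optional[float] = None) -> List[Any]:
    """Run fn over inputs with at most `concurrency` calls in flight.

    Calls that raised, or had not finished after `concurrent_timeout`
    seconds, are reported as None.
    """
    results = []
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(fn, x) for x in inputs]
        done, _ = wait(futures, timeout=concurrent_timeout)
        for f in futures:
            if f in done and f.exception() is None:
                results.append(f.result())
            else:
                results.append(None)  # timed out or raised
    # Note: leaving the with-block still waits for any calls that are running.
    return results
```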

getCustomInterval()[source]
Returns

Custom interval, used to set a non-standard time interval. For example, if the series has one point every 5 minutes, the request can be set as granularity=minutely, customInterval=5.

Return type

customInterval
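The customInterval example generalizes to any fixed sampling step: pick the largest listed granularity that divides the step evenly and use the quotient as customInterval. A small sketch under that assumption (granularity_for_step is a hypothetical helper; yearly and monthly are skipped because their length varies):

```python
from datetime import timedelta
from typing import Tuple

# Fixed-length granularities only, largest first.
_UNITS = [
    ("weekly", timedelta(weeks=1)),
    ("daily", timedelta(days=1)),
    ("hourly", timedelta(hours=1)),
    ("minutely", timedelta(minutes=1)),
]


def granularity_for_step(step: timedelta) -> Tuple[str, int]:
    """Return (granularity, customInterval) for a fixed sampling step."""
    for name, unit in _UNITS:
        # The step must be a whole multiple of the unit.
        if step >= unit and step % unit == timedelta(0):
            return name, step // unit
    raise ValueError(f"unsupported step: {step}")
```

For a 5-minute series this yields ("minutely", 5), matching the example in the description.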

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getGranularity()[source]
Returns

Can only be one of yearly, monthly, weekly, daily, hourly or minutely. Granularity is used to verify whether the input series is valid.

Return type

granularity

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

getImputeFixedValue()[source]
Returns

Optional argument, fixed value to use when imputeMode is set to “fixed”

Return type

imputeFixedValue

getImputeMode()[source]
Returns

Optional argument, impute mode of a time series. Possible values: auto, previous, linear, fixed, zero, notFill

Return type

imputeMode
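To make the imputeMode values concrete, this sketch applies the previous, linear, fixed and zero modes locally to a regularly spaced list in which missing points are represented as None. It illustrates what these modes commonly mean and is an assumption, not the service's algorithm; auto is left to the service, and the helper impute is hypothetical:

```python
from typing import List, Optional

_MODES = {"previous", "linear", "fixed", "zero", "notFill"}


def impute(values: List[Optional[float]], mode: str,
           fixed_value: Optional[float] = None) -> List[Optional[float]]:
    """Fill None gaps in a regularly spaced series (illustrative only)."""
    if mode not in _MODES:
        raise ValueError(f"mode not covered by this sketch: {mode}")
    if mode == "fixed" and fixed_value is None:
        raise ValueError('a fixed value is required when mode is "fixed"')
    out = list(values)
    if mode == "notFill":
        return out  # leave gaps untouched
    for i, v in enumerate(values):
        if v is not None:
            continue
        if mode == "zero":
            out[i] = 0.0
        elif mode == "fixed":
            out[i] = fixed_value
        elif mode == "previous":
            # Forward fill: reuse the (possibly already filled) previous point.
            out[i] = out[i - 1] if i > 0 else None
        else:  # linear
            # Interpolate between the nearest known neighbours on each side.
            left = next((j for j in range(i - 1, -1, -1) if values[j] is not None), None)
            right = next((j for j in range(i + 1, len(values)) if values[j] is not None), None)
            if left is not None and right is not None:
                frac = (i - left) / (right - left)
                out[i] = values[left] + frac * (values[right] - values[left])
    return out
```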

static getJavaPackage()[source]

Returns package name String.

getMaxAnomalyRatio()[source]
Returns

Optional argument, advanced model parameter, max anomaly ratio in a time series.

Return type

maxAnomalyRatio

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getPeriod()[source]
Returns

Optional argument, periodic value of a time series. If the value is null or not present, the API will determine the period automatically.

Return type

period

getSensitivity()[source]
Returns

Optional argument, advanced model parameter, between 0 and 99. The lower the value, the larger the margin, which means fewer points will be reported as anomalies.

Return type

sensitivity
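The sensitivity description follows from how margins are used: a point is typically flagged when it falls outside the band from expectedValue minus lowerMargin to expectedValue plus upperMargin, so a wider margin flags fewer points. The sketch assumes those response field names from the Anomaly Detector API; is_outside_margin is a hypothetical helper, and the service computes isAnomaly itself:

```python
def is_outside_margin(value: float, expected: float,
                      upper_margin: float, lower_margin: float) -> bool:
    """True if value lies outside [expected - lower_margin, expected + upper_margin]."""
    return value > expected + upper_margin or value < expected - lower_margin


# The same observation checked against a narrow band and a wide band.
print(is_outside_margin(12.0, 10.0, upper_margin=1.0, lower_margin=1.0))
print(is_outside_margin(12.0, 10.0, upper_margin=3.0, lower_margin=3.0))
```

The first call flags the point and the second does not, which is the "lower sensitivity, larger margin, fewer anomalies" behaviour in miniature.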

getSeries()[source]
Returns

Time series data points. Points should be sorted by timestamp in ascending order to match the anomaly detection result. If the data is not sorted correctly or contains duplicate timestamps, the API will not work and will return an error message instead.

Return type

series

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

granularity = Param(parent='undefined', name='granularity', doc='ServiceParam:  Can only be one of yearly, monthly, weekly, daily, hourly or minutely. Granularity is used for verify whether input series is valid.     ')
handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
imputeFixedValue = Param(parent='undefined', name='imputeFixedValue', doc='ServiceParam:  Optional argument, fixed value to use when imputeMode is set to "fixed"     ')
imputeMode = Param(parent='undefined', name='imputeMode', doc='ServiceParam:  Optional argument, impute mode of a time series. Possible values: auto, previous, linear, fixed, zero, notFill     ')
maxAnomalyRatio = Param(parent='undefined', name='maxAnomalyRatio', doc='ServiceParam:  Optional argument, advanced model parameter, max anomaly ratio in a time series.     ')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
period = Param(parent='undefined', name='period', doc='ServiceParam:  Optional argument, periodic value of a time series. If the value is null or does not present, the API will determine the period automatically.     ')
classmethod read()[source]

Returns an MLReader instance for this class.

sensitivity = Param(parent='undefined', name='sensitivity', doc='ServiceParam:  Optional argument, advanced model parameter, between 0-99, the lower the value is, the larger the margin value will be which means less anomalies will be accepted     ')
series = Param(parent='undefined', name='series', doc='ServiceParam:  Time series data points. Points should be sorted by timestamp in ascending order to match the anomaly detection result. If the data is not sorted correctly or there is duplicated timestamp, the API will not work. In such case, an error message will be returned.     ')
setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setCustomInterval(value)[source]
Parameters

customInterval – Custom interval, used to set a non-standard time interval. For example, if the series has one point every 5 minutes, the request can be set as granularity=minutely, customInterval=5.

setCustomIntervalCol(value)[source]
Parameters

customInterval – Custom interval, used to set a non-standard time interval. For example, if the series has one point every 5 minutes, the request can be set as granularity=minutely, customInterval=5.
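Each service parameter has a pair of setters: setX(value) applies one value to every row, while setXCol(colName) reads the value per row from a column. The following sketch models that resolution on plain dicts; resolve_service_param is hypothetical, and it assumes only one of the two bindings is set at a time:

```python
from typing import Any, Mapping, Optional


def resolve_service_param(row: Mapping[str, Any], value: Any = None,
                          col: Optional[str] = None) -> Any:
    """Per-row value of a service parameter bound to a constant or a column."""
    if col is not None:
        return row[col]  # setXCol: read from the named column
    return value  # setX: same constant for every row


rows = [{"interval": 5}, {"interval": 10}]
print([resolve_service_param(r, col="interval") for r in rows])
print([resolve_service_param(r, value=15) for r in rows])
```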

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setGranularity(value)[source]
Parameters

granularity – Can only be one of yearly, monthly, weekly, daily, hourly or minutely. Granularity is used to verify whether the input series is valid.

setGranularityCol(value)[source]
Parameters

granularity – Can only be one of yearly, monthly, weekly, daily, hourly or minutely. Granularity is used to verify whether the input series is valid.

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setImputeFixedValue(value)[source]
Parameters

imputeFixedValue – Optional argument, fixed value to use when imputeMode is set to “fixed”

setImputeFixedValueCol(value)[source]
Parameters

imputeFixedValue – Optional argument, fixed value to use when imputeMode is set to “fixed”

setImputeMode(value)[source]
Parameters

imputeMode – Optional argument, impute mode of a time series. Possible values: auto, previous, linear, fixed, zero, notFill

setImputeModeCol(value)[source]
Parameters

imputeMode – Optional argument, impute mode of a time series. Possible values: auto, previous, linear, fixed, zero, notFill

setLinkedService(value)[source]
setLocation(value)[source]
setMaxAnomalyRatio(value)[source]
Parameters

maxAnomalyRatio – Optional argument, advanced model parameter, max anomaly ratio in a time series.

setMaxAnomalyRatioCol(value)[source]
Parameters

maxAnomalyRatio – Optional argument, advanced model parameter, max anomaly ratio in a time series.

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(concurrency=1, concurrentTimeout=None, customInterval=None, customIntervalCol=None, errorCol='DetectLastAnomaly_49ae8cd4adb7_error', granularity=None, granularityCol=None, handler=None, imputeFixedValue=None, imputeFixedValueCol=None, imputeMode=None, imputeModeCol=None, maxAnomalyRatio=None, maxAnomalyRatioCol=None, outputCol='DetectLastAnomaly_49ae8cd4adb7_output', period=None, periodCol=None, sensitivity=None, sensitivityCol=None, series=None, seriesCol=None, subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters
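setParams accepts the same service options that end up in the request. As a sketch, the function below assembles a JSON-ready request body from those options, dropping unset fields and checking the documented values (granularity names, sensitivity between 0 and 99, imputeMode names). The camelCase field names are assumed to match the Anomaly Detector REST API, and build_request_body is hypothetical:

```python
from typing import Any, Dict, List, Optional

GRANULARITIES = {"yearly", "monthly", "weekly", "daily", "hourly", "minutely"}
IMPUTE_MODES = {"auto", "previous", "linear", "fixed", "zero", "notFill"}


def build_request_body(series: List[Dict[str, Any]],
                       granularity: Optional[str] = None,
                       custom_interval: Optional[int] = None,
                       period: Optional[int] = None,
                       max_anomaly_ratio: Optional[float] = None,
                       sensitivity: Optional[int] = None,
                       impute_mode: Optional[str] = None,
                       impute_fixed_value: Optional[float] = None) -> Dict[str, Any]:
    """Assemble a JSON-ready body, omitting options that were not set."""
    if granularity is not None and granularity not in GRANULARITIES:
        raise ValueError(f"unsupported granularity: {granularity}")
    if sensitivity is not None and not 0 <= sensitivity <= 99:
        raise ValueError("sensitivity must be between 0 and 99")
    if impute_mode is not None and impute_mode not in IMPUTE_MODES:
        raise ValueError(f"unsupported imputeMode: {impute_mode}")
    fields = {
        "series": series,
        "granularity": granularity,
        "customInterval": custom_interval,
        "period": period,
        "maxAnomalyRatio": max_anomaly_ratio,
        "sensitivity": sensitivity,
        "imputeMode": impute_mode,
        "imputeFixedValue": impute_fixed_value,
    }
    # Drop unset options so the service can fall back to its own defaults.
    return {key: value for key, value in fields.items() if value is not None}
```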

setPeriod(value)[source]
Parameters

period – Optional argument, periodic value of a time series. If the value is null or not present, the API will determine the period automatically.

setPeriodCol(value)[source]
Parameters

period – Optional argument, periodic value of a time series. If the value is null or not present, the API will determine the period automatically.

setSensitivity(value)[source]
Parameters

sensitivity – Optional argument, advanced model parameter, between 0 and 99. The lower the value, the larger the margin, which means fewer points will be reported as anomalies.

setSensitivityCol(value)[source]
Parameters

sensitivity – Optional argument, advanced model parameter, between 0 and 99. The lower the value, the larger the margin, which means fewer points will be reported as anomalies.

setSeries(value)[source]
Parameters

series – Time series data points. Points should be sorted by timestamp in ascending order to match the anomaly detection result. If the data is not sorted correctly or contains duplicate timestamps, the API will not work and will return an error message instead.

setSeriesCol(value)[source]
Parameters

series – Time series data points. Points should be sorted by timestamp in ascending order to match the anomaly detection result. If the data is not sorted correctly or contains duplicate timestamps, the API will not work and will return an error message instead.

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – the API key to use

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.DetectMultivariateAnomaly module

class synapse.ml.cognitive.DetectMultivariateAnomaly.DetectMultivariateAnomaly(java_obj=None, backoffs=[100, 500, 1000], connectionString=None, containerName=None, endTime=None, endpoint=None, errorCol='DetectMultivariateAnomaly_df81e7f9071e_error', initialPollingDelay=300, inputCols=None, intermediateSaveDir=None, maxPollingRetries=1000, modelId=None, outputCol='DetectMultivariateAnomaly_df81e7f9071e_output', pollingDelay=300, sasToken=None, startTime=None, storageKey=None, storageName=None, subscriptionKey=None, subscriptionKeyCol=None, suppressMaxRetriesExceededException=False, timestampCol='timestamp', url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaModel

Parameters
  • backoffs (list) – array of backoffs to use in the handler

  • connectionString (str) – Connection String for your storage account used for uploading files.

  • containerName (str) – Container that will be used to upload files to.

  • endTime (str) – Required. End time of the data to be used for detection or for generating the multivariate anomaly detection model; should be a date-time value.

  • endpoint (str) – Endpoint of your storage account used for uploading files.

  • errorCol (str) – column to hold http errors

  • initialPollingDelay (int) – number of milliseconds to wait before first poll for result

  • inputCols (list) – The names of the input columns

  • intermediateSaveDir (str) – Directory in which to save the intermediate data produced while training.

  • maxPollingRetries (int) – number of times to poll

  • modelId (str) – Format - uuid. Model identifier.

  • outputCol (str) – The name of the output column

  • pollingDelay (int) – number of milliseconds to wait between polling

  • sasToken (str) – SAS Token for your storage account used for uploading files.

  • startTime (str) – Required. Start time of the data to be used for detection or for generating the multivariate anomaly detection model; should be a date-time value.

  • storageKey (str) – Storage Key for your storage account used for uploading files.

  • storageName (str) – Storage Name for your storage account used for uploading files.

  • subscriptionKey (object) – the API key to use

  • suppressMaxRetriesExceededException (bool) – set to true to suppress the maximum retries exception and report it in the error column

  • timestampCol (str) – Timestamp column name

  • url (str) – Url of the service
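initialPollingDelay, pollingDelay, maxPollingRetries and suppressMaxRetriesExceededException together describe an asynchronous polling loop. A minimal sketch, assuming a hypothetical check() callable that returns None while the job is still running (poll_until_done is not a SynapseML function, and the exact loop SynapseML runs may differ):

```python
import time
from typing import Any, Callable, Dict, Optional


def poll_until_done(check: Callable[[], Optional[Any]],
                    initial_polling_delay_ms: int = 300,
                    polling_delay_ms: int = 300,
                    max_polling_retries: int = 1000,
                    suppress_max_retries_exceeded: bool = False,
                    sleep: Callable[[float], None] = time.sleep) -> Dict[str, Any]:
    """Call check() until it returns a non-None result or retries run out."""
    sleep(initial_polling_delay_ms / 1000)  # wait before the first poll
    for attempt in range(max_polling_retries):
        result = check()
        if result is not None:
            return {"result": result, "error": None}
        if attempt < max_polling_retries - 1:
            sleep(polling_delay_ms / 1000)  # wait between polls
    message = f"gave up after {max_polling_retries} polls"
    if suppress_max_retries_exceeded:
        # Report the failure as data instead of raising, as an error column would.
        return {"result": None, "error": message}
    raise TimeoutError(message)
```

Under this sketch's assumptions, the defaults (300 ms initial delay, 300 ms between polls, 1000 polls) give a worst-case wait of 300 + 999 × 300 ms = 300 000 ms, about 5 minutes, before the loop gives up.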

backoffs = Param(parent='undefined', name='backoffs', doc='array of backoffs to use in the handler')
cleanUpIntermediateData()[source]
connectionString = Param(parent='undefined', name='connectionString', doc='Connection String for your storage account used for uploading files.')
containerName = Param(parent='undefined', name='containerName', doc='Container that will be used to upload files to.')
endTime = Param(parent='undefined', name='endTime', doc='A required field, end time of data to be used for detection/generating multivariate anomaly detection model, should be date-time.')
endpoint = Param(parent='undefined', name='endpoint', doc='End Point for your storage account used for uploading files.')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getBackoffs()[source]
Returns

array of backoffs to use in the handler

Return type

backoffs
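backoffs lists the retry delays used by the request handler; this sketch assumes they are milliseconds, so the default [100, 500, 1000] would mean up to three retries. Below is a retry wrapper driven by such a list, treating every exception as retryable unless is_retryable says otherwise. call_with_backoffs is hypothetical; SynapseML's handler decides retryability from HTTP responses in its own way:

```python
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def call_with_backoffs(fn: Callable[[], T],
                       backoffs_ms: Sequence[int] = (100, 500, 1000),
                       is_retryable: Callable[[Exception], bool] = lambda err: True,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, sleeping for the next backoff after each retryable failure."""
    for delay_ms in backoffs_ms:
        try:
            return fn()
        except Exception as err:
            if not is_retryable(err):
                raise
            sleep(delay_ms / 1000)
    # Final attempt once the backoffs are exhausted; any exception propagates.
    return fn()
```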

getConnectionString()[source]
Returns

Connection String for your storage account used for uploading files.

Return type

connectionString
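Azure Storage connection strings are semicolon-separated key=value pairs such as DefaultEndpointsProtocol=https;AccountName=<name>;AccountKey=<key>;EndpointSuffix=core.windows.net. This sketch splits one into its parts, for example to cross-check it against storageName; parse_connection_string is a hypothetical helper. It splits each segment on the first "=" only, because base64 account keys often end in "=":

```python
from typing import Dict


def parse_connection_string(conn: str) -> Dict[str, str]:
    """Split an Azure Storage connection string into its key=value parts."""
    parts = {}
    for segment in conn.split(";"):
        if not segment:
            continue  # tolerate a trailing semicolon
        key, sep, value = segment.partition("=")
        if not sep:
            raise ValueError(f"malformed segment: {segment!r}")
        parts[key] = value
    return parts
```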

getContainerName()[source]
Returns

Container that will be used to upload files to.

Return type

containerName

getEndTime()[source]
Returns

Required. End time of the data to be used for detection or for generating the multivariate anomaly detection model; should be a date-time value.

Return type

endTime

getEndpoint()[source]
Returns

Endpoint of your storage account used for uploading files.

Return type

endpoint

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getInitialPollingDelay()[source]
Returns

number of milliseconds to wait before first poll for result

Return type

initialPollingDelay

getInputCols()[source]
Returns

The names of the input columns

Return type

inputCols

getIntermediateSaveDir()[source]
Returns

Directory in which to save the intermediate data produced while training.

Return type

intermediateSaveDir

static getJavaPackage()[source]

Returns package name String.

getMaxPollingRetries()[source]
Returns

number of times to poll

Return type

maxPollingRetries

getModelId()[source]
Returns

Format - uuid. Model identifier.

Return type

modelId
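modelId is documented as a UUID. A quick local check for the hyphenated form, using only the standard uuid module (is_valid_model_id is a hypothetical helper; it accepts upper- or lowercase hex but rejects braces, "urn:uuid:" prefixes and unhyphenated strings):

```python
import uuid


def is_valid_model_id(model_id: str) -> bool:
    """True if model_id is a UUID in the standard hyphenated form."""
    try:
        parsed = uuid.UUID(model_id)
    except ValueError:
        return False
    # uuid.UUID also accepts other spellings, so compare against the canonical form.
    return str(parsed) == model_id.lower()
```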

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getPollingDelay()[source]
Returns

number of milliseconds to wait between polling

Return type

pollingDelay

getSasToken()[source]
Returns

SAS Token for your storage account used for uploading files.

Return type

sasToken

getStartTime()[source]
Returns

Required. Start time of the data to be used for detection or for generating the multivariate anomaly detection model; should be a date-time value.

Return type

startTime
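startTime and endTime should be date-time values. A sketch that formats a window as ISO 8601 UTC strings and rejects an empty or reversed window; the trailing "Z" format and the start-before-end check are assumptions, and format_window is hypothetical:

```python
from datetime import datetime, timezone
from typing import Tuple


def format_window(start: datetime, end: datetime) -> Tuple[str, str]:
    """Return (startTime, endTime) as ISO 8601 UTC strings."""
    if start.tzinfo is None or end.tzinfo is None:
        raise ValueError("use timezone-aware datetimes to avoid ambiguity")
    if start >= end:
        raise ValueError("startTime must be earlier than endTime")

    def to_utc_string(dt: datetime) -> str:
        return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

    return to_utc_string(start), to_utc_string(end)
```

In practice the window would usually be taken from the range of values in timestampCol.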

getStorageKey()[source]
Returns

Storage Key for your storage account used for uploading files.

Return type

storageKey

getStorageName()[source]
Returns

Storage Name for your storage account used for uploading files.

Return type

storageName

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getSuppressMaxRetriesExceededException()[source]
Returns

set to true to suppress the maximum retries exception and report it in the error column

Return type

suppressMaxRetriesExceededException

getTimestampCol()[source]
Returns

Timestamp column name

Return type

timestampCol

getUrl()[source]
Returns

Url of the service

Return type

url

initialPollingDelay = Param(parent='undefined', name='initialPollingDelay', doc='number of milliseconds to wait before first poll for result')
inputCols = Param(parent='undefined', name='inputCols', doc='The names of the input columns')
intermediateSaveDir = Param(parent='undefined', name='intermediateSaveDir', doc='Directory name of which you want to save the intermediate data produced while training.')
maxPollingRetries = Param(parent='undefined', name='maxPollingRetries', doc='number of times to poll')
modelId = Param(parent='undefined', name='modelId', doc='Format - uuid. Model identifier.')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
pollingDelay = Param(parent='undefined', name='pollingDelay', doc='number of milliseconds to wait between polling')
classmethod read()[source]

Returns an MLReader instance for this class.

sasToken = Param(parent='undefined', name='sasToken', doc='SAS Token for your storage account used for uploading files.')
setBackoffs(value)[source]
Parameters

backoffs – array of backoffs to use in the handler

setConnectionString(value)[source]
Parameters

connectionString – Connection String for your storage account used for uploading files.

setContainerName(value)[source]
Parameters

containerName – Container that will be used to upload files to.

setEndTime(value)[source]
Parameters

endTime – Required. End time of the data to be used for detection or for generating the multivariate anomaly detection model; should be a date-time value.

setEndpoint(value)[source]
Parameters

endpoint – Endpoint of your storage account used for uploading files.

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setInitialPollingDelay(value)[source]
Parameters

initialPollingDelay – number of milliseconds to wait before first poll for result

setInputCols(value)[source]
Parameters

inputCols – The names of the input columns

setIntermediateSaveDir(value)[source]
Parameters

intermediateSaveDir – Directory in which to save the intermediate data produced while training.

setLocation(value)[source]
setMaxPollingRetries(value)[source]
Parameters

maxPollingRetries – number of times to poll

setModelId(value)[source]
Parameters

modelId – Format - uuid. Model identifier.

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(backoffs=[100, 500, 1000], connectionString=None, containerName=None, endTime=None, endpoint=None, errorCol='DetectMultivariateAnomaly_df81e7f9071e_error', initialPollingDelay=300, inputCols=None, intermediateSaveDir=None, maxPollingRetries=1000, modelId=None, outputCol='DetectMultivariateAnomaly_df81e7f9071e_output', pollingDelay=300, sasToken=None, startTime=None, storageKey=None, storageName=None, subscriptionKey=None, subscriptionKeyCol=None, suppressMaxRetriesExceededException=False, timestampCol='timestamp', url=None)[source]

Set the (keyword only) parameters

setPollingDelay(value)[source]
Parameters

pollingDelay – number of milliseconds to wait between polling

setSasToken(value)[source]
Parameters

sasToken – SAS Token for your storage account used for uploading files.

setStartTime(value)[source]
Parameters

startTime – Required. Start time of the data to be used for detection or for generating the multivariate anomaly detection model; should be a date-time value.

setStorageKey(value)[source]
Parameters

storageKey – Storage Key for your storage account used for uploading files.

setStorageName(value)[source]
Parameters

storageName – Storage Name for your storage account used for uploading files.

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – the API key to use

setSuppressMaxRetriesExceededException(value)[source]
Parameters

suppressMaxRetriesExceededException – set to true to suppress the maximum retries exception and report it in the error column

setTimestampCol(value)[source]
Parameters

timestampCol – Timestamp column name

setUrl(value)[source]
Parameters

url – Url of the service

startTime = Param(parent='undefined', name='startTime', doc='A required field, start time of data to be used for detection/generating multivariate anomaly detection model, should be date-time.')
storageKey = Param(parent='undefined', name='storageKey', doc='Storage Key for your storage account used for uploading files.')
storageName = Param(parent='undefined', name='storageName', doc='Storage Name for your storage account used for uploading files.')
subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
suppressMaxRetriesExceededException = Param(parent='undefined', name='suppressMaxRetriesExceededException', doc='set true to suppress the maxumimum retries exception and report in the error column')
timestampCol = Param(parent='undefined', name='timestampCol', doc='Timestamp column name')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.DictionaryExamples module

class synapse.ml.cognitive.DictionaryExamples.DictionaryExamples(java_obj=None, concurrency=1, concurrentTimeout=None, errorCol='DictionaryExamples_556e40a8dc6e_error', fromLanguage=None, fromLanguageCol=None, handler=None, outputCol='DictionaryExamples_556e40a8dc6e_output', subscriptionKey=None, subscriptionKeyCol=None, subscriptionRegion=None, subscriptionRegionCol=None, textAndTranslation=None, textAndTranslationCol=None, timeout=60.0, toLanguage=None, toLanguageCol=None, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (str) – column to hold http errors

  • fromLanguage (object) – Specifies the language of the input text. The source language must be one of the supported languages included in the dictionary scope.

  • handler (object) – Which strategy to use when handling requests

  • outputCol (str) – The name of the output column

  • subscriptionKey (object) – the API key to use

  • subscriptionRegion (object) – the API region to use

  • textAndTranslation (object) – A string specifying the translated text previously returned by the Dictionary lookup operation.

  • timeout (float) – number of seconds to wait before closing the connection

  • toLanguage (object) – Specifies the language of the output text. The target language must be one of the supported languages included in the dictionary scope.

  • url (str) – Url of the service
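DictionaryExamples wraps the Translator v3 dictionary examples operation. Assuming that REST shape (a POST to /dictionary/examples with api-version=3.0, from and to query parameters, and a JSON array of objects with "Text" and "Translation" keys), here is a sketch of the request it corresponds to, built with the standard library only; build_examples_request and the global endpoint URL are assumptions:

```python
import json
from typing import Iterable, Tuple
from urllib.parse import urlencode

# Assumed global Translator endpoint; regional or custom endpoints differ.
EXAMPLES_URL = "https://api.cognitive.microsofttranslator.com/dictionary/examples"


def build_examples_request(from_language: str, to_language: str,
                           pairs: Iterable[Tuple[str, str]]) -> Tuple[str, str]:
    """Return (url, json_body) for a dictionary examples call."""
    query = urlencode({"api-version": "3.0", "from": from_language, "to": to_language})
    body = [{"Text": text, "Translation": translation} for text, translation in pairs]
    return f"{EXAMPLES_URL}?{query}", json.dumps(body)


url, body = build_examples_request("en", "es", [("fly", "volar")])
print(url)
print(body)
```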

concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
fromLanguage = Param(parent='undefined', name='fromLanguage', doc='ServiceParam: Specifies the language of the input text. The source language must be one of the supported languages included in the dictionary scope.')
getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getFromLanguage()[source]
Returns

Specifies the language of the input text. The source language must be one of the supported languages included in the dictionary scope.

Return type

fromLanguage

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

static getJavaPackage()[source]

Returns package name String.

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getSubscriptionRegion()[source]
Returns

the API region to use

Return type

subscriptionRegion

getTextAndTranslation()[source]
Returns

A string specifying the translated text previously returned by the Dictionary lookup operation.

Return type

textAndTranslation

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getToLanguage()[source]
Returns

Specifies the language of the output text. The target language must be one of the supported languages included in the dictionary scope.

Return type

toLanguage

getUrl()[source]
Returns

Url of the service

Return type

url

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setFromLanguage(value)[source]
Parameters

fromLanguage – Specifies the language of the input text. The source language must be one of the supported languages included in the dictionary scope.

setFromLanguageCol(value)[source]
Parameters

fromLanguage – Specifies the language of the input text. The source language must be one of the supported languages included in the dictionary scope.

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setLinkedService(value)[source]
setLocation(value)[source]
setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(concurrency=1, concurrentTimeout=None, errorCol='DictionaryExamples_556e40a8dc6e_error', fromLanguage=None, fromLanguageCol=None, handler=None, outputCol='DictionaryExamples_556e40a8dc6e_output', subscriptionKey=None, subscriptionKeyCol=None, subscriptionRegion=None, subscriptionRegionCol=None, textAndTranslation=None, textAndTranslationCol=None, timeout=60.0, toLanguage=None, toLanguageCol=None, url=None)[source]

Set the (keyword only) parameters

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionRegion(value)[source]
Parameters

subscriptionRegion – the API region to use

setSubscriptionRegionCol(value)[source]
Parameters

subscriptionRegion – the API region to use

setTextAndTranslation(value)[source]
Parameters

textAndTranslation – A string specifying the translated text previously returned by the Dictionary lookup operation.

setTextAndTranslationCol(value)[source]
Parameters

textAndTranslation – A string specifying the translated text previously returned by the Dictionary lookup operation.

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setToLanguage(value)[source]
Parameters

toLanguage – Specifies the language of the output text. The target language must be one of the supported languages included in the dictionary scope.

setToLanguageCol(value)[source]
Parameters

toLanguage – Specifies the language of the output text. The target language must be one of the supported languages included in the dictionary scope.

setUrl(value)[source]
Parameters

url – Url of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
subscriptionRegion = Param(parent='undefined', name='subscriptionRegion', doc='ServiceParam: the API region to use')
textAndTranslation = Param(parent='undefined', name='textAndTranslation', doc='ServiceParam:  A string specifying the translated text previously returned by the Dictionary lookup operation.')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
toLanguage = Param(parent='undefined', name='toLanguage', doc='ServiceParam: Specifies the language of the output text. The target language must be one of the supported languages included in the dictionary scope.')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.DictionaryLookup module

class synapse.ml.cognitive.DictionaryLookup.DictionaryLookup(java_obj=None, concurrency=1, concurrentTimeout=None, errorCol='DictionaryLookup_ca9d06ff4409_error', fromLanguage=None, fromLanguageCol=None, handler=None, outputCol='DictionaryLookup_ca9d06ff4409_output', subscriptionKey=None, subscriptionKeyCol=None, subscriptionRegion=None, subscriptionRegionCol=None, text=None, textCol=None, timeout=60.0, toLanguage=None, toLanguageCol=None, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (str) – column to hold http errors

  • fromLanguage (object) – Specifies the language of the input text. The source language must be one of the supported languages included in the dictionary scope.

  • handler (object) – Which strategy to use when handling requests

  • outputCol (str) – The name of the output column

  • subscriptionKey (object) – the API key to use

  • subscriptionRegion (object) – the API region to use

  • text (object) – the string to translate

  • timeout (float) – number of seconds to wait before closing the connection

  • toLanguage (object) – Specifies the language of the output text. The target language must be one of the supported languages included in the dictionary scope.

  • url (str) – Url of the service
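The textAndTranslation parameter of DictionaryExamples is meant to come from a prior lookup. Assuming the Translator v3 lookup response shape (a list of entries, each with normalizedSource and a translations list carrying normalizedTarget), this sketch builds a lookup body and turns a response into (text, translation) pairs. lookup_body and example_pairs are hypothetical helpers, and the exact value type SynapseML expects for textAndTranslation is not stated in this reference:

```python
from typing import Any, Dict, List, Tuple


def lookup_body(texts: List[str]) -> List[Dict[str, str]]:
    """JSON body for a dictionary lookup: one {"Text": ...} object per term."""
    return [{"Text": text} for text in texts]


def example_pairs(lookup_response: List[Dict[str, Any]]) -> List[Tuple[str, str]]:
    """Pair each normalizedSource with every normalizedTarget returned for it."""
    pairs = []
    for entry in lookup_response:
        for translation in entry.get("translations", []):
            pairs.append((entry["normalizedSource"], translation["normalizedTarget"]))
    return pairs
```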

concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
fromLanguage = Param(parent='undefined', name='fromLanguage', doc='ServiceParam: Specifies the language of the input text. The source language must be one of the supported languages included in the dictionary scope.')
getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getFromLanguage()[source]
Returns

Specifies the language of the input text. The source language must be one of the supported languages included in the dictionary scope.

Return type

fromLanguage

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

static getJavaPackage()[source]

Returns package name String.

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getSubscriptionRegion()[source]
Returns

the API region to use

Return type

subscriptionRegion

getText()[source]
Returns

the string to translate

Return type

text

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getToLanguage()[source]
Returns

Specifies the language of the output text. The target language must be one of the supported languages included in the dictionary scope.

Return type

toLanguage

getUrl()[source]
Returns

Url of the service

Return type

url

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setFromLanguage(value)[source]
Parameters

fromLanguage – Specifies the language of the input text. The source language must be one of the supported languages included in the dictionary scope.

setFromLanguageCol(value)[source]
Parameters

fromLanguage – Specifies the language of the input text. The source language must be one of the supported languages included in the dictionary scope.

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setLinkedService(value)[source]
setLocation(value)[source]
setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(concurrency=1, concurrentTimeout=None, errorCol='DictionaryLookup_ca9d06ff4409_error', fromLanguage=None, fromLanguageCol=None, handler=None, outputCol='DictionaryLookup_ca9d06ff4409_output', subscriptionKey=None, subscriptionKeyCol=None, subscriptionRegion=None, subscriptionRegionCol=None, text=None, textCol=None, timeout=60.0, toLanguage=None, toLanguageCol=None, url=None)[source]

Set the (keyword only) parameters

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionRegion(value)[source]
Parameters

subscriptionRegion – the API region to use

setSubscriptionRegionCol(value)[source]
Parameters

subscriptionRegion – the API region to use

setText(value)[source]
Parameters

text – the string to translate

setTextCol(value)[source]
Parameters

text – the string to translate

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setToLanguage(value)[source]
Parameters

toLanguage – Specifies the language of the output text. The target language must be one of the supported languages included in the dictionary scope.

setToLanguageCol(value)[source]
Parameters

toLanguage – Specifies the language of the output text. The target language must be one of the supported languages included in the dictionary scope.
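Each ServiceParam in this class has two setters. setToLanguage(value) fixes one value for every row. setToLanguageCol(value) instead takes the name of a DataFrame column that supplies a value for each row. The sketch below models that resolution, using plain dicts in place of Spark rows. `resolve_service_param` is a hypothetical helper, and the sketch assumes that only one of the two forms is set at a time.

```python
def resolve_service_param(row, value=None, col=None):
    """Return the parameter value that applies to one row.

    Hypothetical model: if a column name was set (the ...Col setter), the
    value is read from that column of the row; otherwise the single value
    set with the plain setter applies to every row.
    """
    if col is not None:
        return row.get(col)
    return value


rows = [
    {"text": "fly", "target": "es"},
    {"text": "house", "target": "de"},
]
# Same target language for every row, as with setToLanguage("fr")
print([resolve_service_param(r, value="fr") for r in rows])  # ['fr', 'fr']
# Per-row target language, as with setToLanguageCol("target")
print([resolve_service_param(r, col="target") for r in rows])  # ['es', 'de']
```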

setUrl(value)[source]
Parameters

url – Url of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
subscriptionRegion = Param(parent='undefined', name='subscriptionRegion', doc='ServiceParam: the API region to use')
text = Param(parent='undefined', name='text', doc='ServiceParam: the string to translate')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
toLanguage = Param(parent='undefined', name='toLanguage', doc='ServiceParam: Specifies the language of the output text. The target language must be one of the supported languages included in the dictionary scope.')
url = Param(parent='undefined', name='url', doc='Url of the service')
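For orientation, the parameters of this transformer correspond to the public Translator v3 Dictionary Lookup REST call. The sketch below only builds that request with the standard library and sends nothing. The transformer constructs and sends its own requests, so read this as an illustration of the underlying API, not of SynapseML internals. The key and region are placeholders, and passing `text` as a list of strings is an assumption.

```python
import json
import urllib.parse
import urllib.request


def build_dictionary_lookup_request(
    text, from_language, to_language, subscription_key, subscription_region,
    url="https://api.cognitive.microsofttranslator.com/dictionary/lookup",
):
    """Build (but do not send) a Translator v3 Dictionary Lookup request."""
    query = urllib.parse.urlencode(
        {"api-version": "3.0", "from": from_language, "to": to_language}
    )
    # The body is a JSON array with one {"Text": ...} object per input string
    body = json.dumps([{"Text": t} for t in text]).encode("utf-8")
    return urllib.request.Request(
        f"{url}?{query}",
        data=body,
        method="POST",
        headers={
            "Ocp-Apim-Subscription-Key": subscription_key,
            "Ocp-Apim-Subscription-Region": subscription_region,
            "Content-Type": "application/json",
        },
    )


req = build_dictionary_lookup_request(["fly"], "en", "es", "<your-key>", "eastus")
print(req.full_url)
print(req.data)
```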

synapse.ml.cognitive.DocumentTranslator module

class synapse.ml.cognitive.DocumentTranslator.DocumentTranslator(java_obj=None, backoffs=[100, 500, 1000], concurrency=1, concurrentTimeout=None, errorCol='DocumentTranslator_05248ebcc0a0_error', filterPrefix=None, filterPrefixCol=None, filterSuffix=None, filterSuffixCol=None, initialPollingDelay=300, maxPollingRetries=1000, outputCol='DocumentTranslator_05248ebcc0a0_output', pollingDelay=300, serviceName=None, sourceLanguage=None, sourceLanguageCol=None, sourceStorageSource=None, sourceStorageSourceCol=None, sourceUrl=None, sourceUrlCol=None, storageType=None, storageTypeCol=None, subscriptionKey=None, subscriptionKeyCol=None, suppressMaxRetriesExceededException=False, targets=None, targetsCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • backoffs (list) – array of backoffs to use in the handler

  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (str) – column to hold http errors

  • filterPrefix (object) – A case-sensitive prefix string to filter documents in the source path for translation. For example, when using an Azure Storage blob URI, use the prefix to restrict translation to a subfolder.

  • filterSuffix (object) – A case-sensitive suffix string to filter documents in the source path for translation. This is most often used for file extensions.

  • initialPollingDelay (int) – number of milliseconds to wait before first poll for result

  • maxPollingRetries (int) – number of times to poll

  • outputCol (str) – The name of the output column

  • pollingDelay (int) – number of milliseconds to wait between polling

  • serviceName (str) –

  • sourceLanguage (object) – Language code. If none is specified, the service auto-detects the language of the document.

  • sourceStorageSource (object) – Storage source of source input.

  • sourceUrl (object) – Location of the folder / container or single file with your documents.

  • storageType (object) – Storage type of the input document source. Required only for single-document translation.

  • subscriptionKey (object) – the API key to use

  • suppressMaxRetriesExceededException (bool) – set to true to suppress the maximum retries exception and report it in the error column instead

  • targets (object) – Destination for the finished translated documents.

  • timeout (float) – number of seconds to wait before closing the connection

  • url (str) – Url of the service
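Together, the three polling parameters bound how long the transformer waits for a translation job. Assume the first status poll happens initialPollingDelay milliseconds after submission, and at most maxPollingRetries polls are made in total, pollingDelay milliseconds apart. Under that assumption, the last poll with the defaults falls at 300 + 999 × 300 = 300,000 ms, or 300 s, after submission. `polling_schedule` is a hypothetical helper for checking this arithmetic.

```python
def polling_schedule(initial_polling_delay=300, polling_delay=300, max_polling_retries=1000):
    """Return the offsets (ms after submission) at which status polls occur.

    Assumed model: the first poll happens after `initial_polling_delay` ms,
    and at most `max_polling_retries` polls are made, `polling_delay` ms apart.
    """
    return [initial_polling_delay + i * polling_delay for i in range(max_polling_retries)]


schedule = polling_schedule()
print(schedule[:3])         # [300, 600, 900]
print(schedule[-1] / 1000)  # 300.0 seconds before giving up
```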

backoffs = Param(parent='undefined', name='backoffs', doc='array of backoffs to use in the handler')
concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
filterPrefix = Param(parent='undefined', name='filterPrefix', doc='ServiceParam: A case-sensitive prefix string to filter documents in the source path for translation. For example, when using an Azure storage blob Uri, use the prefix to restrict sub folders for translation.')
filterSuffix = Param(parent='undefined', name='filterSuffix', doc='ServiceParam: A case-sensitive suffix string to filter documents in the source path for translation. This is most often use for file extensions.')
getBackoffs()[source]
Returns

array of backoffs to use in the handler

Return type

backoffs

getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getFilterPrefix()[source]
Returns

A case-sensitive prefix string to filter documents in the source path for translation. For example, when using an Azure Storage blob URI, use the prefix to restrict translation to a subfolder.

Return type

filterPrefix

getFilterSuffix()[source]
Returns

A case-sensitive suffix string to filter documents in the source path for translation. This is most often used for file extensions.

Return type

filterSuffix

getInitialPollingDelay()[source]
Returns

number of milliseconds to wait before first poll for result

Return type

initialPollingDelay

static getJavaPackage()[source]

Returns package name String.

getMaxPollingRetries()[source]
Returns

number of times to poll

Return type

maxPollingRetries

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getPollingDelay()[source]
Returns

number of milliseconds to wait between polling

Return type

pollingDelay

getServiceName()[source]
Returns

Return type

serviceName

getSourceLanguage()[source]
Returns

Language code. If none is specified, the service auto-detects the language of the document.

Return type

sourceLanguage

getSourceStorageSource()[source]
Returns

Storage source of source input.

Return type

sourceStorageSource

getSourceUrl()[source]
Returns

Location of the folder / container or single file with your documents.

Return type

sourceUrl

getStorageType()[source]
Returns

Storage type of the input document source. Required only for single-document translation.

Return type

storageType

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getSuppressMaxRetriesExceededException()[source]
Returns

set to true to suppress the maximum retries exception and report it in the error column instead

Return type

suppressMaxRetriesExceededException

getTargets()[source]
Returns

Destination for the finished translated documents.

Return type

targets

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

initialPollingDelay = Param(parent='undefined', name='initialPollingDelay', doc='number of milliseconds to wait before first poll for result')
maxPollingRetries = Param(parent='undefined', name='maxPollingRetries', doc='number of times to poll')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
pollingDelay = Param(parent='undefined', name='pollingDelay', doc='number of milliseconds to wait between polling')
classmethod read()[source]

Returns an MLReader instance for this class.

serviceName = Param(parent='undefined', name='serviceName', doc='')
setBackoffs(value)[source]
Parameters

backoffs – array of backoffs to use in the handler
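The sketch below shows how a list of backoffs, such as the default [100, 500, 1000], can drive retries. It assumes the handler waits each listed number of milliseconds before making one more attempt, and gives up once the list is exhausted. `send_with_backoffs` is hypothetical. The `sleep` function is passed in as a parameter so the delays can be inspected without actually waiting.

```python
import time


def send_with_backoffs(send, backoffs=(100, 500, 1000), sleep=time.sleep):
    """Call `send()` and, while it fails, retry after each delay in `backoffs` (ms).

    `send` returns a tuple (ok, response). Hypothetical model of a retrying handler.
    """
    ok, response = send()
    for delay_ms in backoffs:
        if ok:
            break
        sleep(delay_ms / 1000)  # convert milliseconds to seconds
        ok, response = send()
    return ok, response


# A flaky service that fails twice (HTTP 429) and then succeeds
attempts = iter([(False, 429), (False, 429), (True, 200)])
waited = []
result = send_with_backoffs(lambda: next(attempts), sleep=waited.append)
print(result)  # (True, 200)
print(waited)  # [0.1, 0.5]
```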

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setFilterPrefix(value)[source]
Parameters

filterPrefix – A case-sensitive prefix string to filter documents in the source path for translation. For example, when using an Azure Storage blob URI, use the prefix to restrict translation to a subfolder.

setFilterPrefixCol(value)[source]
Parameters

filterPrefix – A case-sensitive prefix string to filter documents in the source path for translation. For example, when using an Azure Storage blob URI, use the prefix to restrict translation to a subfolder.

setFilterSuffix(value)[source]
Parameters

filterSuffix – A case-sensitive suffix string to filter documents in the source path for translation. This is most often used for file extensions.

setFilterSuffixCol(value)[source]
Parameters

filterSuffix – A case-sensitive suffix string to filter documents in the source path for translation. This is most often used for file extensions.
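Both filters are case-sensitive, and the service applies them to document names under sourceUrl. The local sketch below mimics that matching with str.startswith and str.endswith. `filter_documents` and the blob names are hypothetical. The assumption that a document must match both filters when both are set is also part of the sketch, not documented behavior.

```python
def filter_documents(names, prefix=None, suffix=None):
    """Keep names that match a case-sensitive prefix and/or suffix."""
    return [
        n for n in names
        if (prefix is None or n.startswith(prefix))
        and (suffix is None or n.endswith(suffix))
    ]


blobs = ["reports/q1.docx", "reports/Q2.DOCX", "reports/q3.pdf", "drafts/q4.docx"]
# "reports/Q2.DOCX" is excluded because the suffix match is case-sensitive
print(filter_documents(blobs, prefix="reports/", suffix=".docx"))  # ['reports/q1.docx']
```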

setInitialPollingDelay(value)[source]
Parameters

initialPollingDelay – number of milliseconds to wait before first poll for result

setLinkedService(value)[source]
setMaxPollingRetries(value)[source]
Parameters

maxPollingRetries – number of times to poll

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(backoffs=[100, 500, 1000], concurrency=1, concurrentTimeout=None, errorCol='DocumentTranslator_05248ebcc0a0_error', filterPrefix=None, filterPrefixCol=None, filterSuffix=None, filterSuffixCol=None, initialPollingDelay=300, maxPollingRetries=1000, outputCol='DocumentTranslator_05248ebcc0a0_output', pollingDelay=300, serviceName=None, sourceLanguage=None, sourceLanguageCol=None, sourceStorageSource=None, sourceStorageSourceCol=None, sourceUrl=None, sourceUrlCol=None, storageType=None, storageTypeCol=None, subscriptionKey=None, subscriptionKeyCol=None, suppressMaxRetriesExceededException=False, targets=None, targetsCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setPollingDelay(value)[source]
Parameters

pollingDelay – number of milliseconds to wait between polling

setServiceName(value)[source]
Parameters

serviceName

setSourceLanguage(value)[source]
Parameters

sourceLanguage – Language code. If none is specified, the service auto-detects the language of the document.

setSourceLanguageCol(value)[source]
Parameters

sourceLanguage – Language code. If none is specified, the service auto-detects the language of the document.

setSourceStorageSource(value)[source]
Parameters

sourceStorageSource – Storage source of source input.

setSourceStorageSourceCol(value)[source]
Parameters

sourceStorageSource – Storage source of source input.

setSourceUrl(value)[source]
Parameters

sourceUrl – Location of the folder / container or single file with your documents.

setSourceUrlCol(value)[source]
Parameters

sourceUrl – Location of the folder / container or single file with your documents.

setStorageType(value)[source]
Parameters

storageType – Storage type of the input document source. Required only for single-document translation.

setStorageTypeCol(value)[source]
Parameters

storageType – Storage type of the input document source. Required only for single-document translation.

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – the API key to use

setSuppressMaxRetriesExceededException(value)[source]
Parameters

suppressMaxRetriesExceededException – set to true to suppress the maximum retries exception and report it in the error column instead
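The sketch below models the behavior this flag controls. It assumes that when maxPollingRetries is exhausted before the job finishes, the transformer either raises or, if the exception is suppressed, produces a row whose error column carries the failure. `poll_until_done` and the status strings are hypothetical. To keep the sketch short, it omits the pollingDelay wait between polls.

```python
def poll_until_done(get_status, max_polling_retries, suppress=False):
    """Poll `get_status()` up to `max_polling_retries` times.

    Returns {"output": ..., "error": ...}, mimicking the output and error columns.
    Hypothetical model; the pollingDelay sleep between polls is omitted.
    """
    for _ in range(max_polling_retries):
        status = get_status()
        if status == "Succeeded":
            return {"output": status, "error": None}
    message = f"max polling retries ({max_polling_retries}) exceeded"
    if suppress:
        # Report the failure in the error column instead of failing the job
        return {"output": None, "error": message}
    raise TimeoutError(message)


print(poll_until_done(lambda: "Running", 3, suppress=True))
# {'output': None, 'error': 'max polling retries (3) exceeded'}
```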

setTargets(value)[source]
Parameters

targets – Destination for the finished translated documents.

setTargetsCol(value)[source]
Parameters

targets – Destination for the finished translated documents.
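The parameters targets, sourceUrl, sourceLanguage, storageType, filterPrefix, and filterSuffix map naturally onto the request body of the Azure Document Translation batch API. The sketch below assembles a body of that shape as a Python dict. The field names follow the public v1.0 API. `build_batch_request`, the placeholder container URLs, and the idea that SynapseML builds exactly this body are all assumptions.

```python
import json


def build_batch_request(source_url, targets, source_language=None,
                        filter_prefix=None, filter_suffix=None,
                        storage_type=None, storage_source="AzureBlob"):
    """Assemble a Document Translation batch request body (hypothetical helper)."""
    source = {"sourceUrl": source_url, "storageSource": storage_source}
    if source_language is not None:
        source["language"] = source_language  # omit to let the service auto-detect
    filt = {k: v for k, v in (("prefix", filter_prefix), ("suffix", filter_suffix))
            if v is not None}
    if filt:
        source["filter"] = filt
    item = {"source": source, "targets": targets}
    if storage_type is not None:
        item["storageType"] = storage_type  # e.g. for a single document
    return {"inputs": [item]}


targets = [{
    "targetUrl": "https://<account>.blob.core.windows.net/<target-container>?<sas>",
    "language": "fr",
    "storageSource": "AzureBlob",
}]
body = build_batch_request(
    "https://<account>.blob.core.windows.net/<source-container>?<sas>",
    targets,
    filter_suffix=".docx",
)
print(json.dumps(body, indent=2))
```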

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

sourceLanguage = Param(parent='undefined', name='sourceLanguage', doc='ServiceParam: Language code. If none is specified, we will perform auto detect on the document.')
sourceStorageSource = Param(parent='undefined', name='sourceStorageSource', doc='ServiceParam: Storage source of source input.')
sourceUrl = Param(parent='undefined', name='sourceUrl', doc='ServiceParam: Location of the folder / container or single file with your documents.')
storageType = Param(parent='undefined', name='storageType', doc='ServiceParam: Storage type of the input documents source string. Required for single document translation only.')
subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
suppressMaxRetriesExceededException = Param(parent='undefined', name='suppressMaxRetriesExceededException', doc='set true to suppress the maxumimum retries exception and report in the error column')
targets = Param(parent='undefined', name='targets', doc='ServiceParam: Destination for the finished translated documents.')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.EntityDetector module

class synapse.ml.cognitive.EntityDetector.EntityDetector(java_obj=None, concurrency=1, concurrentTimeout=None, errorCol='EntityDetector_7958cc13daf9_error', handler=None, language=None, languageCol=None, modelVersion=None, modelVersionCol=None, outputCol='EntityDetector_7958cc13daf9_output', showStats=None, showStatsCol=None, stringIndexType=None, stringIndexTypeCol=None, subscriptionKey=None, subscriptionKeyCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (str) – column to hold http errors

  • handler (object) – Which strategy to use when handling requests

  • language (object) – the language code of the text (optional for some services)

  • modelVersion (object) – This value indicates which model will be used for scoring. If a model-version is not specified, the API should default to the latest, non-preview version.

  • outputCol (str) – The name of the output column

  • showStats (object) – if set to true, the response will contain input- and document-level statistics.

  • stringIndexType (object) – Specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets

  • subscriptionKey (object) – the API key to use

  • text (object) – the text in the request body

  • timeout (float) – number of seconds to wait before closing the connection

  • url (str) – Url of the service
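The modelVersion, showStats, and stringIndexType parameters correspond to query parameters of the Text Analytics entity recognition REST API. The text and language parameters fill the documents array of the request body. Below is a minimal sketch of that request using only the standard library; nothing is sent. The v3.1 path, the `<resource>` endpoint placeholder, and `build_entity_request` are assumptions for illustration.

```python
import json
import urllib.parse


def build_entity_request(texts, language="en", model_version="latest",
                         show_stats=False, string_index_type="TextElement_v8",
                         endpoint="https://<resource>.cognitiveservices.azure.com"):
    """Return (url, json_body) for an entity recognition call (not sent)."""
    query = urllib.parse.urlencode({
        "model-version": model_version,
        "showStats": str(show_stats).lower(),  # the API expects true/false
        "stringIndexType": string_index_type,
    })
    url = f"{endpoint}/text/analytics/v3.1/entities/recognition/general?{query}"
    # One document per input text, with ids "1", "2", ...
    body = {"documents": [{"id": str(i), "language": language, "text": t}
                          for i, t in enumerate(texts, start=1)]}
    return url, json.dumps(body)


url, body = build_entity_request(["Microsoft was founded in Albuquerque."], show_stats=True)
print(url)
print(body)
```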

concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

static getJavaPackage()[source]

Returns package name String.

getLanguage()[source]
Returns

the language code of the text (optional for some services)

Return type

language

getModelVersion()[source]
Returns

This value indicates which model will be used for scoring. If a model-version is not specified, the API should default to the latest, non-preview version.

Return type

modelVersion

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getShowStats()[source]
Returns

if set to true, the response will contain input- and document-level statistics.

Return type

showStats

getStringIndexType()[source]
Returns

Specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets

Return type

stringIndexType

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getText()[source]
Returns

the text in the request body

Return type

text

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
language = Param(parent='undefined', name='language', doc='ServiceParam: the language code of the text (optional for some services)')
modelVersion = Param(parent='undefined', name='modelVersion', doc='ServiceParam: This value indicates which model will be used for scoring. If a model-version is not specified, the API should default to the latest, non-preview version.')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setLanguage(value)[source]
Parameters

language – the language code of the text (optional for some services)

setLanguageCol(value)[source]
Parameters

language – the language code of the text (optional for some services)

setLinkedService(value)[source]
setLocation(value)[source]
setModelVersion(value)[source]
Parameters

modelVersion – This value indicates which model will be used for scoring. If a model-version is not specified, the API should default to the latest, non-preview version.

setModelVersionCol(value)[source]
Parameters

modelVersion – This value indicates which model will be used for scoring. If a model-version is not specified, the API should default to the latest, non-preview version.

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(concurrency=1, concurrentTimeout=None, errorCol='EntityDetector_7958cc13daf9_error', handler=None, language=None, languageCol=None, modelVersion=None, modelVersionCol=None, outputCol='EntityDetector_7958cc13daf9_output', showStats=None, showStatsCol=None, stringIndexType=None, stringIndexTypeCol=None, subscriptionKey=None, subscriptionKeyCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setShowStats(value)[source]
Parameters

showStats – if set to true, the response will contain input- and document-level statistics.

setShowStatsCol(value)[source]
Parameters

showStats – if set to true, the response will contain input- and document-level statistics.

setStringIndexType(value)[source]
Parameters

stringIndexType – Specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets

setStringIndexTypeCol(value)[source]
Parameters

stringIndexType – Specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets
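The choice of offset method matters because an entity's offset depends on what is being counted. Python str indices count Unicode code points, which matches the UnicodeCodePoint option. Java, JavaScript, and .NET strings index UTF-16 code units, which matches Utf16CodeUnit. The default, TextElement_v8, counts grapheme clusters; that requires a segmentation library and is not shown here. `utf16_offset` is a hypothetical helper.

```python
def utf16_offset(text, codepoint_index):
    """Convert a code point offset (a Python str index) to a UTF-16 code unit offset."""
    # Each UTF-16 code unit is 2 bytes in the little-endian encoding (no BOM)
    return len(text[:codepoint_index].encode("utf-16-le")) // 2


text = "\U0001F600 Seattle"  # the emoji lies outside the Basic Multilingual Plane
start = text.index("Seattle")
print(start)                      # 2  (UnicodeCodePoint)
print(utf16_offset(text, start))  # 3  (Utf16CodeUnit: the emoji is a surrogate pair)
```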

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – the API key to use

setText(value)[source]
Parameters

text – the text in the request body

setTextCol(value)[source]
Parameters

text – the text in the request body

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

showStats = Param(parent='undefined', name='showStats', doc='ServiceParam: if set to true, response will contain input and document level statistics.')
stringIndexType = Param(parent='undefined', name='stringIndexType', doc='ServiceParam: Specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets')
subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
text = Param(parent='undefined', name='text', doc='ServiceParam: the text in the request body')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.EntityDetectorSDK module

class synapse.ml.cognitive.EntityDetectorSDK.EntityDetectorSDK(java_obj=None, batchSize=5, concurrency=1, concurrentTimeout=None, disableServiceLogs=None, disableServiceLogsCol=None, errorCol='Error', includeStatistics=None, includeStatisticsCol=None, language=None, languageCol=None, modelVersion=None, modelVersionCol=None, outputCol=None, subscriptionKey=None, subscriptionKeyCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • batchSize (int) – The max size of the buffer

  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • disableServiceLogs (object) – disableServiceLogs option

  • errorCol (str) – column to hold http errors

  • includeStatistics (object) – includeStatistics option

  • language (object) – the language code of the text (optional for some services)

  • modelVersion (object) – modelVersion option

  • outputCol (str) – The name of the output column

  • subscriptionKey (object) – the API key to use

  • text (object) – the text in the request body

  • timeout (float) – number of seconds to wait before closing the connection

  • url (str) – Url of the service
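batchSize (default 5 here) caps how many rows are buffered into one call. Assuming rows are grouped in order into chunks of at most batchSize, 12 rows produce calls of 5, 5, and 2 texts. The generator below is a hypothetical stand-in. It avoids itertools.batched so that it also runs on Python versions earlier than 3.12.

```python
from itertools import islice


def batched(rows, batch_size=5):
    """Yield consecutive lists of at most `batch_size` rows, preserving order."""
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


sizes = [len(b) for b in batched(range(12), batch_size=5)]
print(sizes)  # [5, 5, 2]
```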

batchSize = Param(parent='undefined', name='batchSize', doc='The max size of the buffer')
concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
disableServiceLogs = Param(parent='undefined', name='disableServiceLogs', doc='ServiceParam: disableServiceLogs option')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getBatchSize()[source]
Returns

The max size of the buffer

Return type

batchSize

getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getDisableServiceLogs()[source]
Returns

disableServiceLogs option

Return type

disableServiceLogs

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getIncludeStatistics()[source]
Returns

includeStatistics option

Return type

includeStatistics

static getJavaPackage()[source]

Returns package name String.

getLanguage()[source]
Returns

the language code of the text (optional for some services)

Return type

language

getModelVersion()[source]
Returns

modelVersion option

Return type

modelVersion

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getText()[source]
Returns

the text in the request body

Return type

text

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

includeStatistics = Param(parent='undefined', name='includeStatistics', doc='ServiceParam: includeStatistics option')
language = Param(parent='undefined', name='language', doc='ServiceParam: the language code of the text (optional for some services)')
modelVersion = Param(parent='undefined', name='modelVersion', doc='ServiceParam: modelVersion option')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setBatchSize(value)[source]
Parameters

batchSize – The max size of the buffer

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setDisableServiceLogs(value)[source]
Parameters

disableServiceLogs – disableServiceLogs option

setDisableServiceLogsCol(value)[source]
Parameters

disableServiceLogs – disableServiceLogs option

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setIncludeStatistics(value)[source]
Parameters

includeStatistics – includeStatistics option

setIncludeStatisticsCol(value)[source]
Parameters

includeStatistics – includeStatistics option

setLanguage(value)[source]
Parameters

language – the language code of the text (optional for some services)

setLanguageCol(value)[source]
Parameters

language – the language code of the text (optional for some services)

setLocation(value)[source]
setModelVersion(value)[source]
Parameters

modelVersion – modelVersion option

setModelVersionCol(value)[source]
Parameters

modelVersion – modelVersion option

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(batchSize=5, concurrency=1, concurrentTimeout=None, disableServiceLogs=None, disableServiceLogsCol=None, errorCol='Error', includeStatistics=None, includeStatisticsCol=None, language=None, languageCol=None, modelVersion=None, modelVersionCol=None, outputCol=None, subscriptionKey=None, subscriptionKeyCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – the API key to use

setText(value)[source]
Parameters

text – the text in the request body

setTextCol(value)[source]
Parameters

text – the text in the request body

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
text = Param(parent='undefined', name='text', doc='ServiceParam: the text in the request body')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.EntityDetectorV2 module

class synapse.ml.cognitive.EntityDetectorV2.EntityDetectorV2(java_obj=None, concurrency=1, concurrentTimeout=None, errorCol='EntityDetectorV2_87ab05d95641_error', handler=None, language=None, languageCol=None, outputCol='EntityDetectorV2_87ab05d95641_output', subscriptionKey=None, subscriptionKeyCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (str) – column to hold http errors

  • handler (object) – Which strategy to use when handling requests

  • language (object) – the language code of the text (optional for some services)

  • outputCol (str) – The name of the output column

  • subscriptionKey (object) – the API key to use

  • text (object) – the text in the request body

  • timeout (float) – number of seconds to wait before closing the connection

  • url (str) – Url of the service
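Judging from the defaults in the signatures above (for example, errorCol='EntityDetectorV2_87ab05d95641_error'), the default output and error column names are the instance's uid with _output or _error appended. The uid suffix appears to be 12 random hex characters, as with PySpark's Identifiable. Because the uid changes with every new instance, it is safer to set explicit names with setOutputCol and setErrorCol when downstream code refers to these columns. EntityDetectorSDK is an exception: it defaults to errorCol='Error' and outputCol=None. The hypothetical helper below reproduces the naming pattern.

```python
import uuid


def default_columns(class_name, uid_suffix=None):
    """Hypothetical reconstruction of the default column naming pattern."""
    # Random 12-hex-character suffix unless one is given explicitly
    uid = f"{class_name}_{uid_suffix or uuid.uuid4().hex[-12:]}"
    return {"errorCol": f"{uid}_error", "outputCol": f"{uid}_output"}


cols = default_columns("EntityDetectorV2", "87ab05d95641")
print(cols)
# {'errorCol': 'EntityDetectorV2_87ab05d95641_error', 'outputCol': 'EntityDetectorV2_87ab05d95641_output'}
```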

concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

static getJavaPackage()[source]

Returns package name String.

getLanguage()[source]
Returns

the language code of the text (optional for some services)

Return type

language

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getText()[source]
Returns

the text in the request body

Return type

text

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
language = Param(parent='undefined', name='language', doc='ServiceParam: the language code of the text (optional for some services)')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setLanguage(value)[source]
Parameters

language – the language code of the text (optional for some services)

setLanguageCol(value)[source]
Parameters

languageCol – name of the column containing the language code of the text (optional for some services)

setLinkedService(value)[source]
setLocation(value)[source]
setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(concurrency=1, concurrentTimeout=None, errorCol='EntityDetectorV2_87ab05d95641_error', handler=None, language=None, languageCol=None, outputCol='EntityDetectorV2_87ab05d95641_output', subscriptionKey=None, subscriptionKeyCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKeyCol – name of the column containing the API key to use

setText(value)[source]
Parameters

text – the text in the request body

setTextCol(value)[source]
Parameters

textCol – name of the column containing the text in the request body

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
text = Param(parent='undefined', name='text', doc='ServiceParam: the text in the request body')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.FindSimilarFace module

class synapse.ml.cognitive.FindSimilarFace.FindSimilarFace(java_obj=None, concurrency=1, concurrentTimeout=None, errorCol='FindSimilarFace_14c9314727f8_error', faceId=None, faceIdCol=None, faceIds=None, faceIdsCol=None, faceListId=None, faceListIdCol=None, handler=None, largeFaceListId=None, largeFaceListIdCol=None, maxNumOfCandidatesReturned=None, maxNumOfCandidatesReturnedCol=None, mode=None, modeCol=None, outputCol='FindSimilarFace_14c9314727f8_output', subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – maximum number of seconds to wait on futures if concurrency >= 1

  • errorCol (str) – column to hold http errors

  • faceId (object) – faceId of the query face. The user must call FaceDetect first to get a valid faceId. Note that this faceId is not persisted and will expire 24 hours after the detection call.

  • faceIds (object) – An array of candidate faceIds. All of them are created by FaceDetect, and the faceIds will expire 24 hours after the detection call. The number of faceIds is limited to 1000. The parameters faceListId, largeFaceListId and faceIds should not be provided at the same time.

  • faceListId (object) – An existing user-specified unique candidate face list, created with FaceList - Create. A face list contains a set of persistedFaceIds, which are persisted and never expire. The parameters faceListId, largeFaceListId and faceIds should not be provided at the same time.

  • handler (object) – Which strategy to use when handling requests

  • largeFaceListId (object) – An existing user-specified unique candidate large face list, created with LargeFaceList - Create. A large face list contains a set of persistedFaceIds, which are persisted and never expire. The parameters faceListId, largeFaceListId and faceIds should not be provided at the same time.

  • maxNumOfCandidatesReturned (object) – Optional parameter. The number of top similar faces returned. The valid range is [1, 1000]. It defaults to 20.

  • mode (object) – Optional parameter. Similar face searching mode. It can be ‘matchPerson’ or ‘matchFace’. It defaults to ‘matchPerson’.

  • outputCol (str) – The name of the output column

  • subscriptionKey (object) – the API key to use

  • timeout (float) – number of seconds to wait before closing the connection

  • url (str) – Url of the service
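The parameter list above encodes several constraints:

* faceListId, largeFaceListId, and faceIds are mutually exclusive.
* faceIds may contain at most 1000 entries.
* maxNumOfCandidatesReturned must lie in [1, 1000] and defaults to 20.
* mode is 'matchPerson' or 'matchFace' and defaults to 'matchPerson'.

The following minimal sketch checks those rules and assembles a JSON-ready dict keyed by the parameter names. `build_find_similar_body` is hypothetical. It also assumes that exactly one candidate source is required and that the request body uses these exact keys; neither point is confirmed by the text above.

```python
from typing import Any, Dict, List, Optional


def build_find_similar_body(
    face_id: str,
    face_list_id: Optional[str] = None,
    large_face_list_id: Optional[str] = None,
    face_ids: Optional[List[str]] = None,
    max_num_of_candidates_returned: int = 20,
    mode: str = "matchPerson",
) -> Dict[str, Any]:
    """Validate FindSimilarFace-style settings and return a JSON-ready dict."""
    candidates = {
        "faceListId": face_list_id,
        "largeFaceListId": large_face_list_id,
        "faceIds": face_ids,
    }
    provided = {k: v for k, v in candidates.items() if v is not None}
    # The three candidate sources are mutually exclusive; this sketch also
    # assumes that one of them is required.
    if len(provided) != 1:
        raise ValueError("provide exactly one of faceListId, largeFaceListId, faceIds")
    if face_ids is not None and len(face_ids) > 1000:
        raise ValueError("at most 1000 candidate faceIds are allowed")
    if not 1 <= max_num_of_candidates_returned <= 1000:
        raise ValueError("maxNumOfCandidatesReturned must be in [1, 1000]")
    if mode not in ("matchPerson", "matchFace"):
        raise ValueError("mode must be 'matchPerson' or 'matchFace'")
    body: Dict[str, Any] = {
        "faceId": face_id,
        "maxNumOfCandidatesReturned": max_num_of_candidates_returned,
        "mode": mode,
    }
    body.update(provided)
    return body
```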

concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
faceId = Param(parent='undefined', name='faceId', doc='ServiceParam: faceId of the query face. User needs to call FaceDetect first to get a valid faceId. Note that this faceId is not persisted and will expire 24 hours after the detection call.')
faceIds = Param(parent='undefined', name='faceIds', doc='ServiceParam:  An array of candidate faceIds. All of them are created by FaceDetect and the faceIds will expire 24 hours after the detection call. The number of faceIds is limited to 1000. Parameter faceListId, largeFaceListId and faceIds should not be provided at the same time.')
faceListId = Param(parent='undefined', name='faceListId', doc='ServiceParam:  An existing user-specified unique candidate face list, created in FaceList - Create. Face list contains a set of persistedFaceIds which are persisted and will never expire. Parameter faceListId, largeFaceListId and faceIds should not be provided at the same time.')
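Several parameters above are marked as ServiceParam. Each one has two setters: one for a single value applied to every row (for example setFaceId) and one for a column name read per row (for example setFaceIdCol). The following stdlib-only sketch models that scalar-or-column resolution. It assumes that setting a column clears the scalar value and vice versa. `ServiceParamSketch` is a hypothetical class, not the SynapseML implementation.

```python
from typing import Any, Dict, Optional


class ServiceParamSketch:
    """Hypothetical model of a param that holds either a scalar or a column name."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._value: Any = None
        self._col: Optional[str] = None

    def set_value(self, value: Any) -> "ServiceParamSketch":
        # A scalar applies the same value to every row.
        self._value, self._col = value, None
        return self

    def set_col(self, col: str) -> "ServiceParamSketch":
        # A column name means the value is looked up in each row.
        self._value, self._col = None, col
        return self

    def resolve(self, row: Dict[str, Any]) -> Any:
        # Prefer the per-row column when one is configured.
        if self._col is not None:
            return row[self._col]
        return self._value
```

In SynapseML itself, the corresponding calls are the setX/setXCol pairs documented in this section.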
getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

maximum number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getFaceId()[source]
Returns

faceId of the query face. The user must call FaceDetect first to get a valid faceId. Note that this faceId is not persisted and will expire 24 hours after the detection call.

Return type

faceId

getFaceIds()[source]
Returns

An array of candidate faceIds. All of them are created by FaceDetect, and the faceIds will expire 24 hours after the detection call. The number of faceIds is limited to 1000. The parameters faceListId, largeFaceListId and faceIds should not be provided at the same time.

Return type

faceIds

getFaceListId()[source]
Returns

An existing user-specified unique candidate face list, created with FaceList - Create. A face list contains a set of persistedFaceIds, which are persisted and never expire. The parameters faceListId, largeFaceListId and faceIds should not be provided at the same time.

Return type

faceListId

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

static getJavaPackage()[source]

Returns package name String.

getLargeFaceListId()[source]
Returns

An existing user-specified unique candidate large face list, created with LargeFaceList - Create. A large face list contains a set of persistedFaceIds, which are persisted and never expire. The parameters faceListId, largeFaceListId and faceIds should not be provided at the same time.

Return type

largeFaceListId

getMaxNumOfCandidatesReturned()[source]
Returns

Optional parameter. The number of top similar faces returned. The valid range is [1, 1000]. It defaults to 20.

Return type

maxNumOfCandidatesReturned

getMode()[source]
Returns

Optional parameter. Similar face searching mode. It can be ‘matchPerson’ or ‘matchFace’. It defaults to ‘matchPerson’.

Return type

mode

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
largeFaceListId = Param(parent='undefined', name='largeFaceListId', doc='ServiceParam:  An existing user-specified unique candidate large face list, created in LargeFaceList - Create. Large face list contains a set of persistedFaceIds which are persisted and will never expire. Parameter faceListId, largeFaceListId and faceIds should not be provided at the same time.')
maxNumOfCandidatesReturned = Param(parent='undefined', name='maxNumOfCandidatesReturned', doc='ServiceParam:  Optional parameter. The number of top similar faces returned. The valid range is [1, 1000].It defaults to 20.')
mode = Param(parent='undefined', name='mode', doc="ServiceParam:  Optional parameter. Similar face searching mode. It can be 'matchPerson' or 'matchFace'. It defaults to 'matchPerson'.")
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – maximum number of seconds to wait on futures if concurrency >= 1

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setFaceId(value)[source]
Parameters

faceId – faceId of the query face. The user must call FaceDetect first to get a valid faceId. Note that this faceId is not persisted and will expire 24 hours after the detection call.

setFaceIdCol(value)[source]
Parameters

faceIdCol – name of the column containing the faceId of the query face. The user must call FaceDetect first to get a valid faceId. Note that this faceId is not persisted and will expire 24 hours after the detection call.

setFaceIds(value)[source]
Parameters

faceIds – An array of candidate faceIds. All of them are created by FaceDetect, and the faceIds will expire 24 hours after the detection call. The number of faceIds is limited to 1000. The parameters faceListId, largeFaceListId and faceIds should not be provided at the same time.

setFaceIdsCol(value)[source]
Parameters

faceIdsCol – name of the column containing an array of candidate faceIds. All of them are created by FaceDetect, and the faceIds will expire 24 hours after the detection call. The number of faceIds is limited to 1000. The parameters faceListId, largeFaceListId and faceIds should not be provided at the same time.

setFaceListId(value)[source]
Parameters

faceListId – An existing user-specified unique candidate face list, created with FaceList - Create. A face list contains a set of persistedFaceIds, which are persisted and never expire. The parameters faceListId, largeFaceListId and faceIds should not be provided at the same time.

setFaceListIdCol(value)[source]
Parameters

faceListIdCol – name of the column containing an existing user-specified unique candidate face list, created with FaceList - Create. A face list contains a set of persistedFaceIds, which are persisted and never expire. The parameters faceListId, largeFaceListId and faceIds should not be provided at the same time.

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setLargeFaceListId(value)[source]
Parameters

largeFaceListId – An existing user-specified unique candidate large face list, created with LargeFaceList - Create. A large face list contains a set of persistedFaceIds, which are persisted and never expire. The parameters faceListId, largeFaceListId and faceIds should not be provided at the same time.

setLargeFaceListIdCol(value)[source]
Parameters

largeFaceListIdCol – name of the column containing an existing user-specified unique candidate large face list, created with LargeFaceList - Create. A large face list contains a set of persistedFaceIds, which are persisted and never expire. The parameters faceListId, largeFaceListId and faceIds should not be provided at the same time.

setLinkedService(value)[source]
setLocation(value)[source]
setMaxNumOfCandidatesReturned(value)[source]
Parameters

maxNumOfCandidatesReturned – Optional parameter. The number of top similar faces returned. The valid range is [1, 1000]. It defaults to 20.

setMaxNumOfCandidatesReturnedCol(value)[source]
Parameters

maxNumOfCandidatesReturnedCol – name of the column containing the number of top similar faces to return (optional). The valid range is [1, 1000]. It defaults to 20.

setMode(value)[source]
Parameters

mode – Optional parameter. Similar face searching mode. It can be ‘matchPerson’ or ‘matchFace’. It defaults to ‘matchPerson’.

setModeCol(value)[source]
Parameters

modeCol – name of the column containing the similar face searching mode (optional), which can be ‘matchPerson’ or ‘matchFace’. It defaults to ‘matchPerson’.

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(concurrency=1, concurrentTimeout=None, errorCol='FindSimilarFace_14c9314727f8_error', faceId=None, faceIdCol=None, faceIds=None, faceIdsCol=None, faceListId=None, faceListIdCol=None, handler=None, largeFaceListId=None, largeFaceListIdCol=None, maxNumOfCandidatesReturned=None, maxNumOfCandidatesReturnedCol=None, mode=None, modeCol=None, outputCol='FindSimilarFace_14c9314727f8_output', subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKeyCol – name of the column containing the API key to use

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.FitMultivariateAnomaly module

class synapse.ml.cognitive.FitMultivariateAnomaly.FitMultivariateAnomaly(java_obj=None, alignMode=None, backoffs=[100, 500, 1000], connectionString=None, containerName=None, diagnosticsInfo=None, displayName=None, endTime=None, endpoint=None, errorCol='FitMultivariateAnomaly_a9311a34e5b0_error', fillNAMethod=None, initialPollingDelay=300, inputCols=None, intermediateSaveDir=None, maxPollingRetries=1000, outputCol='FitMultivariateAnomaly_a9311a34e5b0_output', paddingValue=None, pollingDelay=300, sasToken=None, slidingWindow=None, startTime=None, storageKey=None, storageName=None, subscriptionKey=None, subscriptionKeyCol=None, suppressMaxRetriesExceededException=False, timestampCol='timestamp', url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaEstimator

Parameters
  • alignMode (str) – An optional field indicating how different variables are aligned into the same time range, as required by the model. Values: {Inner, Outer}

  • backoffs (list) – array of backoffs to use in the handler

  • connectionString (str) – Connection String for your storage account used for uploading files.

  • containerName (str) – Container that will be used to upload files to.

  • diagnosticsInfo (object) – diagnosticsInfo for training a multivariate anomaly detection model

  • displayName (str) – optional field, name of the model

  • endTime (str) – A required field: the end time of the data used for detection or for generating the multivariate anomaly detection model. It should be a date-time value.

  • endpoint (str) – Endpoint of your storage account used for uploading files.

  • errorCol (str) – column to hold http errors

  • fillNAMethod (str) – An optional field indicating how missing values are filled. Cannot be set to NotFill when alignMode is Outer. Values: {Previous, Subsequent, Linear, Zero, Fixed}

  • initialPollingDelay (int) – number of milliseconds to wait before first poll for result

  • inputCols (list) – The names of the input columns

  • intermediateSaveDir (str) – Name of the directory in which to save the intermediate data produced during training.

  • maxPollingRetries (int) – number of times to poll

  • outputCol (str) – The name of the output column

  • paddingValue (int) – An optional field, used only when fillNAMethod is set to Fixed.

  • pollingDelay (int) – number of milliseconds to wait between polling

  • sasToken (str) – SAS Token for your storage account used for uploading files.

  • slidingWindow (int) – An optional field indicating how many historical points are used to determine the anomaly score of one subsequent point.

  • startTime (str) – A required field: the start time of the data used for detection or for generating the multivariate anomaly detection model. It should be a date-time value.

  • storageKey (str) – Storage Key for your storage account used for uploading files.

  • storageName (str) – Storage Name for your storage account used for uploading files.

  • subscriptionKey (object) – the API key to use

  • suppressMaxRetriesExceededException (bool) – set to true to suppress the maximum-retries-exceeded exception and report it in the error column instead

  • timestampCol (str) – Timestamp column name

  • url (str) – Url of the service
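The FitMultivariateAnomaly settings above interact in a few documented ways:

* startTime and endTime are required date-time values.
* alignMode is Inner or Outer.
* fillNAMethod cannot be NotFill when alignMode is Outer.
* paddingValue only matters when fillNAMethod is Fixed.

The following stdlib-only sketch checks those rules and returns the settings that actually apply. `check_mvad_settings` is hypothetical. It adds two checks the text does not state explicitly: startTime must precede endTime, and slidingWindow must be positive.

```python
from datetime import datetime
from typing import Any, Dict, Optional

ALIGN_MODES = {"Inner", "Outer"}
# NotFill is not in the documented list but is referenced by the
# fillNAMethod description, so it is accepted here (assumption).
FILL_NA_METHODS = {"Previous", "Subsequent", "Linear", "Zero", "Fixed", "NotFill"}


def parse_date_time(value: str) -> datetime:
    # datetime.fromisoformat accepts a trailing "Z" only from Python 3.11,
    # so rewrite it as an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def check_mvad_settings(
    start_time: str,
    end_time: str,
    align_mode: Optional[str] = None,
    fill_na_method: Optional[str] = None,
    padding_value: Optional[int] = None,
    sliding_window: Optional[int] = None,
) -> Dict[str, Any]:
    """Check the documented constraints and return the settings that apply."""
    # Assumption: the time range must be non-empty.
    if parse_date_time(start_time) >= parse_date_time(end_time):
        raise ValueError("startTime must be earlier than endTime")
    if align_mode is not None and align_mode not in ALIGN_MODES:
        raise ValueError(f"unknown alignMode: {align_mode}")
    if fill_na_method is not None and fill_na_method not in FILL_NA_METHODS:
        raise ValueError(f"unknown fillNAMethod: {fill_na_method}")
    if align_mode == "Outer" and fill_na_method == "NotFill":
        raise ValueError("fillNAMethod cannot be NotFill when alignMode is Outer")
    # Assumption: a sliding window needs at least one history point.
    if sliding_window is not None and sliding_window < 1:
        raise ValueError("slidingWindow must be positive")
    settings: Dict[str, Any] = {"startTime": start_time, "endTime": end_time}
    optional = {"alignMode": align_mode, "fillNAMethod": fill_na_method,
                "slidingWindow": sliding_window}
    settings.update({k: v for k, v in optional.items() if v is not None})
    # paddingValue only matters when fillNAMethod is Fixed, so drop it otherwise.
    if fill_na_method == "Fixed" and padding_value is not None:
        settings["paddingValue"] = padding_value
    return settings
```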

alignMode = Param(parent='undefined', name='alignMode', doc='An optional field, indicates how we align different variables into the same time-range which is required by the model.{Inner, Outer}')
backoffs = Param(parent='undefined', name='backoffs', doc='array of backoffs to use in the handler')
cleanUpIntermediateData()[source]
connectionString = Param(parent='undefined', name='connectionString', doc='Connection String for your storage account used for uploading files.')
containerName = Param(parent='undefined', name='containerName', doc='Container that will be used to upload files to.')
diagnosticsInfo = Param(parent='undefined', name='diagnosticsInfo', doc='diagnosticsInfo for training a multivariate anomaly detection model')
displayName = Param(parent='undefined', name='displayName', doc='optional field, name of the model')
endTime = Param(parent='undefined', name='endTime', doc='A required field, end time of data to be used for detection/generating multivariate anomaly detection model, should be date-time.')
endpoint = Param(parent='undefined', name='endpoint', doc='End Point for your storage account used for uploading files.')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
fillNAMethod = Param(parent='undefined', name='fillNAMethod', doc='An optional field, indicates how missed values will be filled with. Can not be set to NotFill, when alignMode is Outer.{Previous, Subsequent, Linear, Zero, Fixed}')
getAlignMode()[source]
Returns

An optional field indicating how different variables are aligned into the same time range, as required by the model. Values: {Inner, Outer}

Return type

alignMode

getBackoffs()[source]
Returns

array of backoffs to use in the handler

Return type

backoffs

getConnectionString()[source]
Returns

Connection String for your storage account used for uploading files.

Return type

connectionString

getContainerName()[source]
Returns

Container that will be used to upload files to.

Return type

containerName

getDiagnosticsInfo()[source]
Returns

diagnosticsInfo for training a multivariate anomaly detection model

Return type

diagnosticsInfo

getDisplayName()[source]
Returns

optional field, name of the model

Return type

displayName

getEndTime()[source]
Returns

A required field: the end time of the data used for detection or for generating the multivariate anomaly detection model. It should be a date-time value.

Return type

endTime

getEndpoint()[source]
Returns

Endpoint of your storage account used for uploading files.

Return type

endpoint

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getFillNAMethod()[source]
Returns

An optional field indicating how missing values are filled. Cannot be set to NotFill when alignMode is Outer. Values: {Previous, Subsequent, Linear, Zero, Fixed}

Return type

fillNAMethod

getInitialPollingDelay()[source]
Returns

number of milliseconds to wait before first poll for result

Return type

initialPollingDelay

getInputCols()[source]
Returns

The names of the input columns

Return type

inputCols

getIntermediateSaveDir()[source]
Returns

Name of the directory in which to save the intermediate data produced during training.

Return type

intermediateSaveDir

static getJavaPackage()[source]

Returns package name String.

getMaxPollingRetries()[source]
Returns

number of times to poll

Return type

maxPollingRetries

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getPaddingValue()[source]
Returns

An optional field, used only when fillNAMethod is set to Fixed.

Return type

paddingValue

getPollingDelay()[source]
Returns

number of milliseconds to wait between polling

Return type

pollingDelay

getSasToken()[source]
Returns

SAS Token for your storage account used for uploading files.

Return type

sasToken

getSlidingWindow()[source]
Returns

An optional field indicating how many historical points are used to determine the anomaly score of one subsequent point.

Return type

slidingWindow

getStartTime()[source]
Returns

A required field: the start time of the data used for detection or for generating the multivariate anomaly detection model. It should be a date-time value.

Return type

startTime

getStorageKey()[source]
Returns

Storage Key for your storage account used for uploading files.

Return type

storageKey

getStorageName()[source]
Returns

Storage Name for your storage account used for uploading files.

Return type

storageName

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getSuppressMaxRetriesExceededException()[source]
Returns

set to true to suppress the maximum-retries-exceeded exception and report it in the error column instead

Return type

suppressMaxRetriesExceededException

getTimestampCol()[source]
Returns

Timestamp column name

Return type

timestampCol

getUrl()[source]
Returns

Url of the service

Return type

url

initialPollingDelay = Param(parent='undefined', name='initialPollingDelay', doc='number of milliseconds to wait before first poll for result')
inputCols = Param(parent='undefined', name='inputCols', doc='The names of the input columns')
intermediateSaveDir = Param(parent='undefined', name='intermediateSaveDir', doc='Directory name of which you want to save the intermediate data produced while training.')
maxPollingRetries = Param(parent='undefined', name='maxPollingRetries', doc='number of times to poll')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
paddingValue = Param(parent='undefined', name='paddingValue', doc='optional field, is only useful if FillNAMethod is set to Fixed.')
pollingDelay = Param(parent='undefined', name='pollingDelay', doc='number of milliseconds to wait between polling')
classmethod read()[source]

Returns an MLReader instance for this class.

sasToken = Param(parent='undefined', name='sasToken', doc='SAS Token for your storage account used for uploading files.')
setAlignMode(value)[source]
Parameters

alignMode – An optional field indicating how different variables are aligned into the same time range, as required by the model. Values: {Inner, Outer}

setBackoffs(value)[source]
Parameters

backoffs – array of backoffs to use in the handler
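The backoffs parameter (default [100, 500, 1000]) is only described as an array of backoffs used by the handler. The following minimal sketch shows one plausible reading. It assumes the values are millisecond delays applied between successive retries of a failed request. `with_backoffs` and `RetryableError` are hypothetical names, not SynapseML APIs.

```python
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Hypothetical error for responses worth retrying (for example HTTP 429)."""


def with_backoffs(
    send: Callable[[], T],
    backoffs_ms: Sequence[int] = (100, 500, 1000),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    # One attempt per backoff entry, sleeping that many milliseconds after a failure.
    for delay_ms in backoffs_ms:
        try:
            return send()
        except RetryableError:
            sleep(delay_ms / 1000.0)
    # Final attempt once the backoffs are exhausted; errors now propagate.
    return send()
```

Passing `sleep` as a parameter keeps the retry schedule testable without real waiting.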

setConnectionString(value)[source]
Parameters

connectionString – Connection String for your storage account used for uploading files.

setContainerName(value)[source]
Parameters

containerName – Container that will be used to upload files to.

setDiagnosticsInfo(value)[source]
Parameters

diagnosticsInfo – diagnosticsInfo for training a multivariate anomaly detection model

setDisplayName(value)[source]
Parameters

displayName – optional field, name of the model

setEndTime(value)[source]
Parameters

endTime – A required field: the end time of the data used for detection or for generating the multivariate anomaly detection model. It should be a date-time value.

setEndpoint(value)[source]
Parameters

endpoint – Endpoint of your storage account used for uploading files.

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setFillNAMethod(value)[source]
Parameters

fillNAMethod – An optional field indicating how missing values are filled. Cannot be set to NotFill when alignMode is Outer. Values: {Previous, Subsequent, Linear, Zero, Fixed}

setInitialPollingDelay(value)[source]
Parameters

initialPollingDelay – number of milliseconds to wait before first poll for result

setInputCols(value)[source]
Parameters

inputCols – The names of the input columns

setIntermediateSaveDir(value)[source]
Parameters

intermediateSaveDir – Name of the directory in which to save the intermediate data produced during training.

setLocation(value)[source]
setMaxPollingRetries(value)[source]
Parameters

maxPollingRetries – number of times to poll

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setPaddingValue(value)[source]
Parameters

paddingValue – An optional field, used only when fillNAMethod is set to Fixed.

setParams(alignMode=None, backoffs=[100, 500, 1000], connectionString=None, containerName=None, diagnosticsInfo=None, displayName=None, endTime=None, endpoint=None, errorCol='FitMultivariateAnomaly_a9311a34e5b0_error', fillNAMethod=None, initialPollingDelay=300, inputCols=None, intermediateSaveDir=None, maxPollingRetries=1000, outputCol='FitMultivariateAnomaly_a9311a34e5b0_output', paddingValue=None, pollingDelay=300, sasToken=None, slidingWindow=None, startTime=None, storageKey=None, storageName=None, subscriptionKey=None, subscriptionKeyCol=None, suppressMaxRetriesExceededException=False, timestampCol='timestamp', url=None)[source]

Set the (keyword only) parameters

setPollingDelay(value)[source]
Parameters

pollingDelay – number of milliseconds to wait between polling
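Four parameters together describe an asynchronous polling loop: initialPollingDelay, pollingDelay, maxPollingRetries, and suppressMaxRetriesExceededException. The following stdlib-only sketch makes these assumptions:

* It waits initialPollingDelay ms before the first poll.
* It polls at most maxPollingRetries times, with pollingDelay ms between polls.
* When suppression is enabled, it returns None instead of raising once the retries run out.

`poll_until_done` and `get_status` are hypothetical names.

```python
import time
from typing import Callable, Optional


class MaxRetriesExceeded(Exception):
    """Raised when no result arrives within max_polling_retries polls."""


def poll_until_done(
    get_status: Callable[[], Optional[str]],
    initial_polling_delay_ms: int = 300,
    polling_delay_ms: int = 300,
    max_polling_retries: int = 1000,
    suppress_max_retries_exceeded: bool = False,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Return the first non-None status, polling at most max_polling_retries times."""
    sleep(initial_polling_delay_ms / 1000.0)
    for attempt in range(max_polling_retries):
        result = get_status()
        if result is not None:
            return result
        # No need to wait after the final poll.
        if attempt < max_polling_retries - 1:
            sleep(polling_delay_ms / 1000.0)
    if suppress_max_retries_exceeded:
        # Stands in for "report in the error column" by returning None.
        return None
    raise MaxRetriesExceeded(f"no result after {max_polling_retries} polls")
```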

setSasToken(value)[source]
Parameters

sasToken – SAS Token for your storage account used for uploading files.

setSlidingWindow(value)[source]
Parameters

slidingWindow – An optional field indicating how many historical points are used to determine the anomaly score of one subsequent point.

setStartTime(value)[source]
Parameters

startTime – A required field: the start time of the data used for detection or for generating the multivariate anomaly detection model. It should be a date-time value.

setStorageKey(value)[source]
Parameters

storageKey – Storage Key for your storage account used for uploading files.

setStorageName(value)[source]
Parameters

storageName – Storage Name for your storage account used for uploading files.

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKeyCol – name of the column containing the API key to use

setSuppressMaxRetriesExceededException(value)[source]
Parameters

suppressMaxRetriesExceededException – set to true to suppress the maximum-retries-exceeded exception and report it in the error column instead

setTimestampCol(value)[source]
Parameters

timestampCol – Timestamp column name

setUrl(value)[source]
Parameters

url – Url of the service

slidingWindow = Param(parent='undefined', name='slidingWindow', doc='An optional field, indicates how many history points will be used to determine the anomaly score of one subsequent point.')
startTime = Param(parent='undefined', name='startTime', doc='A required field, start time of data to be used for detection/generating multivariate anomaly detection model, should be date-time.')
storageKey = Param(parent='undefined', name='storageKey', doc='Storage Key for your storage account used for uploading files.')
storageName = Param(parent='undefined', name='storageName', doc='Storage Name for your storage account used for uploading files.')
subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
suppressMaxRetriesExceededException = Param(parent='undefined', name='suppressMaxRetriesExceededException', doc='set true to suppress the maxumimum retries exception and report in the error column')
timestampCol = Param(parent='undefined', name='timestampCol', doc='Timestamp column name')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.FormOntologyLearner module

class synapse.ml.cognitive.FormOntologyLearner.FormOntologyLearner(java_obj=None, inputCol=None, outputCol=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaEstimator

Parameters
  • inputCol (str) – The name of the input column

  • outputCol (str) – The name of the output column

getInputCol()[source]
Returns

The name of the input column

Return type

inputCol

static getJavaPackage()[source]

Returns package name String.

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

inputCol = Param(parent='undefined', name='inputCol', doc='The name of the input column')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setInputCol(value)[source]
Parameters

inputCol – The name of the input column

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(inputCol=None, outputCol=None)[source]

Set the (keyword only) parameters

synapse.ml.cognitive.FormOntologyTransformer module

class synapse.ml.cognitive.FormOntologyTransformer.FormOntologyTransformer(java_obj=None, inputCol=None, ontology=None, outputCol=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaModel

Parameters
  • inputCol (str) – The name of the input column

  • ontology (object) – The ontology to cast values to

  • outputCol (str) – The name of the output column

getInputCol()[source]
Returns

The name of the input column

Return type

inputCol

static getJavaPackage()[source]

Returns package name String.

getOntology()[source]
Returns

The ontology to cast values to

Return type

ontology

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

inputCol = Param(parent='undefined', name='inputCol', doc='The name of the input column')
ontology = Param(parent='undefined', name='ontology', doc='The ontology to cast values to')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setInputCol(value)[source]
Parameters

inputCol – The name of the input column

setOntology(value)[source]
Parameters

ontology – The ontology to cast values to

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(inputCol=None, ontology=None, outputCol=None)[source]

Set the (keyword only) parameters

synapse.ml.cognitive.GenerateThumbnails module

class synapse.ml.cognitive.GenerateThumbnails.GenerateThumbnails(java_obj=None, concurrency=1, concurrentTimeout=None, errorCol='GenerateThumbnails_d864601fc100_error', handler=None, height=None, heightCol=None, imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, outputCol='GenerateThumbnails_d864601fc100_output', smartCropping=None, smartCroppingCol=None, subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None, width=None, widthCol=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – maximum number of seconds to wait on futures if concurrency >= 1

  • errorCol (str) – column to hold http errors

  • handler (object) – Which strategy to use when handling requests

  • height (object) – the desired height of the image

  • imageBytes (object) – bytestream of the image to use

  • imageUrl (object) – the url of the image to use

  • outputCol (str) – The name of the output column

  • smartCropping (object) – whether to intelligently crop the image

  • subscriptionKey (object) – the API key to use

  • timeout (float) – number of seconds to wait before closing the connection

  • url (str) – Url of the service

  • width (object) – the desired width of the image

concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

maximum number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

getHeight()[source]
Returns

the desired height of the image

Return type

height

getImageBytes()[source]
Returns

bytestream of the image to use

Return type

imageBytes

getImageUrl()[source]
Returns

the url of the image to use

Return type

imageUrl

static getJavaPackage()[source]

Returns package name String.

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getSmartCropping()[source]
Returns

whether to intelligently crop the image

Return type

smartCropping

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

getWidth()[source]
Returns

the desired width of the image

Return type

width

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
height = Param(parent='undefined', name='height', doc='ServiceParam: the desired height of the image')
imageBytes = Param(parent='undefined', name='imageBytes', doc='ServiceParam: bytestream of the image to use')
imageUrl = Param(parent='undefined', name='imageUrl', doc='ServiceParam: the url of the image to use')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – maximum number of seconds to wait on futures if concurrency >= 1

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setHeight(value)[source]
Parameters

height – the desired height of the image

setHeightCol(value)[source]
Parameters

height – name of the column containing the desired height of the image

setImageBytes(value)[source]
Parameters

imageBytes – bytestream of the image to use

setImageBytesCol(value)[source]
Parameters

imageBytes – name of the column containing the bytestream of the image to use

setImageUrl(value)[source]
Parameters

imageUrl – the url of the image to use

setImageUrlCol(value)[source]
Parameters

imageUrl – name of the column containing the URL of the image to use

setLinkedService(value)[source]
setLocation(value)[source]
setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(concurrency=1, concurrentTimeout=None, errorCol='GenerateThumbnails_d864601fc100_error', handler=None, height=None, heightCol=None, imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, outputCol='GenerateThumbnails_d864601fc100_output', smartCropping=None, smartCroppingCol=None, subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None, width=None, widthCol=None)[source]

Set the (keyword only) parameters

setSmartCropping(value)[source]
Parameters

smartCropping – whether to intelligently crop the image

setSmartCroppingCol(value)[source]
Parameters

smartCropping – name of the column indicating whether to intelligently crop the image

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – name of the column containing the API key to use

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

setWidth(value)[source]
Parameters

width – the desired width of the image

setWidthCol(value)[source]
Parameters

width – name of the column containing the desired width of the image
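Parameters documented as ServiceParam (height, width, imageUrl, smartCropping, and so on) come in pairs of setters such as setWidth and setWidthCol: the first supplies one value for every row, the second names a column whose value is read per row. The plain-Python sketch below models only that idea. The class ServiceParamSketch is hypothetical and is not SynapseML code, and the rule that the most recent setter call wins is an assumption made for illustration.

```python
from typing import Any, Mapping, Optional


class ServiceParamSketch:
    """Hypothetical model of a ServiceParam: a constant value or a column name.

    Not SynapseML code. Assumption for illustration: the most recent setter
    call wins, so setting one form clears the other.
    """

    def __init__(self, name: str) -> None:
        self.name = name
        self.value: Optional[Any] = None
        self.col: Optional[str] = None

    def set_value(self, value: Any) -> "ServiceParamSketch":
        # Like setWidth(value): one constant used for every row.
        self.value, self.col = value, None
        return self

    def set_col(self, col: str) -> "ServiceParamSketch":
        # Like setWidthCol(value): read a possibly different value from each row.
        self.value, self.col = None, col
        return self

    def resolve(self, row: Mapping[str, Any]) -> Any:
        # A column binding takes effect if present; otherwise use the constant.
        if self.col is not None:
            return row[self.col]
        return self.value


rows = [{"imageUrl": "a.jpg", "w": 32}, {"imageUrl": "b.jpg", "w": 64}]
width = ServiceParamSketch("width").set_value(50)
print([width.resolve(r) for r in rows])  # [50, 50]
width.set_col("w")
print([width.resolve(r) for r in rows])  # [32, 64]
```

On a real GenerateThumbnails instance you would call setWidth or setWidthCol directly; the sketch only shows why both forms exist.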

smartCropping = Param(parent='undefined', name='smartCropping', doc='ServiceParam: whether to intelligently crop the image')
subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')
width = Param(parent='undefined', name='width', doc='ServiceParam: the desired width of the image')

synapse.ml.cognitive.GetCustomModel module

class synapse.ml.cognitive.GetCustomModel.GetCustomModel(java_obj=None, concurrency=1, concurrentTimeout=None, errorCol='GetCustomModel_ba92f2acdbbb_error', handler=None, includeKeys=None, includeKeysCol=None, modelId=None, modelIdCol=None, outputCol='GetCustomModel_ba92f2acdbbb_output', subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – maximum number of seconds to wait on futures if concurrency >= 1

  • errorCol (str) – column to hold http errors

  • handler (object) – Which strategy to use when handling requests

  • includeKeys (object) – Include list of extracted keys in model information.

  • modelId (object) – Model identifier.

  • outputCol (str) – The name of the output column

  • subscriptionKey (object) – the API key to use

  • timeout (float) – number of seconds to wait before closing the connection

  • url (str) – Url of the service
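The concurrency, concurrentTimeout, and timeout parameters shared by these transformers control how many HTTP calls are in flight and how long to wait for them. As a rough analogy only (this is not how SynapseML implements it), the sketch below uses Python's concurrent.futures: max_workers plays the role of concurrency, and the bound on each future wait plays the role of concurrentTimeout. The function call_all and the fake_send stand-in are hypothetical.

```python
import concurrent.futures as cf
import time
from typing import Callable, List, Optional, Sequence


def call_all(requests: Sequence[str],
             send: Callable[[str], str],
             concurrency: int = 1,
             concurrent_timeout: Optional[float] = None) -> List[str]:
    """Run send() on every request with at most `concurrency` calls in flight.

    Results keep input order. Each wait on a future is bounded by
    `concurrent_timeout` seconds (None means wait forever); on expiry,
    concurrent.futures.TimeoutError is raised once the executor has shut
    down, which first waits for calls already submitted to finish.
    """
    with cf.ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(send, r) for r in requests]
        return [f.result(timeout=concurrent_timeout) for f in futures]


def fake_send(body: str) -> str:
    # Stand-in for an HTTP call; a real client would also apply its own
    # connection timeout, which is what the `timeout` parameter describes.
    time.sleep(0.01)
    return body.upper()


print(call_all(["a", "b", "c"], fake_send, concurrency=2))  # ['A', 'B', 'C']
```

Raising concurrency shortens wall-clock time for many small requests, at the cost of more simultaneous load on the service and a higher chance of throttling.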

concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

maximum number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

getIncludeKeys()[source]
Returns

Include list of extracted keys in model information.

Return type

includeKeys

static getJavaPackage()[source]

Returns package name String.

getModelId()[source]
Returns

Model identifier.

Return type

modelId

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
includeKeys = Param(parent='undefined', name='includeKeys', doc='ServiceParam: Include list of extracted keys in model information.')
modelId = Param(parent='undefined', name='modelId', doc='ServiceParam: Model identifier.')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – maximum number of seconds to wait on futures if concurrency >= 1

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setIncludeKeys(value)[source]
Parameters

includeKeys – Include list of extracted keys in model information.

setIncludeKeysCol(value)[source]
Parameters

includeKeys – name of the column indicating whether to include the list of extracted keys in model information.

setLinkedService(value)[source]
setLocation(value)[source]
setModelId(value)[source]
Parameters

modelId – Model identifier.

setModelIdCol(value)[source]
Parameters

modelId – name of the column containing the model identifier.

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(concurrency=1, concurrentTimeout=None, errorCol='GetCustomModel_ba92f2acdbbb_error', handler=None, includeKeys=None, includeKeysCol=None, modelId=None, modelIdCol=None, outputCol='GetCustomModel_ba92f2acdbbb_output', subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – name of the column containing the API key to use

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.GroupFaces module

class synapse.ml.cognitive.GroupFaces.GroupFaces(java_obj=None, concurrency=1, concurrentTimeout=None, errorCol='GroupFaces_aaf3c4aaf003_error', faceIds=None, faceIdsCol=None, handler=None, outputCol='GroupFaces_aaf3c4aaf003_output', subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – maximum number of seconds to wait on futures if concurrency >= 1

  • errorCol (str) – column to hold http errors

  • faceIds (object) – Array of candidate faceIds created by Face - Detect. The maximum is 1000 faces.

  • handler (object) – Which strategy to use when handling requests

  • outputCol (str) – The name of the output column

  • subscriptionKey (object) – the API key to use

  • timeout (float) – number of seconds to wait before closing the connection

  • url (str) – Url of the service
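The faceIds description caps a single GroupFaces request at 1000 faces. Below is a minimal sketch, assuming you build one row (one request) per chunk, of splitting a longer list into compliant chunks; chunk_face_ids is a hypothetical helper. Faces that land in different chunks are grouped independently, so this approach cannot place similar faces from two chunks into the same group.

```python
from typing import List, Sequence

MAX_GROUP_FACE_IDS = 1000  # limit stated in the faceIds description


def chunk_face_ids(face_ids: Sequence[str],
                   max_size: int = MAX_GROUP_FACE_IDS) -> List[List[str]]:
    """Split face IDs into consecutive chunks of at most `max_size` IDs each."""
    if max_size < 1:
        raise ValueError("max_size must be at least 1")
    return [list(face_ids[i:i + max_size])
            for i in range(0, len(face_ids), max_size)]


ids = [f"face-{i}" for i in range(2500)]
print([len(c) for c in chunk_face_ids(ids)])  # [1000, 1000, 500]
```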

concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
faceIds = Param(parent='undefined', name='faceIds', doc='ServiceParam: Array of candidate faceId created by Face - Detect. The maximum is 1000 faces.')
getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

maximum number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getFaceIds()[source]
Returns

Array of candidate faceIds created by Face - Detect. The maximum is 1000 faces.

Return type

faceIds

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

static getJavaPackage()[source]

Returns package name String.

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – maximum number of seconds to wait on futures if concurrency >= 1

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setFaceIds(value)[source]
Parameters

faceIds – Array of candidate faceIds created by Face - Detect. The maximum is 1000 faces.

setFaceIdsCol(value)[source]
Parameters

faceIds – name of the column containing the array of candidate faceIds created by Face - Detect. The maximum is 1000 faces.

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setLinkedService(value)[source]
setLocation(value)[source]
setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(concurrency=1, concurrentTimeout=None, errorCol='GroupFaces_aaf3c4aaf003_error', faceIds=None, faceIdsCol=None, handler=None, outputCol='GroupFaces_aaf3c4aaf003_output', subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – name of the column containing the API key to use

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.HealthcareSDK module

class synapse.ml.cognitive.HealthcareSDK.HealthcareSDK(java_obj=None, batchSize=5, concurrency=1, concurrentTimeout=None, disableServiceLogs=None, disableServiceLogsCol=None, errorCol='Error', includeStatistics=None, includeStatisticsCol=None, language=None, languageCol=None, modelVersion=None, modelVersionCol=None, outputCol=None, subscriptionKey=None, subscriptionKeyCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • batchSize (int) – The max size of the buffer

  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – maximum number of seconds to wait on futures if concurrency >= 1

  • disableServiceLogs (object) – disableServiceLogs option

  • errorCol (str) – column to hold http errors

  • includeStatistics (object) – includeStatistics option

  • language (object) – the language code of the text (optional for some services)

  • modelVersion (object) – modelVersion option

  • outputCol (str) – The name of the output column

  • subscriptionKey (object) – the API key to use

  • text (object) – the text in the request body

  • timeout (float) – number of seconds to wait before closing the connection

  • url (str) – Url of the service
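batchSize is described as the max size of the buffer, with a default of 5 for HealthcareSDK. Assuming each full buffer is sent as one request (an assumption; this page does not state it), 12 documents with the default would need three requests of 5, 5, and 2 documents. The hypothetical generator below shows that fill-then-flush buffering pattern in plain Python.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def buffered(items: Iterable[T], batch_size: int = 5) -> Iterator[List[T]]:
    """Yield lists of at most `batch_size` items; the last list may be shorter."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(items)
    while True:
        batch = list(islice(it, batch_size))  # fill the buffer
        if not batch:
            return  # input exhausted
        yield batch  # flush: one hypothetical request per buffer


docs = [f"note {i}" for i in range(12)]
print([len(b) for b in buffered(docs)])  # [5, 5, 2]
```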

batchSize = Param(parent='undefined', name='batchSize', doc='The max size of the buffer')
concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
disableServiceLogs = Param(parent='undefined', name='disableServiceLogs', doc='ServiceParam: disableServiceLogs option')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getBatchSize()[source]
Returns

The max size of the buffer

Return type

batchSize

getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

maximum number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getDisableServiceLogs()[source]
Returns

disableServiceLogs option

Return type

disableServiceLogs

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getIncludeStatistics()[source]
Returns

includeStatistics option

Return type

includeStatistics

static getJavaPackage()[source]

Returns package name String.

getLanguage()[source]
Returns

the language code of the text (optional for some services)

Return type

language

getModelVersion()[source]
Returns

modelVersion option

Return type

modelVersion

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getText()[source]
Returns

the text in the request body

Return type

text

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

includeStatistics = Param(parent='undefined', name='includeStatistics', doc='ServiceParam: includeStatistics option')
language = Param(parent='undefined', name='language', doc='ServiceParam: the language code of the text (optional for some services)')
modelVersion = Param(parent='undefined', name='modelVersion', doc='ServiceParam: modelVersion option')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setBatchSize(value)[source]
Parameters

batchSize – The max size of the buffer

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – maximum number of seconds to wait on futures if concurrency >= 1

setDisableServiceLogs(value)[source]
Parameters

disableServiceLogs – disableServiceLogs option

setDisableServiceLogsCol(value)[source]
Parameters

disableServiceLogs – name of the column containing the disableServiceLogs option

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setIncludeStatistics(value)[source]
Parameters

includeStatistics – includeStatistics option

setIncludeStatisticsCol(value)[source]
Parameters

includeStatistics – name of the column containing the includeStatistics option

setLanguage(value)[source]
Parameters

language – the language code of the text (optional for some services)

setLanguageCol(value)[source]
Parameters

language – name of the column containing the language code of the text (optional for some services)

setLocation(value)[source]
setModelVersion(value)[source]
Parameters

modelVersion – modelVersion option

setModelVersionCol(value)[source]
Parameters

modelVersion – name of the column containing the modelVersion option

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(batchSize=5, concurrency=1, concurrentTimeout=None, disableServiceLogs=None, disableServiceLogsCol=None, errorCol='Error', includeStatistics=None, includeStatisticsCol=None, language=None, languageCol=None, modelVersion=None, modelVersionCol=None, outputCol=None, subscriptionKey=None, subscriptionKeyCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – name of the column containing the API key to use

setText(value)[source]
Parameters

text – the text in the request body

setTextCol(value)[source]
Parameters

text – name of the column containing the text in the request body

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
text = Param(parent='undefined', name='text', doc='ServiceParam: the text in the request body')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.IdentifyFaces module

class synapse.ml.cognitive.IdentifyFaces.IdentifyFaces(java_obj=None, concurrency=1, concurrentTimeout=None, confidenceThreshold=None, confidenceThresholdCol=None, errorCol='IdentifyFaces_b41be4f19b96_error', faceIds=None, faceIdsCol=None, handler=None, largePersonGroupId=None, largePersonGroupIdCol=None, maxNumOfCandidatesReturned=None, maxNumOfCandidatesReturnedCol=None, outputCol='IdentifyFaces_b41be4f19b96_output', personGroupId=None, personGroupIdCol=None, subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – maximum number of seconds to wait on futures if concurrency >= 1

  • confidenceThreshold (object) – Optional. Customized identification confidence threshold, in the range [0, 1]. Advanced users can tweak this value to override the default internal threshold for better precision on their scenario data. Note that there is no guarantee this threshold value will work on other data or after algorithm updates.

  • errorCol (str) – column to hold http errors

  • faceIds (object) – Array of query faceIds created by Face - Detect. Each face is identified independently. The number of faceIds must be between 1 and 10.

  • handler (object) – Which strategy to use when handling requests

  • largePersonGroupId (object) – largePersonGroupId of the target large person group, created by LargePersonGroup - Create. Do not provide personGroupId and largePersonGroupId at the same time.

  • maxNumOfCandidatesReturned (object) – Maximum number of candidates returned for each query face, between 1 and 100 (default is 10).

  • outputCol (str) – The name of the output column

  • personGroupId (object) – personGroupId of the target person group, created by PersonGroup - Create. Do not provide personGroupId and largePersonGroupId at the same time.

  • subscriptionKey (object) – the API key to use

  • timeout (float) – number of seconds to wait before closing the connection

  • url (str) – Url of the service
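The IdentifyFaces parameter descriptions above state several constraints: 1 to 10 faceIds, 1 to 100 candidates, a confidence threshold in [0, 1], and not both group IDs at once. The sketch below is a client-side pre-check that could catch bad rows before they reach the service. validate_identify_request is a hypothetical helper, and the service remains the authority on what it accepts.

```python
from typing import Optional, Sequence


def validate_identify_request(face_ids: Sequence[str],
                              person_group_id: Optional[str] = None,
                              large_person_group_id: Optional[str] = None,
                              max_candidates: int = 10,
                              confidence_threshold: Optional[float] = None) -> None:
    """Raise ValueError if the arguments break the documented constraints."""
    if not 1 <= len(face_ids) <= 10:
        raise ValueError("faceIds must contain between 1 and 10 IDs")
    if person_group_id is not None and large_person_group_id is not None:
        raise ValueError("set personGroupId or largePersonGroupId, not both")
    if not 1 <= max_candidates <= 100:
        raise ValueError("maxNumOfCandidatesReturned must be between 1 and 100")
    if confidence_threshold is not None and not 0.0 <= confidence_threshold <= 1.0:
        raise ValueError("confidenceThreshold must be in [0, 1]")


validate_identify_request(["face-1", "face-2"], person_group_id="staff")  # passes
try:
    validate_identify_request(["face-1"], person_group_id="a", large_person_group_id="b")
except ValueError as err:
    print(err)  # set personGroupId or largePersonGroupId, not both
```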

concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
confidenceThreshold = Param(parent='undefined', name='confidenceThreshold', doc='ServiceParam: Optional parameter.Customized identification confidence threshold, in the range of [0, 1].Advanced user can tweak this value to override defaultinternal threshold for better precision on their scenario data.Note there is no guarantee of this threshold value workingon other data and after algorithm updates.')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
faceIds = Param(parent='undefined', name='faceIds', doc='ServiceParam: Array of query faces faceIds, created by the Face - Detect. Each of the faces are identified independently. The valid number of faceIds is between [1, 10]. ')
getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

maximum number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getConfidenceThreshold()[source]
Returns

Optional. Customized identification confidence threshold, in the range [0, 1]. Advanced users can tweak this value to override the default internal threshold for better precision on their scenario data. Note that there is no guarantee this threshold value will work on other data or after algorithm updates.

Return type

confidenceThreshold

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getFaceIds()[source]
Returns

Array of query faceIds created by Face - Detect. Each face is identified independently. The number of faceIds must be between 1 and 10.

Return type

faceIds

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

static getJavaPackage()[source]

Returns package name String.

getLargePersonGroupId()[source]
Returns

largePersonGroupId of the target large person group, created by LargePersonGroup - Create. Do not provide personGroupId and largePersonGroupId at the same time.

Return type

largePersonGroupId

getMaxNumOfCandidatesReturned()[source]
Returns

Maximum number of candidates returned for each query face, between 1 and 100 (default is 10).

Return type

maxNumOfCandidatesReturned

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getPersonGroupId()[source]
Returns

personGroupId of the target person group, created by PersonGroup - Create. Do not provide personGroupId and largePersonGroupId at the same time.

Return type

personGroupId

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
largePersonGroupId = Param(parent='undefined', name='largePersonGroupId', doc='ServiceParam: largePersonGroupId of the target large person group, created by LargePersonGroup - Create. Parameter personGroupId and largePersonGroupId should not be provided at the same time.')
maxNumOfCandidatesReturned = Param(parent='undefined', name='maxNumOfCandidatesReturned', doc='ServiceParam: The range of maxNumOfCandidatesReturned is between 1 and 100 (default is 10).')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
personGroupId = Param(parent='undefined', name='personGroupId', doc='ServiceParam: personGroupId of the target person group, created by PersonGroup - Create. Parameter personGroupId and largePersonGroupId should not be provided at the same time.')
classmethod read()[source]

Returns an MLReader instance for this class.

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – maximum number of seconds to wait on futures if concurrency >= 1

setConfidenceThreshold(value)[source]
Parameters

confidenceThreshold – Optional. Customized identification confidence threshold, in the range [0, 1]. Advanced users can tweak this value to override the default internal threshold for better precision on their scenario data. Note that there is no guarantee this threshold value will work on other data or after algorithm updates.

setConfidenceThresholdCol(value)[source]
Parameters

confidenceThreshold – name of the column containing the optional customized identification confidence threshold, in the range [0, 1]. Advanced users can tweak this value to override the default internal threshold for better precision on their scenario data. Note that there is no guarantee this threshold value will work on other data or after algorithm updates.

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setFaceIds(value)[source]
Parameters

faceIds – Array of query faceIds created by Face - Detect. Each face is identified independently. The number of faceIds must be between 1 and 10.

setFaceIdsCol(value)[source]
Parameters

faceIds – name of the column containing the array of query faceIds created by Face - Detect. Each face is identified independently. The number of faceIds must be between 1 and 10.

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setLargePersonGroupId(value)[source]
Parameters

largePersonGroupId – largePersonGroupId of the target large person group, created by LargePersonGroup - Create. Do not provide personGroupId and largePersonGroupId at the same time.

setLargePersonGroupIdCol(value)[source]
Parameters

largePersonGroupId – name of the column containing the largePersonGroupId of the target large person group, created by LargePersonGroup - Create. Do not provide personGroupId and largePersonGroupId at the same time.

setLinkedService(value)[source]
setLocation(value)[source]
setMaxNumOfCandidatesReturned(value)[source]
Parameters

maxNumOfCandidatesReturned – Maximum number of candidates returned for each query face, between 1 and 100 (default is 10).

setMaxNumOfCandidatesReturnedCol(value)[source]
Parameters

maxNumOfCandidatesReturned – name of the column containing the maximum number of candidates returned for each query face, between 1 and 100 (default is 10).

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(concurrency=1, concurrentTimeout=None, confidenceThreshold=None, confidenceThresholdCol=None, errorCol='IdentifyFaces_b41be4f19b96_error', faceIds=None, faceIdsCol=None, handler=None, largePersonGroupId=None, largePersonGroupIdCol=None, maxNumOfCandidatesReturned=None, maxNumOfCandidatesReturnedCol=None, outputCol='IdentifyFaces_b41be4f19b96_output', personGroupId=None, personGroupIdCol=None, subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setPersonGroupId(value)[source]
Parameters

personGroupId – personGroupId of the target person group, created by PersonGroup - Create. Do not provide personGroupId and largePersonGroupId at the same time.

setPersonGroupIdCol(value)[source]
Parameters

personGroupId – name of the column containing the personGroupId of the target person group, created by PersonGroup - Create. Do not provide personGroupId and largePersonGroupId at the same time.

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – name of the column containing the API key to use

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.KeyPhraseExtractor module

class synapse.ml.cognitive.KeyPhraseExtractor.KeyPhraseExtractor(java_obj=None, concurrency=1, concurrentTimeout=None, errorCol='KeyPhraseExtractor_cc81a1eef0f1_error', handler=None, language=None, languageCol=None, modelVersion=None, modelVersionCol=None, outputCol='KeyPhraseExtractor_cc81a1eef0f1_output', showStats=None, showStatsCol=None, stringIndexType=None, stringIndexTypeCol=None, subscriptionKey=None, subscriptionKeyCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (str) – column to hold http errors

  • handler (object) – Which strategy to use when handling requests

  • language (object) – the language code of the text (optional for some services)

  • modelVersion (object) – This value indicates which model will be used for scoring. If a model-version is not specified, the API should default to the latest, non-preview version.

  • outputCol (str) – The name of the output column

  • showStats (object) – if set to true, response will contain input and document level statistics.

  • stringIndexType (object) – Specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets

  • subscriptionKey (object) – the API key to use

  • text (object) – the text in the request body

  • timeout (float) – number of seconds to wait before closing the connection

  • url (str) – Url of the service
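Most of the parameters above are service parameters. Each has a scalar setter (for example setLanguage) that applies one value to every row, and a column setter (for example setTextCol) that reads the value per row from a DataFrame column. The plain-Python sketch below models that resolution over a list of dict rows. `ServiceParamSketch` is a hypothetical stand-in, not SynapseML's class, and the rule that setting one form clears the other is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class ServiceParamSketch:
    """Holds either a constant value or a column name, never both (hypothetical model)."""

    value: Any = None
    col: Optional[str] = None

    def set_value(self, value: Any) -> None:
        # Setting a constant discards any previously set column name.
        self.value, self.col = value, None

    def set_col(self, col: str) -> None:
        # Setting a column discards any previously set constant.
        self.value, self.col = None, col

    def resolve(self, row: Dict[str, Any]) -> Any:
        # Per-row lookup when a column was set, otherwise the shared constant.
        return row[self.col] if self.col is not None else self.value


rows: List[Dict[str, Any]] = [
    {"review": "The food was great", "lang": "en"},
    {"review": "La comida estaba rica", "lang": "es"},
]

text = ServiceParamSketch()
text.set_col("review")      # analogous to setTextCol("review")
language = ServiceParamSketch()
language.set_value("en")    # analogous to setLanguage("en")

requests = [{"text": text.resolve(r), "language": language.resolve(r)} for r in rows]
```

With a scalar language, the Spanish row is still sent as "en". Switching to the column form (analogous to setLanguageCol("lang")) lets each row carry its own code.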

concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

static getJavaPackage()[source]

Returns package name String.

getLanguage()[source]
Returns

the language code of the text (optional for some services)

Return type

language

getModelVersion()[source]
Returns

This value indicates which model will be used for scoring. If a model-version is not specified, the API should default to the latest, non-preview version.

Return type

modelVersion

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getShowStats()[source]
Returns

if set to true, response will contain input and document level statistics.

Return type

showStats

getStringIndexType()[source]
Returns

Specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets

Return type

stringIndexType

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getText()[source]
Returns

the text in the request body

Return type

text

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
language = Param(parent='undefined', name='language', doc='ServiceParam: the language code of the text (optional for some services)')
modelVersion = Param(parent='undefined', name='modelVersion', doc='ServiceParam: This value indicates which model will be used for scoring. If a model-version is not specified, the API should default to the latest, non-preview version.')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1
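concurrency limits how many service calls are in flight at once, and concurrentTimeout limits how long to wait for their results. The sketch below is only a rough analogy using the standard library's thread pool. SynapseML schedules its requests on the JVM side, and `run_with_concurrency` and `fake_call` are hypothetical names.

```python
import concurrent.futures
import time
from typing import Callable, List, Optional


def run_with_concurrency(
    inputs: List[str],
    call_service: Callable[[str], str],
    concurrency: int = 1,
    concurrent_timeout: Optional[float] = None,
) -> List[str]:
    """Run call_service over inputs with at most `concurrency` calls in flight.

    Results keep input order. If concurrent_timeout seconds pass before all
    results are ready, concurrent.futures.TimeoutError is raised.
    """
    with concurrent.futures.ThreadPoolExecutor(max_workers=concurrency) as pool:
        # Executor.map preserves input order; its timeout is measured from the map() call.
        return list(pool.map(call_service, inputs, timeout=concurrent_timeout))


def fake_call(text: str) -> str:
    # Stand-in for a network call: sleep briefly, then "process" the text.
    time.sleep(0.01)
    return text.upper()


results = run_with_concurrency(["a", "b", "c"], fake_call, concurrency=2)
```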

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setLanguage(value)[source]
Parameters

language – the language code of the text (optional for some services)

setLanguageCol(value)[source]
Parameters

languageCol – name of the column containing the language code of the text (optional for some services)

setLinkedService(value)[source]
setLocation(value)[source]
setModelVersion(value)[source]
Parameters

modelVersion – This value indicates which model will be used for scoring. If a model-version is not specified, the API should default to the latest, non-preview version.
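In the Text Analytics v3 REST API, the model version and the statistics flag travel as the `model-version` and `showStats` query parameters. This reference does not confirm that the transformer forwards them exactly that way, so treat the following as an assumption. The hypothetical helper `build_query` omits unset parameters, which lets the service fall back to its latest non-preview model.

```python
from typing import Dict, Optional
from urllib.parse import urlencode


def build_query(model_version: Optional[str] = None, show_stats: Optional[bool] = None) -> str:
    """Build a query string, leaving out parameters that were not set (hypothetical helper)."""
    params: Dict[str, str] = {}
    if model_version is not None:
        params["model-version"] = model_version
    if show_stats is not None:
        # Use lowercase "true"/"false", as in the REST API's query strings.
        params["showStats"] = "true" if show_stats else "false"
    return urlencode(params)
```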

setModelVersionCol(value)[source]
Parameters

modelVersionCol – name of the column containing the model version to use for scoring. If no model version is specified, the API should default to the latest, non-preview version.

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(concurrency=1, concurrentTimeout=None, errorCol='KeyPhraseExtractor_cc81a1eef0f1_error', handler=None, language=None, languageCol=None, modelVersion=None, modelVersionCol=None, outputCol='KeyPhraseExtractor_cc81a1eef0f1_output', showStats=None, showStatsCol=None, stringIndexType=None, stringIndexTypeCol=None, subscriptionKey=None, subscriptionKeyCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setShowStats(value)[source]
Parameters

showStats – if set to true, response will contain input and document level statistics.

setShowStatsCol(value)[source]
Parameters

showStatsCol – name of the column whose value, if true, makes the response contain input- and document-level statistics.

setStringIndexType(value)[source]
Parameters

stringIndexType – Specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets
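stringIndexType matters because a single position in a string has different numeric offsets depending on what is counted. The Text Analytics API defines three options: `TextElement_v8` (graphemes, the default), `UnicodeCodePoint`, and `Utf16CodeUnit`. The sketch below computes the latter two in plain Python. Grapheme counting needs a third-party library and is omitted. The helper name `offsets` is hypothetical.

```python
from typing import Dict


def offsets(text: str, substring: str) -> Dict[str, int]:
    """Offset of `substring` in `text`, counted two ways (hypothetical helper)."""
    code_points = text.index(substring)  # Python str indices count Unicode code points
    # Each UTF-16 code unit is 2 bytes; characters outside the BMP take two units.
    utf16_units = len(text[:code_points].encode("utf-16-le")) // 2
    return {"UnicodeCodePoint": code_points, "Utf16CodeUnit": utf16_units}


example = offsets("I \U0001F600 pizza", "pizza")
```

The emoji is one code point but two UTF-16 code units. As a result, "pizza" starts at 4 in code points and at 5 in UTF-16 code units.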

setStringIndexTypeCol(value)[source]
Parameters

stringIndexTypeCol – name of the column containing the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKeyCol – name of the column containing the API key to use

setText(value)[source]
Parameters

text – the text in the request body

setTextCol(value)[source]
Parameters

textCol – name of the column containing the text to send in the request body

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

showStats = Param(parent='undefined', name='showStats', doc='ServiceParam: if set to true, response will contain input and document level statistics.')
stringIndexType = Param(parent='undefined', name='stringIndexType', doc='ServiceParam: Specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets')
subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
text = Param(parent='undefined', name='text', doc='ServiceParam: the text in the request body')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.KeyPhraseExtractorSDK module

class synapse.ml.cognitive.KeyPhraseExtractorSDK.KeyPhraseExtractorSDK(java_obj=None, batchSize=5, concurrency=1, concurrentTimeout=None, disableServiceLogs=None, disableServiceLogsCol=None, errorCol='Error', includeStatistics=None, includeStatisticsCol=None, language=None, languageCol=None, modelVersion=None, modelVersionCol=None, outputCol=None, subscriptionKey=None, subscriptionKeyCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • batchSize (int) – The max size of the buffer

  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • disableServiceLogs (object) – disableServiceLogs option

  • errorCol (str) – column to hold http errors

  • includeStatistics (object) – includeStatistics option

  • language (object) – the language code of the text (optional for some services)

  • modelVersion (object) – modelVersion option

  • outputCol (str) – The name of the output column

  • subscriptionKey (object) – the API key to use

  • text (object) – the text in the request body

  • timeout (float) – number of seconds to wait before closing the connection

  • url (str) – Url of the service
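This SDK-based variant groups rows into batches of at most batchSize (default 5 in the signature above) before calling the service. The plain-Python sketch below illustrates the chunking step. `batched` is a hypothetical helper, and SynapseML's internal batching may differ in detail.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int = 5) -> Iterator[List[T]]:
    """Yield consecutive lists of at most batch_size items; the last may be shorter."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Flush the final, partially filled batch.
        yield batch
```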

batchSize = Param(parent='undefined', name='batchSize', doc='The max size of the buffer')
concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
disableServiceLogs = Param(parent='undefined', name='disableServiceLogs', doc='ServiceParam: disableServiceLogs option')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getBatchSize()[source]
Returns

The max size of the buffer

Return type

batchSize

getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getDisableServiceLogs()[source]
Returns

disableServiceLogs option

Return type

disableServiceLogs

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getIncludeStatistics()[source]
Returns

includeStatistics option

Return type

includeStatistics

static getJavaPackage()[source]

Returns package name String.

getLanguage()[source]
Returns

the language code of the text (optional for some services)

Return type

language

getModelVersion()[source]
Returns

modelVersion option

Return type

modelVersion

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getText()[source]
Returns

the text in the request body

Return type

text

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

includeStatistics = Param(parent='undefined', name='includeStatistics', doc='ServiceParam: includeStatistics option')
language = Param(parent='undefined', name='language', doc='ServiceParam: the language code of the text (optional for some services)')
modelVersion = Param(parent='undefined', name='modelVersion', doc='ServiceParam: modelVersion option')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setBatchSize(value)[source]
Parameters

batchSize – The max size of the buffer

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setDisableServiceLogs(value)[source]
Parameters

disableServiceLogs – disableServiceLogs option

setDisableServiceLogsCol(value)[source]
Parameters

disableServiceLogsCol – name of the column containing the disableServiceLogs option

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors
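Because request errors are held in errorCol (named 'Error' by default for this class), a failed call presumably marks its own row instead of stopping the job. The sketch below assumes that behavior for plain dict rows: each row gets either an output value or an error message, and the other field is None. `apply_with_error_col` and the string form of the error are hypothetical. The real column holds whatever error structure SynapseML produces.

```python
from typing import Any, Callable, Dict, List


def apply_with_error_col(
    rows: List[Dict[str, Any]],
    call_service: Callable[[str], Any],
    text_col: str = "text",
    output_col: str = "output",
    error_col: str = "Error",
) -> List[Dict[str, Any]]:
    """Return copies of rows with output_col and error_col added; exactly one is non-None."""
    out = []
    for row in rows:
        new_row = dict(row)
        try:
            new_row[output_col] = call_service(row[text_col])
            new_row[error_col] = None
        except Exception as exc:  # record the failure on this row and continue
            new_row[output_col] = None
            new_row[error_col] = str(exc)
        out.append(new_row)
    return out
```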

setIncludeStatistics(value)[source]
Parameters

includeStatistics – includeStatistics option

setIncludeStatisticsCol(value)[source]
Parameters

includeStatisticsCol – name of the column containing the includeStatistics option

setLanguage(value)[source]
Parameters

language – the language code of the text (optional for some services)

setLanguageCol(value)[source]
Parameters

languageCol – name of the column containing the language code of the text (optional for some services)

setLocation(value)[source]
setModelVersion(value)[source]
Parameters

modelVersion – modelVersion option

setModelVersionCol(value)[source]
Parameters

modelVersionCol – name of the column containing the modelVersion option

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(batchSize=5, concurrency=1, concurrentTimeout=None, disableServiceLogs=None, disableServiceLogsCol=None, errorCol='Error', includeStatistics=None, includeStatisticsCol=None, language=None, languageCol=None, modelVersion=None, modelVersionCol=None, outputCol=None, subscriptionKey=None, subscriptionKeyCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKeyCol – name of the column containing the API key to use

setText(value)[source]
Parameters

text – the text in the request body

setTextCol(value)[source]
Parameters

textCol – name of the column containing the text to send in the request body

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
text = Param(parent='undefined', name='text', doc='ServiceParam: the text in the request body')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.KeyPhraseExtractorV2 module

class synapse.ml.cognitive.KeyPhraseExtractorV2.KeyPhraseExtractorV2(java_obj=None, concurrency=1, concurrentTimeout=None, errorCol='KeyPhraseExtractorV2_f07e4dfc4c82_error', handler=None, language=None, languageCol=None, outputCol='KeyPhraseExtractorV2_f07e4dfc4c82_output', subscriptionKey=None, subscriptionKeyCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (str) – column to hold http errors

  • handler (object) – Which strategy to use when handling requests

  • language (object) – the language code of the text (optional for some services)

  • outputCol (str) – The name of the output column

  • subscriptionKey (object) – the API key to use

  • text (object) – the text in the request body

  • timeout (float) – number of seconds to wait before closing the connection

  • url (str) – Url of the service
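subscriptionKey, url, and timeout correspond to the parts of a plain HTTP call to a Cognitive Services endpoint. The key is sent in the `Ocp-Apim-Subscription-Key` header, the request is posted to url, and timeout bounds the wait. The standard-library sketch below builds such a request without sending it. The URL is a placeholder, the `{"documents": [...]}` body follows the Text Analytics REST shape as an assumption, and the key is fake.

```python
import json
import urllib.request
from typing import Any, Dict, List


def build_request(url: str, subscription_key: str, documents: List[Dict[str, Any]]) -> urllib.request.Request:
    """Build (but do not send) a JSON POST carrying the subscription key header."""
    body = json.dumps({"documents": documents}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Ocp-Apim-Subscription-Key": subscription_key,
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> Dict[str, Any]:
    """Send the request; the default mirrors the transformer's 60-second timeout."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


req = build_request(
    "https://example.invalid/keyPhrases",  # placeholder URL, not a real endpoint
    "<your-key>",
    [{"id": "0", "language": "en", "text": "Hello world"}],
)
```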

concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

static getJavaPackage()[source]

Returns package name String.

getLanguage()[source]
Returns

the language code of the text (optional for some services)

Return type

language

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getText()[source]
Returns

the text in the request body

Return type

text

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
language = Param(parent='undefined', name='language', doc='ServiceParam: the language code of the text (optional for some services)')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setLanguage(value)[source]
Parameters

language – the language code of the text (optional for some services)

setLanguageCol(value)[source]
Parameters

languageCol – name of the column containing the language code of the text (optional for some services)

setLinkedService(value)[source]
setLocation(value)[source]
setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(concurrency=1, concurrentTimeout=None, errorCol='KeyPhraseExtractorV2_f07e4dfc4c82_error', handler=None, language=None, languageCol=None, outputCol='KeyPhraseExtractorV2_f07e4dfc4c82_output', subscriptionKey=None, subscriptionKeyCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKeyCol – name of the column containing the API key to use

setText(value)[source]
Parameters

text – the text in the request body

setTextCol(value)[source]
Parameters

textCol – name of the column containing the text to send in the request body

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
text = Param(parent='undefined', name='text', doc='ServiceParam: the text in the request body')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.LanguageDetector module

class synapse.ml.cognitive.LanguageDetector.LanguageDetector(java_obj=None, concurrency=1, concurrentTimeout=None, errorCol='LanguageDetector_e5c9240a65e9_error', handler=None, language=None, languageCol=None, modelVersion=None, modelVersionCol=None, outputCol='LanguageDetector_e5c9240a65e9_output', showStats=None, showStatsCol=None, subscriptionKey=None, subscriptionKeyCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (str) – column to hold http errors

  • handler (object) – Which strategy to use when handling requests

  • language (object) – the language code of the text (optional for some services)

  • modelVersion (object) – This value indicates which model will be used for scoring. If a model-version is not specified, the API should default to the latest, non-preview version.

  • outputCol (str) – The name of the output column

  • showStats (object) – if set to true, response will contain input and document level statistics.

  • subscriptionKey (object) – the API key to use

  • text (object) – the text in the request body

  • timeout (float) – number of seconds to wait before closing the connection

  • url (str) – Url of the service

concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

static getJavaPackage()[source]

Returns package name String.

getLanguage()[source]
Returns

the language code of the text (optional for some services)

Return type

language

getModelVersion()[source]
Returns

This value indicates which model will be used for scoring. If a model-version is not specified, the API should default to the latest, non-preview version.

Return type

modelVersion

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getShowStats()[source]
Returns

if set to true, response will contain input and document level statistics.

Return type

showStats

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getText()[source]
Returns

the text in the request body

Return type

text

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
language = Param(parent='undefined', name='language', doc='ServiceParam: the language code of the text (optional for some services)')
modelVersion = Param(parent='undefined', name='modelVersion', doc='ServiceParam: This value indicates which model will be used for scoring. If a model-version is not specified, the API should default to the latest, non-preview version.')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setLanguage(value)[source]
Parameters

language – the language code of the text (optional for some services)

setLanguageCol(value)[source]
Parameters

languageCol – name of the column containing the language code of the text (optional for some services)

setLinkedService(value)[source]
setLocation(value)[source]
setModelVersion(value)[source]
Parameters

modelVersion – This value indicates which model will be used for scoring. If a model-version is not specified, the API should default to the latest, non-preview version.

setModelVersionCol(value)[source]
Parameters

modelVersionCol – name of the column containing the model version to use for scoring. If no model version is specified, the API should default to the latest, non-preview version.

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(concurrency=1, concurrentTimeout=None, errorCol='LanguageDetector_e5c9240a65e9_error', handler=None, language=None, languageCol=None, modelVersion=None, modelVersionCol=None, outputCol='LanguageDetector_e5c9240a65e9_output', showStats=None, showStatsCol=None, subscriptionKey=None, subscriptionKeyCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters
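"Keyword only" means setParams rejects positional arguments. In PySpark this is done with the `pyspark.keyword_only` decorator, which records the keywords that were passed and raises TypeError on positional ones. Generated wrappers such as this one presumably follow the same pattern. The following is a simplified, standalone re-creation of that pattern, not PySpark's source. `DetectorParamsSketch` is a hypothetical stand-in for a transformer.

```python
from functools import wraps
from typing import Any, Callable, Dict, Optional


def keyword_only_sketch(func: Callable) -> Callable:
    """Reject positional arguments and remember the keywords that were passed."""

    @wraps(func)
    def wrapper(self, *args: Any, **kwargs: Any) -> Any:
        if args:
            raise TypeError(f"Method {func.__name__} forces keyword arguments.")
        self._input_kwargs = kwargs
        return func(self, **kwargs)

    return wrapper


class DetectorParamsSketch:
    """Hypothetical stand-in for a transformer's parameter store."""

    def __init__(self) -> None:
        self.params: Dict[str, Any] = {"concurrency": 1, "timeout": 60.0}

    @keyword_only_sketch
    def setParams(self, concurrency: int = 1, timeout: float = 60.0, textCol: Optional[str] = None):
        # Only the explicitly passed keywords are applied; defaults are not re-set.
        self.params.update(self._input_kwargs)
        return self


d = DetectorParamsSketch().setParams(textCol="text", concurrency=4)
```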

setShowStats(value)[source]
Parameters

showStats – if set to true, response will contain input and document level statistics.

setShowStatsCol(value)[source]
Parameters

showStatsCol – name of the column whose value, if true, makes the response contain input- and document-level statistics.

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKeyCol – name of the column containing the API key to use

setText(value)[source]
Parameters

text – the text in the request body

setTextCol(value)[source]
Parameters

textCol – name of the column containing the text to send in the request body

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

showStats = Param(parent='undefined', name='showStats', doc='ServiceParam: if set to true, response will contain input and document level statistics.')
subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
text = Param(parent='undefined', name='text', doc='ServiceParam: the text in the request body')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.LanguageDetectorSDK module

class synapse.ml.cognitive.LanguageDetectorSDK.LanguageDetectorSDK(java_obj=None, batchSize=5, concurrency=1, concurrentTimeout=None, disableServiceLogs=None, disableServiceLogsCol=None, errorCol='Error', includeStatistics=None, includeStatisticsCol=None, language=None, languageCol=None, modelVersion=None, modelVersionCol=None, outputCol=None, subscriptionKey=None, subscriptionKeyCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • batchSize (int) – The max size of the buffer

  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • disableServiceLogs (object) – disableServiceLogs option

  • errorCol (str) – column to hold http errors

  • includeStatistics (object) – includeStatistics option

  • language (object) – the language code of the text (optional for some services)

  • modelVersion (object) – modelVersion option

  • outputCol (str) – The name of the output column

  • subscriptionKey (object) – the API key to use

  • text (object) – the text in the request body

  • timeout (float) – number of seconds to wait before closing the connection

  • url (str) – Url of the service
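LanguageDetector's default column names embed the instance uid (for example 'LanguageDetector_e5c9240a65e9_output'). This SDK variant instead defaults to errorCol='Error' and outputCol=None. PySpark's uids combine the class name with the last 12 hex digits of a random UUID. The sketch below mirrors that convention to show where such generated names come from. `random_uid` and `default_cols` are hypothetical helpers.

```python
import uuid
from typing import Dict


def random_uid(class_name: str) -> str:
    """Class name plus the last 12 hex digits of a random UUID (PySpark-style uid)."""
    return f"{class_name}_{uuid.uuid4().hex[-12:]}"


def default_cols(uid: str) -> Dict[str, str]:
    """Default output/error column names derived from a uid, as in the signatures above."""
    return {"outputCol": f"{uid}_output", "errorCol": f"{uid}_error"}


uid = random_uid("LanguageDetector")
cols = default_cols(uid)
```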

batchSize = Param(parent='undefined', name='batchSize', doc='The max size of the buffer')
concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
disableServiceLogs = Param(parent='undefined', name='disableServiceLogs', doc='ServiceParam: disableServiceLogs option')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getBatchSize()[source]
Returns

The max size of the buffer

Return type

batchSize

getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getDisableServiceLogs()[source]
Returns

disableServiceLogs option

Return type

disableServiceLogs

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getIncludeStatistics()[source]
Returns

includeStatistics option

Return type

includeStatistics

static getJavaPackage()[source]

Returns package name String.

getLanguage()[source]
Returns

the language code of the text (optional for some services)

Return type

language

getModelVersion()[source]
Returns

modelVersion option

Return type

modelVersion

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getText()[source]
Returns

the text in the request body

Return type

text

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

includeStatistics = Param(parent='undefined', name='includeStatistics', doc='ServiceParam: includeStatistics option')
language = Param(parent='undefined', name='language', doc='ServiceParam: the language code of the text (optional for some services)')
modelVersion = Param(parent='undefined', name='modelVersion', doc='ServiceParam: modelVersion option')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setBatchSize(value)[source]
Parameters

batchSize – The max size of the buffer

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setDisableServiceLogs(value)[source]
Parameters

disableServiceLogs – disableServiceLogs option

setDisableServiceLogsCol(value)[source]
Parameters

disableServiceLogsCol – name of the column containing the disableServiceLogs option

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setIncludeStatistics(value)[source]
Parameters

includeStatistics – includeStatistics option

setIncludeStatisticsCol(value)[source]
Parameters

includeStatisticsCol – name of the column containing the includeStatistics option

setLanguage(value)[source]
Parameters

language – the language code of the text (optional for some services)

setLanguageCol(value)[source]
Parameters

languageCol – name of the column containing the language code of the text (optional for some services)

setLocation(value)[source]
setModelVersion(value)[source]
Parameters

modelVersion – modelVersion option

setModelVersionCol(value)[source]
Parameters

modelVersionCol – name of the column containing the modelVersion option

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(batchSize=5, concurrency=1, concurrentTimeout=None, disableServiceLogs=None, disableServiceLogsCol=None, errorCol='Error', includeStatistics=None, includeStatisticsCol=None, language=None, languageCol=None, modelVersion=None, modelVersionCol=None, outputCol=None, subscriptionKey=None, subscriptionKeyCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKeyCol – name of the column containing the API key to use

setText(value)[source]
Parameters

text – the text in the request body

setTextCol(value)[source]
Parameters

textCol – name of the column containing the text to send in the request body

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
text = Param(parent='undefined', name='text', doc='ServiceParam: the text in the request body')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.LanguageDetectorV2 module

class synapse.ml.cognitive.LanguageDetectorV2.LanguageDetectorV2(java_obj=None, concurrency=1, concurrentTimeout=None, errorCol='LanguageDetectorV2_a03d227bff97_error', handler=None, language=None, languageCol=None, outputCol='LanguageDetectorV2_a03d227bff97_output', subscriptionKey=None, subscriptionKeyCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (str) – column to hold HTTP errors

  • handler (object) – Which strategy to use when handling requests

  • language (object) – the language code of the text (optional for some services)

  • outputCol (str) – The name of the output column

  • subscriptionKey (object) – the API key to use

  • text (object) – the text in the request body

  • timeout (float) – number of seconds to wait before closing the connection

  • url (str) – URL of the service
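Each service parameter above (those whose Param doc starts with `ServiceParam:`, such as `language`, `text`, and `subscriptionKey`) comes with a pair of setters: `setX` supplies one constant for every row, and `setXCol` names a column whose per-row value is used instead. The pure-Python sketch below models that resolution rule with a hypothetical `ServiceValue` class and plain dictionaries standing in for DataFrame rows; it illustrates the documented setter pairs and is not SynapseML's implementation.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class ServiceValue:
    """Hypothetical stand-in for a service parameter: a constant or a column name."""
    constant: Any = None
    column: Optional[str] = None

    def resolve(self, row: Mapping[str, Any]) -> Any:
        # In this sketch a column reference takes precedence: read this row's value
        # (analogous to setXCol).
        if self.column is not None:
            return row[self.column]
        # Otherwise every row shares the same constant (analogous to setX).
        return self.constant


rows = [
    {"text": "Hello world", "lang_hint": "en"},
    {"text": "Bonjour le monde", "lang_hint": "fr"},
]

fixed = ServiceValue(constant="en")         # analogous to setLanguage("en")
per_row = ServiceValue(column="lang_hint")  # analogous to setLanguageCol("lang_hint")

print([fixed.resolve(r) for r in rows])    # ['en', 'en']
print([per_row.resolve(r) for r in rows])  # ['en', 'fr']
```

The per-row form is useful when a single DataFrame mixes inputs that need different settings, for example documents in several known languages.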

concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getErrorCol()[source]
Returns

column to hold HTTP errors

Return type

errorCol

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

static getJavaPackage()[source]

Returns package name String.

getLanguage()[source]
Returns

the language code of the text (optional for some services)

Return type

language

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getText()[source]
Returns

the text in the request body

Return type

text

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

URL of the service

Return type

url

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
language = Param(parent='undefined', name='language', doc='ServiceParam: the language code of the text (optional for some services)')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setErrorCol(value)[source]
Parameters

errorCol – column to hold HTTP errors
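Since HTTP errors are placed in `errorCol` rather than in the output column, a typical follow-up step is to separate failed rows from successful ones. The sketch below uses plain dictionaries with hypothetical column names `output` and `error` and made-up values; `split_by_error` is an illustrative helper, and on a real DataFrame the same test is a null check on the error column.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def split_by_error(rows: List[Row], error_col: str = "error") -> Tuple[List[Row], List[Row]]:
    """Partition rows into (succeeded, failed) depending on whether error_col is populated."""
    succeeded = [r for r in rows if r.get(error_col) is None]
    failed = [r for r in rows if r.get(error_col) is not None]
    return succeeded, failed


# Illustrative rows only; real error values come from the service response.
rows = [
    {"text": "Hello", "output": "en", "error": None},
    {"text": "", "output": None, "error": "HTTP 400 (illustrative)"},
]
ok, bad = split_by_error(rows)
print(len(ok), len(bad))  # 1 1
```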

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setLanguage(value)[source]
Parameters

language – the language code of the text (optional for some services)

setLanguageCol(value)[source]
Parameters

languageCol – name of the column holding the language code of the text (optional for some services)

setLinkedService(value)[source]
setLocation(value)[source]
setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(concurrency=1, concurrentTimeout=None, errorCol='LanguageDetectorV2_a03d227bff97_error', handler=None, language=None, languageCol=None, outputCol='LanguageDetectorV2_a03d227bff97_output', subscriptionKey=None, subscriptionKeyCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKeyCol – name of the column holding the API key to use

setText(value)[source]
Parameters

text – the text in the request body

setTextCol(value)[source]
Parameters

textCol – name of the column holding the text in the request body

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – URL of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
text = Param(parent='undefined', name='text', doc='ServiceParam: the text in the request body')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.ListCustomModels module

class synapse.ml.cognitive.ListCustomModels.ListCustomModels(java_obj=None, concurrency=1, concurrentTimeout=None, errorCol='ListCustomModels_6e1ef12926b2_error', handler=None, op=None, opCol=None, outputCol='ListCustomModels_6e1ef12926b2_output', subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (str) – column to hold HTTP errors

  • handler (object) – Which strategy to use when handling requests

  • op (object) – Specifies whether to return a summary or the full list of models.

  • outputCol (str) – The name of the output column

  • subscriptionKey (object) – the API key to use

  • timeout (float) – number of seconds to wait before closing the connection

  • url (str) – URL of the service
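The `op` parameter maps to a query-string option of the Form Recognizer "List Custom Models" REST call. Assuming the v2.1 endpoint shape `{endpoint}/formrecognizer/v2.1/custom/models?op=...` with the values `summary` and `full` (verify both against the API version your `url` targets), the request URL could be assembled as follows; `list_models_url` and the resource name are hypothetical.

```python
from urllib.parse import urlencode

VALID_OPS = {"summary", "full"}  # assumed values for the op query parameter


def list_models_url(endpoint: str, op: str = "full") -> str:
    """Hypothetical helper: build the List Custom Models URL for a given op value."""
    if op not in VALID_OPS:
        raise ValueError(f"op must be one of {sorted(VALID_OPS)}, got {op!r}")
    # Assumed v2.1 path; other API versions may use a different route.
    base = endpoint.rstrip("/") + "/formrecognizer/v2.1/custom/models"
    return base + "?" + urlencode({"op": op})


print(list_models_url("https://myresource.cognitiveservices.azure.com/", "summary"))
# https://myresource.cognitiveservices.azure.com/formrecognizer/v2.1/custom/models?op=summary
```

Validating `op` locally surfaces a typo before any request is sent, instead of as an entry in the error column.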

concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getErrorCol()[source]
Returns

column to hold HTTP errors

Return type

errorCol

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

static getJavaPackage()[source]

Returns package name String.

getOp()[source]
Returns

Specifies whether to return a summary or the full list of models.

Return type

op

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

URL of the service

Return type

url

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
op = Param(parent='undefined', name='op', doc='ServiceParam: Specify whether to return summary or full list of models.')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setErrorCol(value)[source]
Parameters

errorCol – column to hold HTTP errors

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setLinkedService(value)[source]
setLocation(value)[source]
setOp(value)[source]
Parameters

op – Specifies whether to return a summary or the full list of models.

setOpCol(value)[source]
Parameters

opCol – name of the column specifying whether to return a summary or the full list of models

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(concurrency=1, concurrentTimeout=None, errorCol='ListCustomModels_6e1ef12926b2_error', handler=None, op=None, opCol=None, outputCol='ListCustomModels_6e1ef12926b2_output', subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKeyCol – name of the column holding the API key to use

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – URL of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.NER module

class synapse.ml.cognitive.NER.NER(java_obj=None, concurrency=1, concurrentTimeout=None, errorCol='NER_ed9225701f0e_error', handler=None, language=None, languageCol=None, modelVersion=None, modelVersionCol=None, outputCol='NER_ed9225701f0e_output', showStats=None, showStatsCol=None, stringIndexType=None, stringIndexTypeCol=None, subscriptionKey=None, subscriptionKeyCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (str) – column to hold HTTP errors

  • handler (object) – Which strategy to use when handling requests

  • language (object) – the language code of the text (optional for some services)

  • modelVersion (object) – This value indicates which model will be used for scoring. If a model-version is not specified, the API should default to the latest, non-preview version.

  • outputCol (str) – The name of the output column

  • showStats (object) – If set to true, the response will contain input- and document-level statistics.

  • stringIndexType (object) – Specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets

  • subscriptionKey (object) – the API key to use

  • text (object) – the text in the request body

  • timeout (float) – number of seconds to wait before closing the connection

  • url (str) – URL of the service
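`stringIndexType` matters because Python indexes strings by Unicode code point, while the service can report entity offsets in other units. As an assumption based on the Text Analytics v3.1 API, the accepted values are `TextElement_v8` (the default described above), `UnicodeCodePoint`, and `Utf16CodeUnit`; check the linked page for the version you call. The sketch below converts an offset counted in UTF-16 code units into a Python string index, which is needed whenever a character outside the Basic Multilingual Plane, such as most emoji, precedes an entity. `utf16_to_py_index` is a hypothetical helper.

```python
def utf16_to_py_index(text: str, utf16_offset: int) -> int:
    """Convert an offset counted in UTF-16 code units to a Python (code point) index."""
    units = 0
    for index, char in enumerate(text):
        if units == utf16_offset:
            return index
        # Characters above U+FFFF take two UTF-16 code units (a surrogate pair).
        units += 2 if ord(char) > 0xFFFF else 1
    if units == utf16_offset:
        return len(text)
    raise ValueError("offset is out of range or splits a surrogate pair")


text = "😀 Seattle is rainy"
# In UTF-16 the emoji occupies 2 units and the space 1, so "Seattle" starts at unit 3.
start = utf16_to_py_index(text, 3)
print(start, text[start:start + 7])  # 2 Seattle
```

Requesting `UnicodeCodePoint` offsets, where supported, avoids the conversion entirely for Python consumers.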

concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getErrorCol()[source]
Returns

column to hold HTTP errors

Return type

errorCol

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

static getJavaPackage()[source]

Returns package name String.

getLanguage()[source]
Returns

the language code of the text (optional for some services)

Return type

language

getModelVersion()[source]
Returns

This value indicates which model will be used for scoring. If a model-version is not specified, the API should default to the latest, non-preview version.

Return type

modelVersion

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getShowStats()[source]
Returns

If set to true, the response will contain input- and document-level statistics.

Return type

showStats

getStringIndexType()[source]
Returns

Specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets

Return type

stringIndexType

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getText()[source]
Returns

the text in the request body

Return type

text

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

URL of the service

Return type

url

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
language = Param(parent='undefined', name='language', doc='ServiceParam: the language code of the text (optional for some services)')
modelVersion = Param(parent='undefined', name='modelVersion', doc='ServiceParam: This value indicates which model will be used for scoring. If a model-version is not specified, the API should default to the latest, non-preview version.')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setErrorCol(value)[source]
Parameters

errorCol – column to hold HTTP errors

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setLanguage(value)[source]
Parameters

language – the language code of the text (optional for some services)

setLanguageCol(value)[source]
Parameters

languageCol – name of the column holding the language code of the text (optional for some services)

setLinkedService(value)[source]
setLocation(value)[source]
setModelVersion(value)[source]
Parameters

modelVersion – This value indicates which model will be used for scoring. If a model-version is not specified, the API should default to the latest, non-preview version.

setModelVersionCol(value)[source]
Parameters

modelVersionCol – name of the column holding the model version to use for scoring. If a model version is not specified, the API should default to the latest, non-preview version.

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(concurrency=1, concurrentTimeout=None, errorCol='NER_ed9225701f0e_error', handler=None, language=None, languageCol=None, modelVersion=None, modelVersionCol=None, outputCol='NER_ed9225701f0e_output', showStats=None, showStatsCol=None, stringIndexType=None, stringIndexTypeCol=None, subscriptionKey=None, subscriptionKeyCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setShowStats(value)[source]
Parameters

showStats – If set to true, the response will contain input- and document-level statistics.

setShowStatsCol(value)[source]
Parameters

showStatsCol – name of the column holding the showStats flag (if true, the response will contain input- and document-level statistics)

setStringIndexType(value)[source]
Parameters

stringIndexType – Specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets

setStringIndexTypeCol(value)[source]
Parameters

stringIndexTypeCol – name of the column holding the method used to interpret string offsets. For additional information see https://aka.ms/text-analytics-offsets

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKeyCol – name of the column holding the API key to use

setText(value)[source]
Parameters

text – the text in the request body

setTextCol(value)[source]
Parameters

textCol – name of the column holding the text in the request body

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – URL of the service

showStats = Param(parent='undefined', name='showStats', doc='ServiceParam: if set to true, response will contain input and document level statistics.')
stringIndexType = Param(parent='undefined', name='stringIndexType', doc='ServiceParam: Specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets')
subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
text = Param(parent='undefined', name='text', doc='ServiceParam: the text in the request body')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.NERSDK module

class synapse.ml.cognitive.NERSDK.NERSDK(java_obj=None, batchSize=5, concurrency=1, concurrentTimeout=None, disableServiceLogs=None, disableServiceLogsCol=None, errorCol='Error', includeStatistics=None, includeStatisticsCol=None, language=None, languageCol=None, modelVersion=None, modelVersionCol=None, outputCol=None, subscriptionKey=None, subscriptionKeyCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • batchSize (int) – The max size of the buffer

  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • disableServiceLogs (object) – disableServiceLogs option; if set to true, opts out of having the input text logged on the service side for troubleshooting

  • errorCol (str) – column to hold HTTP errors

  • includeStatistics (object) – includeStatistics option; if set to true, the response will contain input- and document-level statistics

  • language (object) – the language code of the text (optional for some services)

  • modelVersion (object) – modelVersion option; indicates which model will be used for scoring

  • outputCol (str) – The name of the output column

  • subscriptionKey (object) – the API key to use

  • text (object) – the text in the request body

  • timeout (float) – number of seconds to wait before closing the connection

  • url (str) – URL of the service
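`batchSize` caps how many documents are grouped into a single request (the signature above gives a default of 5). As an illustration of the grouping itself, not of the SDK wrapper's internal code, splitting rows into consecutive batches of at most `batchSize` looks like this; `batched` is a hypothetical helper built on `itertools.islice`.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int = 5) -> Iterator[List[T]]:
    """Yield consecutive lists of at most batch_size items (the last may be shorter)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(items)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


docs = [f"doc {i}" for i in range(12)]
sizes = [len(b) for b in batched(docs, batch_size=5)]
print(sizes)  # [5, 5, 2]
```

Larger batches mean fewer round trips, but each request then carries more documents, so the value is bounded by whatever per-request document limit the service enforces.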

batchSize = Param(parent='undefined', name='batchSize', doc='The max size of the buffer')
concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
disableServiceLogs = Param(parent='undefined', name='disableServiceLogs', doc='ServiceParam: disableServiceLogs option')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getBatchSize()[source]
Returns

The max size of the buffer

Return type

batchSize

getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getDisableServiceLogs()[source]
Returns

disableServiceLogs option

Return type

disableServiceLogs

getErrorCol()[source]
Returns

column to hold HTTP errors

Return type

errorCol

getIncludeStatistics()[source]
Returns

includeStatistics option

Return type

includeStatistics

static getJavaPackage()[source]

Returns package name String.

getLanguage()[source]
Returns

the language code of the text (optional for some services)

Return type

language

getModelVersion()[source]
Returns

modelVersion option

Return type

modelVersion

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getText()[source]
Returns

the text in the request body

Return type

text

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

URL of the service

Return type

url

includeStatistics = Param(parent='undefined', name='includeStatistics', doc='ServiceParam: includeStatistics option')
language = Param(parent='undefined', name='language', doc='ServiceParam: the language code of the text (optional for some services)')
modelVersion = Param(parent='undefined', name='modelVersion', doc='ServiceParam: modelVersion option')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setBatchSize(value)[source]
Parameters

batchSize – The max size of the buffer

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setDisableServiceLogs(value)[source]
Parameters

disableServiceLogs – disableServiceLogs option

setDisableServiceLogsCol(value)[source]
Parameters

disableServiceLogsCol – name of the column holding the disableServiceLogs option

setErrorCol(value)[source]
Parameters

errorCol – column to hold HTTP errors

setIncludeStatistics(value)[source]
Parameters

includeStatistics – includeStatistics option

setIncludeStatisticsCol(value)[source]
Parameters

includeStatisticsCol – name of the column holding the includeStatistics option

setLanguage(value)[source]
Parameters

language – the language code of the text (optional for some services)

setLanguageCol(value)[source]
Parameters

languageCol – name of the column holding the language code of the text (optional for some services)

setLocation(value)[source]
setModelVersion(value)[source]
Parameters

modelVersion – modelVersion option

setModelVersionCol(value)[source]
Parameters

modelVersionCol – name of the column holding the modelVersion option

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(batchSize=5, concurrency=1, concurrentTimeout=None, disableServiceLogs=None, disableServiceLogsCol=None, errorCol='Error', includeStatistics=None, includeStatisticsCol=None, language=None, languageCol=None, modelVersion=None, modelVersionCol=None, outputCol=None, subscriptionKey=None, subscriptionKeyCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKeyCol – name of the column holding the API key to use

setText(value)[source]
Parameters

text – the text in the request body

setTextCol(value)[source]
Parameters

textCol – name of the column holding the text in the request body

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – URL of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
text = Param(parent='undefined', name='text', doc='ServiceParam: the text in the request body')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.NERV2 module

class synapse.ml.cognitive.NERV2.NERV2(java_obj=None, concurrency=1, concurrentTimeout=None, errorCol='NERV2_8929facac6a6_error', handler=None, language=None, languageCol=None, outputCol='NERV2_8929facac6a6_output', subscriptionKey=None, subscriptionKeyCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (str) – column to hold HTTP errors

  • handler (object) – Which strategy to use when handling requests

  • language (object) – the language code of the text (optional for some services)

  • outputCol (str) – The name of the output column

  • subscriptionKey (object) – the API key to use

  • text (object) – the text in the request body

  • timeout (float) – number of seconds to wait before closing the connection

  • url (str) – URL of the service
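`concurrency` bounds how many requests are in flight at once, and `concurrentTimeout` bounds how long to wait on those pending calls. The sketch below models both ideas with the standard library's `ThreadPoolExecutor`; `fake_service_call` is a hypothetical stand-in for one HTTP request, and the whole block is a conceptual illustration rather than a description of how SynapseML schedules its calls.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List


def fake_service_call(text: str) -> str:
    """Hypothetical stand-in for one HTTP request to the service."""
    time.sleep(0.01)  # simulate network latency
    return text.upper()


def call_all(texts: List[str], concurrency: int = 1, concurrent_timeout: float = 5.0) -> List[str]:
    # At most `concurrency` calls run at the same time.
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # map() preserves input order; iterating raises TimeoutError if the results
        # are not all available within `concurrent_timeout` seconds of the map() call.
        return list(pool.map(fake_service_call, texts, timeout=concurrent_timeout))


print(call_all(["a", "b", "c"], concurrency=2))  # ['A', 'B', 'C']
```

Raising `concurrency` shortens wall-clock time for latency-bound calls, at the cost of a higher request rate against the service's throttling limits.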

concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getErrorCol()[source]
Returns

column to hold HTTP errors

Return type

errorCol

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

static getJavaPackage()[source]

Returns package name String.

getLanguage()[source]
Returns

the language code of the text (optional for some services)

Return type

language

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getText()[source]
Returns

the text in the request body

Return type

text

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

URL of the service

Return type

url

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
language = Param(parent='undefined', name='language', doc='ServiceParam: the language code of the text (optional for some services)')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setErrorCol(value)[source]
Parameters

errorCol – column to hold HTTP errors

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setLanguage(value)[source]
Parameters

language – the language code of the text (optional for some services)

setLanguageCol(value)[source]
Parameters

languageCol – name of the column holding the language code of the text (optional for some services)

setLinkedService(value)[source]
setLocation(value)[source]
setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(concurrency=1, concurrentTimeout=None, errorCol='NERV2_8929facac6a6_error', handler=None, language=None, languageCol=None, outputCol='NERV2_8929facac6a6_output', subscriptionKey=None, subscriptionKeyCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKeyCol – name of the column holding the API key to use

setText(value)[source]
Parameters

text – the text in the request body

setTextCol(value)[source]
Parameters

textCol – name of the column holding the text in the request body

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – URL of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
text = Param(parent='undefined', name='text', doc='ServiceParam: the text in the request body')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.OCR module

class synapse.ml.cognitive.OCR.OCR(java_obj=None, concurrency=1, concurrentTimeout=None, detectOrientation=None, detectOrientationCol=None, errorCol='OCR_7dcd8ae74e88_error', handler=None, imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, language=None, languageCol=None, outputCol='OCR_7dcd8ae74e88_output', subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • detectOrientation (object) – whether to detect image orientation prior to processing

  • errorCol (str) – column to hold HTTP errors

  • handler (object) – Which strategy to use when handling requests

  • imageBytes (object) – bytestream of the image to use

  • imageUrl (object) – the URL of the image to use

  • language (object) – the language to use

  • outputCol (str) – The name of the output column

  • subscriptionKey (object) – the API key to use

  • timeout (float) – number of seconds to wait before closing the connection

  • url (str) – URL of the service
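`imageBytes` and `imageUrl` are two ways of pointing the service at an image. When images live on local or mounted storage rather than at a reachable URL, their raw bytes can be loaded into rows first. The helper below is hypothetical and uses only the standard library; it returns `(path, bytes)` pairs that could back a column passed to `setImageBytesCol`, and the set of file suffixes it accepts is an assumption.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp", ".gif"}  # assumed accepted formats


def load_image_rows(folder: str) -> List[Tuple[str, bytes]]:
    """Return (path, raw bytes) pairs for image files directly inside folder, sorted by path."""
    rows = []
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            rows.append((str(path), path.read_bytes()))
    return rows


# Usage with a throwaway directory; the "image" content here is fake.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "receipt.png").write_bytes(b"\x89PNG fake")
    Path(tmp, "notes.txt").write_text("not an image")
    rows = load_image_rows(tmp)
    print([(Path(p).name, len(b)) for p, b in rows])  # [('receipt.png', 9)]
```

Passing URLs keeps rows small, whereas embedding bytes moves the image data through Spark; the choice usually depends on whether the service can reach the storage location directly.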

concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
detectOrientation = Param(parent='undefined', name='detectOrientation', doc='ServiceParam: whether to detect image orientation prior to processing')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getDetectOrientation()[source]
Returns

whether to detect image orientation prior to processing

Return type

detectOrientation

getErrorCol()[source]
Returns

column to hold HTTP errors

Return type

errorCol

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

getImageBytes()[source]
Returns

bytestream of the image to use

Return type

imageBytes

getImageUrl()[source]
Returns

the URL of the image to use

Return type

imageUrl

static getJavaPackage()[source]

Returns package name String.

getLanguage()[source]
Returns

the language to use

Return type

language

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

URL of the service

Return type

url

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
imageBytes = Param(parent='undefined', name='imageBytes', doc='ServiceParam: bytestream of the image to use')
imageUrl = Param(parent='undefined', name='imageUrl', doc='ServiceParam: the url of the image to use')
language = Param(parent='undefined', name='language', doc='ServiceParam: the language to use')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setDetectOrientation(value)[source]
Parameters

detectOrientation – whether to detect image orientation prior to processing

setDetectOrientationCol(value)[source]
Parameters

detectOrientationCol – name of the column holding whether to detect image orientation prior to processing

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setImageBytes(value)[source]
Parameters

imageBytes – bytestream of the image to use

setImageBytesCol(value)[source]
Parameters

imageBytes – name of the column that holds, for each row, the bytestream of the image to use

setImageUrl(value)[source]
Parameters

imageUrl – the url of the image to use

setImageUrlCol(value)[source]
Parameters

imageUrl – name of the column that holds, for each row, the URL of the image to use

setLanguage(value)[source]
Parameters

language – the language to use

setLanguageCol(value)[source]
Parameters

language – name of the column that holds, for each row, the language to use

setLinkedService(value)[source]
setLocation(value)[source]
setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(concurrency=1, concurrentTimeout=None, detectOrientation=None, detectOrientationCol=None, errorCol='OCR_7dcd8ae74e88_error', handler=None, imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, language=None, languageCol=None, outputCol='OCR_7dcd8ae74e88_output', subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – name of the column that holds, for each row, the API key to use

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')
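Each ServiceParam above has two setters: a value setter (for example setImageUrl) and a column setter (for example setImageUrlCol). The pure-Python sketch below shows how such a parameter can resolve either to one constant or to a per-row column value. ServiceParamSketch is hypothetical and is not SynapseML's implementation. The rule that a column binding takes precedence is an assumption of this sketch, because this reference does not say what happens when both are set.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class ServiceParamSketch:
    """Hypothetical stand-in for a ServiceParam: a constant value or a column name."""

    value: Any = None
    col: Optional[str] = None

    def resolve(self, row: Dict[str, Any]) -> Any:
        # In this sketch a column binding takes precedence and yields a
        # per-row value; otherwise every row shares the constant value.
        if self.col is not None:
            return row[self.col]
        return self.value


rows = [
    {"url": "https://example.com/a.png"},
    {"url": "https://example.com/b.png"},
]
image_url = ServiceParamSketch(col="url")  # analogous to setImageUrlCol("url")
language = ServiceParamSketch(value="en")  # analogous to setLanguage("en")

# One request payload per row, combining per-row and constant parameters.
requests = [
    {"imageUrl": image_url.resolve(r), "language": language.resolve(r)}
    for r in rows
]
print(requests)
```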

synapse.ml.cognitive.OpenAICompletion module

class synapse.ml.cognitive.OpenAICompletion.OpenAICompletion(java_obj=None, apiVersion=None, apiVersionCol=None, batchIndexPrompt=None, batchIndexPromptCol=None, batchPrompt=None, batchPromptCol=None, bestOf=None, bestOfCol=None, cacheLevel=None, cacheLevelCol=None, concurrency=1, concurrentTimeout=None, deploymentName=None, deploymentNameCol=None, echo=None, echoCol=None, errorCol='OpenAPICompletion_9fade8566b4a_error', frequencyPenalty=None, frequencyPenaltyCol=None, handler=None, indexPrompt=None, indexPromptCol=None, logProbs=None, logProbsCol=None, maxTokens=None, maxTokensCol=None, model=None, modelCol=None, n=None, nCol=None, outputCol='OpenAPICompletion_9fade8566b4a_output', presencePenalty=None, presencePenaltyCol=None, prompt=None, promptCol=None, stop=None, stopCol=None, subscriptionKey=None, subscriptionKeyCol=None, temperature=None, temperatureCol=None, timeout=60.0, topP=None, topPCol=None, url=None, user=None, userCol=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • apiVersion (object) – version of the api

  • batchIndexPrompt (object) – Sequence of index sequences to complete

  • batchPrompt (object) – Sequence of prompts to complete

  • bestOf (object) – How many generations to create server side, and display only the best. Will not stream intermediate progress if best_of > 1. Has maximum value of 128.

  • cacheLevel (object) – can be used to disable any server-side caching, 0=no cache, 1=prompt prefix enabled, 2=full cache

  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – maximum number of seconds to wait on futures if concurrency >= 1

  • deploymentName (object) – The name of the deployment

  • echo (object) – Echo back the prompt in addition to the completion

  • errorCol (str) – column to hold http errors

  • frequencyPenalty (object) – How much to penalize new tokens based on their existing frequency in the text so far. Decreases the model’s likelihood to repeat the same line verbatim. Has minimum of -2 and maximum of 2.

  • handler (object) – Which strategy to use when handling requests

  • indexPrompt (object) – Sequence of indexes to complete

  • logProbs (object) – Include the log probabilities on the logprobs most likely tokens, as well as the chosen tokens. For example, if logprobs is 10, the API will return a list of the 10 most likely tokens. If logprobs is 0, only the chosen tokens will have logprobs returned. Minimum of 0 and maximum of 100 allowed.

  • maxTokens (object) – The maximum number of tokens to generate. Has minimum of 0.

  • model (object) – The name of the model to use

  • n (object) – How many snippets to generate for each prompt. Minimum of 1 and maximum of 128 allowed.

  • outputCol (str) – The name of the output column

  • presencePenalty (object) – How much to penalize new tokens based on whether they appear in the text so far. Increases the model’s likelihood to talk about new topics. Has minimum of -2 and maximum of 2.

  • prompt (object) – The text to complete

  • stop (object) – A sequence which indicates the end of the current document.

  • subscriptionKey (object) – the API key to use

  • temperature (object) – What sampling temperature to use. Higher values mean the model will take more risks. Try 0.9 for more creative applications, and 0 (argmax sampling) for ones with a well-defined answer. We generally recommend setting this or topP, but not both. Minimum of 0 and maximum of 2 allowed.

  • timeout (float) – number of seconds to wait before closing the connection

  • topP (object) – An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend using this or temperature but not both. Minimum of 0 and maximum of 1 allowed.

  • url (str) – Url of the service

  • user (object) – The ID of the end-user, for use in tracking and rate-limiting.
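The temperature and topP descriptions above refer to two standard ways of reshaping a next-token distribution. Below is a minimal pure-Python sketch of both on a toy set of logits, with temperature 0 treated as argmax as described above. The token names and logit values are made up. The service's actual sampler is not documented here, so read this as an illustration of the technique, not the implementation.

```python
import math
from typing import Dict


def apply_temperature(logits: Dict[str, float], temperature: float) -> Dict[str, float]:
    """Softmax over logits / temperature; temperature 0 is treated as argmax."""
    if temperature == 0:
        best = max(logits, key=logits.get)
        return {tok: (1.0 if tok == best else 0.0) for tok in logits}
    scaled = {tok: v / temperature for tok, v in logits.items()}
    m = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(v - m) for tok, v in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def nucleus(probs: Dict[str, float], top_p: float) -> Dict[str, float]:
    """Keep the smallest set of most likely tokens whose mass reaches top_p, renormalized."""
    kept, mass = {}, 0.0
    for tok, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = p
        mass += p
        if mass >= top_p:
            break
    return {tok: p / mass for tok, p in kept.items()}


logits = {"cat": 2.0, "dog": 1.0, "eel": 0.0}
probs = apply_temperature(logits, 1.0)
print(probs)
print(nucleus(probs, 0.5))  # only "cat" survives: it alone carries more than half the mass
```

A higher temperature flattens the distribution, and a lower topP cuts off its tail. This is why the descriptions above recommend adjusting only one of the two.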

apiVersion = Param(parent='undefined', name='apiVersion', doc='ServiceParam: version of the api')
batchIndexPrompt = Param(parent='undefined', name='batchIndexPrompt', doc='ServiceParam: Sequence of index sequences to complete')
batchPrompt = Param(parent='undefined', name='batchPrompt', doc='ServiceParam: Sequence of prompts to complete')
bestOf = Param(parent='undefined', name='bestOf', doc='ServiceParam: How many generations to create server side, and display only the best. Will not stream intermediate progress if best_of > 1. Has maximum value of 128.')
cacheLevel = Param(parent='undefined', name='cacheLevel', doc='ServiceParam: can be used to disable any server-side caching, 0=no cache, 1=prompt prefix enabled, 2=full cache')
concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
deploymentName = Param(parent='undefined', name='deploymentName', doc='ServiceParam: The name of the deployment')
echo = Param(parent='undefined', name='echo', doc='ServiceParam: Echo back the prompt in addition to the completion')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
frequencyPenalty = Param(parent='undefined', name='frequencyPenalty', doc="ServiceParam: How much to penalize new tokens based on whether they appear in the text so far. Increases the model's likelihood to talk about new topics.")
getApiVersion()[source]
Returns

version of the api

Return type

apiVersion

getBatchIndexPrompt()[source]
Returns

Sequence of index sequences to complete

Return type

batchIndexPrompt

getBatchPrompt()[source]
Returns

Sequence of prompts to complete

Return type

batchPrompt

getBestOf()[source]
Returns

How many generations to create server side, and display only the best. Will not stream intermediate progress if best_of > 1. Has maximum value of 128.

Return type

bestOf
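bestOf generates several candidates server side and returns only the best. According to OpenAI's API reference, "best" means the highest log probability per token, and when bestOf is combined with n, bestOf sets the number of candidates while n sets how many are returned. Below is a minimal pure-Python sketch of that selection step. pick_best and the candidate data are hypothetical. This sketch only requires at least n candidates, and the service may enforce a stricter rule.

```python
from typing import List, Tuple


def pick_best(candidates: List[Tuple[str, List[float]]], n: int) -> List[str]:
    """Return the n candidates with the highest mean log probability per token.

    Each candidate is (text, per-token log probabilities) with at least one token.
    """
    if n > len(candidates):
        raise ValueError("need at least n candidates (bestOf >= n)")

    def mean_logprob(candidate: Tuple[str, List[float]]) -> float:
        return sum(candidate[1]) / len(candidate[1])

    ranked = sorted(candidates, key=mean_logprob, reverse=True)
    return [text for text, _ in ranked[:n]]


candidates = [("a", [-1.0, -1.0]), ("b", [-0.1, -0.3]), ("c", [-2.0])]
print(pick_best(candidates, 2))
```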

getCacheLevel()[source]
Returns

can be used to disable any server-side caching, 0=no cache, 1=prompt prefix enabled, 2=full cache

Return type

cacheLevel

getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

maximum number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getDeploymentName()[source]
Returns

The name of the deployment

Return type

deploymentName

getEcho()[source]
Returns

Echo back the prompt in addition to the completion

Return type

echo

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getFrequencyPenalty()[source]
Returns

How much to penalize new tokens based on their existing frequency in the text so far. Decreases the model’s likelihood to repeat the same line verbatim. Has minimum of -2 and maximum of 2.

Return type

frequencyPenalty

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

getIndexPrompt()[source]
Returns

Sequence of indexes to complete

Return type

indexPrompt

static getJavaPackage()[source]

Returns package name String.

getLogProbs()[source]
Returns

Include the log probabilities on the logprobs most likely tokens, as well as the chosen tokens. For example, if logprobs is 10, the API will return a list of the 10 most likely tokens. If logprobs is 0, only the chosen tokens will have logprobs returned. Minimum of 0 and maximum of 100 allowed.

Return type

logProbs
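At a single generation step, logProbs = k asks for the chosen token's log probability together with the k most likely candidate tokens. The pure-Python sketch below reproduces that on a toy distribution. top_logprobs and the probabilities are hypothetical, and the sketch does not model the service's response format.

```python
import math
from typing import Dict, List, Tuple


def top_logprobs(step_probs: Dict[str, float], chosen: str, k: int) -> Tuple[float, List[Tuple[str, float]]]:
    """Log probability of the chosen token plus the k most likely tokens at this step."""
    chosen_lp = math.log(step_probs[chosen])
    ranked = sorted(step_probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    return chosen_lp, [(tok, math.log(p)) for tok, p in ranked]


probs = {"the": 0.5, "a": 0.3, "an": 0.2}
print(top_logprobs(probs, "a", 2))  # chosen "a", plus alternatives "the" and "a"
print(top_logprobs(probs, "an", 0))  # k = 0: only the chosen token's log probability
```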

getMaxTokens()[source]
Returns

The maximum number of tokens to generate. Has minimum of 0.

Return type

maxTokens

getModel()[source]
Returns

The name of the model to use

Return type

model

getN()[source]
Returns

How many snippets to generate for each prompt. Minimum of 1 and maximum of 128 allowed.

Return type

n

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getPresencePenalty()[source]
Returns

How much to penalize new tokens based on whether they appear in the text so far. Increases the model’s likelihood to talk about new topics. Has minimum of -2 and maximum of 2.

Return type

presencePenalty
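OpenAI's API reference gives the adjustment behind these two penalties as mu[j] - c[j] * frequency_penalty - (c[j] > 0) * presence_penalty. Here mu[j] is token j's logit and c[j] is the number of times j has already been sampled. The frequency term therefore grows with each repetition, while the presence term is applied once as soon as a token has appeared. Below is a minimal pure-Python sketch of that formula. This reference does not state whether the service applies it identically. penalize and the toy data are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def penalize(logits: Dict[str, float], generated: List[str],
             frequency_penalty: float, presence_penalty: float) -> Dict[str, float]:
    """Apply frequency and presence penalties to next-token logits."""
    counts = Counter(generated)  # Counter returns 0 for tokens never generated
    return {
        tok: v
        - counts[tok] * frequency_penalty                    # scales with repetition count
        - (1.0 if counts[tok] > 0 else 0.0) * presence_penalty  # one-time penalty
        for tok, v in logits.items()
    }


print(penalize({"a": 1.0, "b": 1.0, "c": 1.0}, ["a", "a", "b"], 0.5, 0.25))
```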

getPrompt()[source]
Returns

The text to complete

Return type

prompt

getStop()[source]
Returns

A sequence which indicates the end of the current document.

Return type

stop
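A stop sequence ends generation at its first occurrence. The pure-Python sketch below applies that rule to an already generated string. Following OpenAI's documented convention, the stop sequence itself is excluded from the returned text. truncate_at_stop is a hypothetical helper, and the sketch accepts either a single string or a list of stop sequences.

```python
from typing import Iterable, Optional, Union


def truncate_at_stop(text: str, stop: Optional[Union[str, Iterable[str]]]) -> str:
    """Cut text at the earliest occurrence of any stop sequence (exclusive)."""
    if not stop:
        return text
    if isinstance(stop, str):
        stop = [stop]  # a single sequence, not an iterable of characters
    cut = len(text)
    for seq in stop:
        idx = text.find(seq)
        if seq and idx != -1:
            cut = min(cut, idx)
    return text[:cut]


print(repr(truncate_at_stop("Q: hi\nA: hello\nQ:", ["\nQ:"])))
```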

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getTemperature()[source]
Returns

What sampling temperature to use. Higher values means the model will take more risks. Try 0.9 for more creative applications, and 0 (argmax sampling) for ones with a well-defined answer. We generally recommend using this or top_p but not both. Minimum of 0 and maximum of 2 allowed.

Return type

temperature

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getTopP()[source]
Returns

An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend using this or temperature but not both. Minimum of 0 and maximum of 1 allowed.

Return type

topP

getUrl()[source]
Returns

Url of the service

Return type

url

getUser()[source]
Returns

The ID of the end-user, for use in tracking and rate-limiting.

Return type

user

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
indexPrompt = Param(parent='undefined', name='indexPrompt', doc='ServiceParam: Sequence of indexes to complete')
logProbs = Param(parent='undefined', name='logProbs', doc='ServiceParam: Include the log probabilities on the `logprobs` most likely tokens, as well the chosen tokens. So for example, if `logprobs` is 10, the API will return a list of the 10 most likely tokens. If `logprobs` is 0, only the chosen tokens will have logprobs returned. Minimum of 0 and maximum of 100 allowed.')
maxTokens = Param(parent='undefined', name='maxTokens', doc='ServiceParam: The maximum number of tokens to generate. Has minimum of 0.')
model = Param(parent='undefined', name='model', doc='ServiceParam: The name of the model to use')
n = Param(parent='undefined', name='n', doc='ServiceParam: How many snippets to generate for each prompt. Minimum of 1 and maximum of 128 allowed.')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
presencePenalty = Param(parent='undefined', name='presencePenalty', doc="ServiceParam: How much to penalize new tokens based on their existing frequency in the text so far. Decreases the model's likelihood to repeat the same line verbatim. Has minimum of -2 and maximum of 2.")
prompt = Param(parent='undefined', name='prompt', doc='ServiceParam: The text to complete')
classmethod read()[source]

Returns an MLReader instance for this class.

setApiVersion(value)[source]
Parameters

apiVersion – version of the api

setApiVersionCol(value)[source]
Parameters

apiVersion – name of the column that holds, for each row, the version of the api

setBatchIndexPrompt(value)[source]
Parameters

batchIndexPrompt – Sequence of index sequences to complete

setBatchIndexPromptCol(value)[source]
Parameters

batchIndexPrompt – name of the column that holds, for each row, the sequence of index sequences to complete

setBatchPrompt(value)[source]
Parameters

batchPrompt – Sequence of prompts to complete

setBatchPromptCol(value)[source]
Parameters

batchPrompt – name of the column that holds, for each row, the sequence of prompts to complete

setBestOf(value)[source]
Parameters

bestOf – How many generations to create server side, and display only the best. Will not stream intermediate progress if best_of > 1. Has maximum value of 128.

setBestOfCol(value)[source]
Parameters

bestOf – name of the column that holds, for each row, the bestOf value (see setBestOf)

setCacheLevel(value)[source]
Parameters

cacheLevel – can be used to disable any server-side caching, 0=no cache, 1=prompt prefix enabled, 2=full cache

setCacheLevelCol(value)[source]
Parameters

cacheLevel – name of the column that holds, for each row, the cacheLevel value (see setCacheLevel)

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – maximum number of seconds to wait on futures if concurrency >= 1

setDeploymentName(value)[source]
Parameters

deploymentName – The name of the deployment

setDeploymentNameCol(value)[source]
Parameters

deploymentName – name of the column that holds, for each row, the name of the deployment

setEcho(value)[source]
Parameters

echo – Echo back the prompt in addition to the completion

setEchoCol(value)[source]
Parameters

echo – name of the column that holds, for each row, whether to echo back the prompt in addition to the completion

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setFrequencyPenalty(value)[source]
Parameters

frequencyPenalty – How much to penalize new tokens based on their existing frequency in the text so far. Decreases the model’s likelihood to repeat the same line verbatim. Has minimum of -2 and maximum of 2.

setFrequencyPenaltyCol(value)[source]
Parameters

frequencyPenalty – name of the column that holds, for each row, the frequencyPenalty value (see setFrequencyPenalty)

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setIndexPrompt(value)[source]
Parameters

indexPrompt – Sequence of indexes to complete

setIndexPromptCol(value)[source]
Parameters

indexPrompt – name of the column that holds, for each row, the sequence of indexes to complete

setLogProbs(value)[source]
Parameters

logProbs – Include the log probabilities on the logprobs most likely tokens, as well as the chosen tokens. For example, if logprobs is 10, the API will return a list of the 10 most likely tokens. If logprobs is 0, only the chosen tokens will have logprobs returned. Minimum of 0 and maximum of 100 allowed.

setLogProbsCol(value)[source]
Parameters

logProbs – name of the column that holds, for each row, the logProbs value (see setLogProbs)

setMaxTokens(value)[source]
Parameters

maxTokens – The maximum number of tokens to generate. Has minimum of 0.

setMaxTokensCol(value)[source]
Parameters

maxTokens – name of the column that holds, for each row, the maximum number of tokens to generate (minimum of 0)

setModel(value)[source]
Parameters

model – The name of the model to use

setModelCol(value)[source]
Parameters

model – name of the column that holds, for each row, the name of the model to use

setN(value)[source]
Parameters

n – How many snippets to generate for each prompt. Minimum of 1 and maximum of 128 allowed.

setNCol(value)[source]
Parameters

n – name of the column that holds, for each row, the n value (see setN)

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(apiVersion=None, apiVersionCol=None, batchIndexPrompt=None, batchIndexPromptCol=None, batchPrompt=None, batchPromptCol=None, bestOf=None, bestOfCol=None, cacheLevel=None, cacheLevelCol=None, concurrency=1, concurrentTimeout=None, deploymentName=None, deploymentNameCol=None, echo=None, echoCol=None, errorCol='OpenAPICompletion_9fade8566b4a_error', frequencyPenalty=None, frequencyPenaltyCol=None, handler=None, indexPrompt=None, indexPromptCol=None, logProbs=None, logProbsCol=None, maxTokens=None, maxTokensCol=None, model=None, modelCol=None, n=None, nCol=None, outputCol='OpenAPICompletion_9fade8566b4a_output', presencePenalty=None, presencePenaltyCol=None, prompt=None, promptCol=None, stop=None, stopCol=None, subscriptionKey=None, subscriptionKeyCol=None, temperature=None, temperatureCol=None, timeout=60.0, topP=None, topPCol=None, url=None, user=None, userCol=None)[source]

Set the (keyword only) parameters

setPresencePenalty(value)[source]
Parameters

presencePenalty – How much to penalize new tokens based on whether they appear in the text so far. Increases the model’s likelihood to talk about new topics. Has minimum of -2 and maximum of 2.

setPresencePenaltyCol(value)[source]
Parameters

presencePenalty – name of the column that holds, for each row, the presencePenalty value (see setPresencePenalty)

setPrompt(value)[source]
Parameters

prompt – The text to complete

setPromptCol(value)[source]
Parameters

prompt – name of the column that holds, for each row, the text to complete

setServiceName(value)[source]
setStop(value)[source]
Parameters

stop – A sequence which indicates the end of the current document.

setStopCol(value)[source]
Parameters

stop – name of the column that holds, for each row, the sequence that indicates the end of the current document

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – name of the column that holds, for each row, the API key to use

setTemperature(value)[source]
Parameters

temperature – What sampling temperature to use. Higher values mean the model will take more risks. Try 0.9 for more creative applications, and 0 (argmax sampling) for ones with a well-defined answer. We generally recommend setting this or topP, but not both. Minimum of 0 and maximum of 2 allowed.

setTemperatureCol(value)[source]
Parameters

temperature – name of the column that holds, for each row, the temperature value (see setTemperature)

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setTopP(value)[source]
Parameters

topP – An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend using this or temperature but not both. Minimum of 0 and maximum of 1 allowed.

setTopPCol(value)[source]
Parameters

topP – name of the column that holds, for each row, the topP value (see setTopP)

setUrl(value)[source]
Parameters

url – Url of the service

setUser(value)[source]
Parameters

user – The ID of the end-user, for use in tracking and rate-limiting.

setUserCol(value)[source]
Parameters

user – name of the column that holds, for each row, the ID of the end-user, for use in tracking and rate-limiting

stop = Param(parent='undefined', name='stop', doc='ServiceParam: A sequence which indicates the end of the current document.')
subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
temperature = Param(parent='undefined', name='temperature', doc='ServiceParam: What sampling temperature to use. Higher values means the model will take more risks. Try 0.9 for more creative applications, and 0 (argmax sampling) for ones with a well-defined answer. We generally recommend using this or `top_p` but not both. Minimum of 0 and maximum of 2 allowed.')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
topP = Param(parent='undefined', name='topP', doc='ServiceParam: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend using this or `temperature` but not both. Minimum of 0 and maximum of 1 allowed.')
url = Param(parent='undefined', name='url', doc='Url of the service')
user = Param(parent='undefined', name='user', doc='ServiceParam: The ID of the end-user, for use in tracking and rate-limiting.')

synapse.ml.cognitive.PII module

class synapse.ml.cognitive.PII.PII(java_obj=None, concurrency=1, concurrentTimeout=None, domain=None, domainCol=None, errorCol='PII_de2b592e6f4e_error', handler=None, language=None, languageCol=None, modelVersion=None, modelVersionCol=None, outputCol='PII_de2b592e6f4e_output', piiCategories=None, piiCategoriesCol=None, showStats=None, showStatsCol=None, stringIndexType=None, stringIndexTypeCol=None, subscriptionKey=None, subscriptionKeyCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – maximum number of seconds to wait on futures if concurrency >= 1

  • domain (object) – if specified, will set the PII domain to include only a subset of the entity categories. Possible values include: ‘PHI’, ‘none’.

  • errorCol (str) – column to hold http errors

  • handler (object) – Which strategy to use when handling requests

  • language (object) – the language code of the text (optional for some services)

  • modelVersion (object) – This value indicates which model will be used for scoring. If a model-version is not specified, the API should default to the latest, non-preview version.

  • outputCol (str) – The name of the output column

  • piiCategories (object) – describes the PII categories to return

  • showStats (object) – if set to true, the response will contain input-level and document-level statistics.

  • stringIndexType (object) – Specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets

  • subscriptionKey (object) – the API key to use

  • text (object) – the text in the request body

  • timeout (float) – number of seconds to wait before closing the connection

  • url (str) – Url of the service
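PII detection identifies entity spans in the input text and masks them. Below is a minimal pure-Python sketch of masking spans given as offset and length. The entity dictionaries use hypothetical field names, not the service's response schema. The sketch also assumes offsets count Unicode code points (see stringIndexType), which is how Python indexes strings.

```python
from typing import Dict, List


def redact(text: str, entities: List[Dict[str, int]], mask: str = "*") -> str:
    """Mask each entity span given by code point offset and length."""
    chars = list(text)
    for ent in entities:
        start, end = ent["offset"], ent["offset"] + ent["length"]
        for i in range(start, min(end, len(chars))):
            chars[i] = mask
    return "".join(chars)


print(redact("Call Ana at 555-0100.", [{"offset": 5, "length": 3}, {"offset": 12, "length": 8}]))
```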

concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
domain = Param(parent='undefined', name='domain', doc="ServiceParam: if specified, will set the PII domain to include only a subset of the entity categories. Possible values include: 'PHI', 'none'.")
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

maximum number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getDomain()[source]
Returns

if specified, will set the PII domain to include only a subset of the entity categories. Possible values include: ‘PHI’, ‘none’.

Return type

domain

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

static getJavaPackage()[source]

Returns package name String.

getLanguage()[source]
Returns

the language code of the text (optional for some services)

Return type

language

getModelVersion()[source]
Returns

This value indicates which model will be used for scoring. If a model-version is not specified, the API should default to the latest, non-preview version.

Return type

modelVersion

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getPiiCategories()[source]
Returns

describes the PII categories to return

Return type

piiCategories

getShowStats()[source]
Returns

if set to true, the response will contain input-level and document-level statistics.

Return type

showStats

getStringIndexType()[source]
Returns

Specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets

Return type

stringIndexType
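stringIndexType matters because the same character position has different numeric offsets under different counting schemes. The linked page describes options such as Unicode code points and UTF-16 code units, alongside the default text elements. Python strings index by code point, while Java, JavaScript and .NET index by UTF-16 code unit. Characters outside the Basic Multilingual Plane, such as most emoji, therefore shift the two schemes apart. Below is a minimal sketch of converting between the two, using hypothetical helper names. Grapheme (text element) offsets need a Unicode segmentation library and are not covered.

```python
def codepoint_to_utf16_offset(text: str, cp_offset: int) -> int:
    """Convert a Python (Unicode code point) offset to a UTF-16 code unit offset."""
    return len(text[:cp_offset].encode("utf-16-le")) // 2


def utf16_to_codepoint_offset(text: str, u16_offset: int) -> int:
    """Inverse conversion; assumes u16_offset does not split a surrogate pair."""
    units = 0
    for i, ch in enumerate(text):
        if units >= u16_offset:
            return i
        units += 2 if ord(ch) > 0xFFFF else 1  # astral characters take two UTF-16 units
    return len(text)


text = "😀 call 555-0100"
print(codepoint_to_utf16_offset(text, 7))  # "555" starts at code point 7
```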

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getText()[source]
Returns

the text in the request body

Return type

text

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
language = Param(parent='undefined', name='language', doc='ServiceParam: the language code of the text (optional for some services)')
modelVersion = Param(parent='undefined', name='modelVersion', doc='ServiceParam: This value indicates which model will be used for scoring. If a model-version is not specified, the API should default to the latest, non-preview version.')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
piiCategories = Param(parent='undefined', name='piiCategories', doc='ServiceParam: describes the PII categories to return')
classmethod read()[source]

Returns an MLReader instance for this class.

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – maximum number of seconds to wait on futures if concurrency >= 1

setDomain(value)[source]
Parameters

domain – if specified, will set the PII domain to include only a subset of the entity categories. Possible values include: ‘PHI’, ‘none’.

setDomainCol(value)[source]
Parameters

domain – name of the column that holds, for each row, the domain value (see setDomain)

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setLanguage(value)[source]
Parameters

language – the language code of the text (optional for some services)

setLanguageCol(value)[source]
Parameters

language – name of the column that holds, for each row, the language code of the text (optional for some services)

setLinkedService(value)[source]
setLocation(value)[source]
setModelVersion(value)[source]
Parameters

modelVersion – This value indicates which model will be used for scoring. If a model-version is not specified, the API should default to the latest, non-preview version.

setModelVersionCol(value)[source]
Parameters

modelVersion – name of the column that holds, for each row, the modelVersion value (see setModelVersion)

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(concurrency=1, concurrentTimeout=None, domain=None, domainCol=None, errorCol='PII_de2b592e6f4e_error', handler=None, language=None, languageCol=None, modelVersion=None, modelVersionCol=None, outputCol='PII_de2b592e6f4e_output', piiCategories=None, piiCategoriesCol=None, showStats=None, showStatsCol=None, stringIndexType=None, stringIndexTypeCol=None, subscriptionKey=None, subscriptionKeyCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setPiiCategories(value)[source]
Parameters

piiCategories – describes the PII categories to return

setPiiCategoriesCol(value)[source]
Parameters

piiCategories – name of the column that holds, for each row, the PII categories to return

setShowStats(value)[source]
Parameters

showStats – if set to true, the response will contain input-level and document-level statistics.

setShowStatsCol(value)[source]
Parameters

showStats – name of the column that holds, for each row, whether the response should contain input-level and document-level statistics

setStringIndexType(value)[source]
Parameters

stringIndexType – Specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets

setStringIndexTypeCol(value)[source]
Parameters

stringIndexType – name of the column that holds, for each row, the stringIndexType value (see setStringIndexType)

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – name of the column that holds, for each row, the API key to use

setText(value)[source]
Parameters

text – the text in the request body

setTextCol(value)[source]
Parameters

text – name of the column that holds, for each row, the text in the request body

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

showStats = Param(parent='undefined', name='showStats', doc='ServiceParam: if set to true, response will contain input and document level statistics.')
stringIndexType = Param(parent='undefined', name='stringIndexType', doc='ServiceParam: Specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets')
subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
text = Param(parent='undefined', name='text', doc='ServiceParam: the text in the request body')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.PIISDK module

class synapse.ml.cognitive.PIISDK.PIISDK(java_obj=None, batchSize=5, concurrency=1, concurrentTimeout=None, disableServiceLogs=None, disableServiceLogsCol=None, errorCol='Error', includeStatistics=None, includeStatisticsCol=None, language=None, languageCol=None, modelVersion=None, modelVersionCol=None, outputCol=None, subscriptionKey=None, subscriptionKeyCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • batchSize (int) – The max size of the buffer

  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • disableServiceLogs (object) – disableServiceLogs option

  • errorCol (str) – column to hold http errors

  • includeStatistics (object) – includeStatistics option

  • language (object) – the language code of the text (optional for some services)

  • modelVersion (object) – modelVersion option

  • outputCol (str) – The name of the output column

  • subscriptionKey (object) – the API key to use

  • text (object) – the text in the request body

  • timeout (float) – number of seconds to wait before closing the connection

  • url (str) – Url of the service
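The reference describes `batchSize` only as "the max size of the buffer". Assuming it caps how many input rows are grouped into a single request, the grouping behaves like the minimal sketch below. `batched` is a hypothetical helper written for illustration, not part of SynapseML.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(rows: Iterable[T], batch_size: int = 5) -> Iterator[List[T]]:
    """Yield consecutive groups of at most batch_size rows (5 mirrors PIISDK's default)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(rows)
    # islice pulls up to batch_size items; an empty list means the input is exhausted
    while chunk := list(islice(it, batch_size)):
        yield chunk


docs = [f"document {i}" for i in range(12)]
print([len(batch) for batch in batched(docs)])  # [5, 5, 2]
```

Under this assumption, 12 rows with the default of 5 become three requests; a larger value means fewer requests, subject to whatever per-request document limit the service enforces.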

batchSize = Param(parent='undefined', name='batchSize', doc='The max size of the buffer')
concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number of seconds to wait on futures if concurrency >= 1')
disableServiceLogs = Param(parent='undefined', name='disableServiceLogs', doc='ServiceParam: disableServiceLogs option')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getBatchSize()[source]
Returns

The max size of the buffer

Return type

batchSize

getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getDisableServiceLogs()[source]
Returns

disableServiceLogs option

Return type

disableServiceLogs

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getIncludeStatistics()[source]
Returns

includeStatistics option

Return type

includeStatistics

static getJavaPackage()[source]

Returns package name String.

getLanguage()[source]
Returns

the language code of the text (optional for some services)

Return type

language

getModelVersion()[source]
Returns

modelVersion option

Return type

modelVersion

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getText()[source]
Returns

the text in the request body

Return type

text

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

includeStatistics = Param(parent='undefined', name='includeStatistics', doc='ServiceParam: includeStatistics option')
language = Param(parent='undefined', name='language', doc='ServiceParam: the language code of the text (optional for some services)')
modelVersion = Param(parent='undefined', name='modelVersion', doc='ServiceParam: modelVersion option')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setBatchSize(value)[source]
Parameters

batchSize – The max size of the buffer

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1
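The interplay of `concurrency` and `concurrentTimeout` can be pictured with the Python standard library: at most `concurrency` calls run at once, and each pending result is awaited for at most `concurrentTimeout` seconds. This is an analogy under that assumption, not SynapseML's implementation (which runs on the JVM); `run_bounded` and `fake_call` are hypothetical names, with `fake_call` standing in for one HTTP request.

```python
import concurrent.futures
import time
from typing import Callable, List, Optional


def run_bounded(call_service: Callable[[int], str], inputs: List[int],
                concurrency: int = 1,
                concurrent_timeout: Optional[float] = None) -> List[str]:
    """Run call_service on each input with at most `concurrency` calls in flight."""
    results: List[str] = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(call_service, x) for x in inputs]
        for future in futures:  # collect results in input order
            try:
                # timeout=None waits indefinitely, mirroring concurrentTimeout=None
                results.append(future.result(timeout=concurrent_timeout))
            except concurrent.futures.TimeoutError:
                results.append("timed out")
    # note: leaving the with-block still waits for timed-out calls to finish
    return results


def fake_call(x: int) -> str:
    """Hypothetical stand-in for one HTTP request."""
    time.sleep(0.01)  # simulate network latency
    return f"ok {x}"


print(run_bounded(fake_call, [1, 2, 3], concurrency=2))  # ['ok 1', 'ok 2', 'ok 3']
```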

setDisableServiceLogs(value)[source]
Parameters

disableServiceLogs – disableServiceLogs option

setDisableServiceLogsCol(value)[source]
Parameters

disableServiceLogs – the name of the column containing the disableServiceLogs option

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setIncludeStatistics(value)[source]
Parameters

includeStatistics – includeStatistics option

setIncludeStatisticsCol(value)[source]
Parameters

includeStatistics – the name of the column containing the includeStatistics option

setLanguage(value)[source]
Parameters

language – the language code of the text (optional for some services)

setLanguageCol(value)[source]
Parameters

language – the name of the column containing the language code of the text (optional for some services)

setLocation(value)[source]
setModelVersion(value)[source]
Parameters

modelVersion – modelVersion option

setModelVersionCol(value)[source]
Parameters

modelVersion – the name of the column containing the modelVersion option

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(batchSize=5, concurrency=1, concurrentTimeout=None, disableServiceLogs=None, disableServiceLogsCol=None, errorCol='Error', includeStatistics=None, includeStatisticsCol=None, language=None, languageCol=None, modelVersion=None, modelVersionCol=None, outputCol=None, subscriptionKey=None, subscriptionKeyCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – the name of the column containing the API key to use

setText(value)[source]
Parameters

text – the text in the request body

setTextCol(value)[source]
Parameters

text – the name of the column containing the text in the request body

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
text = Param(parent='undefined', name='text', doc='ServiceParam: the text in the request body')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.ReadImage module

class synapse.ml.cognitive.ReadImage.ReadImage(java_obj=None, backoffs=[100, 500, 1000], concurrency=1, concurrentTimeout=None, errorCol='ReadImage_d7542375d0c7_error', imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, initialPollingDelay=300, language=None, languageCol=None, maxPollingRetries=1000, outputCol='ReadImage_d7542375d0c7_output', pollingDelay=300, subscriptionKey=None, subscriptionKeyCol=None, suppressMaxRetriesExceededException=False, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • backoffs (list) – array of backoffs to use in the handler

  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (str) – column to hold http errors

  • imageBytes (object) – bytestream of the image to use

  • imageUrl (object) – the url of the image to use

  • initialPollingDelay (int) – number of milliseconds to wait before first poll for result

  • language (object) – The BCP-47 language code of the text in the document. Currently, only English (en), Dutch (nl), French (fr), German (de), Italian (it), Portuguese (pt), and Spanish (es) are supported. Read supports automatic language identification and multilingual documents, so provide a language code only if you want to force the document to be processed as that specific language.

  • maxPollingRetries (int) – number of times to poll

  • outputCol (str) – The name of the output column

  • pollingDelay (int) – number of milliseconds to wait between polling

  • subscriptionKey (object) – the API key to use

  • suppressMaxRetriesExceededException (bool) – set to true to suppress the maximum retries exception and report it in the error column instead

  • timeout (float) – number of seconds to wait before closing the connection

  • url (str) – Url of the service
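The polling parameters indicate that ReadImage submits the image and then polls for the result. Assuming the first poll is sent `initialPollingDelay` ms after submission and each later poll waits a further `pollingDelay` ms, up to `maxPollingRetries` polls, the poll schedule and worst-case wait can be estimated as below. `poll_times_ms` is a hypothetical helper, the formula is an assumption about the polling loop rather than SynapseML's source, and request latency is ignored.

```python
from typing import List


def poll_times_ms(initial_polling_delay: int = 300, polling_delay: int = 300,
                  max_polling_retries: int = 1000) -> List[int]:
    """Offsets (ms after submission) at which each poll would be sent."""
    return [initial_polling_delay + i * polling_delay for i in range(max_polling_retries)]


times = poll_times_ms()  # defaults match ReadImage's constructor defaults
print(times[:3], times[-1] / 1000)  # [300, 600, 900] 300.0
```

With the defaults, this model gives up after roughly 300 seconds; lowering `maxPollingRetries` or `pollingDelay` shortens that worst case.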

backoffs = Param(parent='undefined', name='backoffs', doc='array of backoffs to use in the handler')
concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number of seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getBackoffs()[source]
Returns

array of backoffs to use in the handler

Return type

backoffs

getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getImageBytes()[source]
Returns

bytestream of the image to use

Return type

imageBytes

getImageUrl()[source]
Returns

the url of the image to use

Return type

imageUrl

getInitialPollingDelay()[source]
Returns

number of milliseconds to wait before first poll for result

Return type

initialPollingDelay

static getJavaPackage()[source]

Returns package name String.

getLanguage()[source]
Returns

The BCP-47 language code of the text in the document. Currently, only English (en), Dutch (nl), French (fr), German (de), Italian (it), Portuguese (pt), and Spanish (es) are supported. Read supports automatic language identification and multilingual documents, so provide a language code only if you want to force the document to be processed as that specific language.

Return type

language

getMaxPollingRetries()[source]
Returns

number of times to poll

Return type

maxPollingRetries

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getPollingDelay()[source]
Returns

number of milliseconds to wait between polling

Return type

pollingDelay

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getSuppressMaxRetriesExceededException()[source]
Returns

set to true to suppress the maximum retries exception and report it in the error column instead

Return type

suppressMaxRetriesExceededException

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

imageBytes = Param(parent='undefined', name='imageBytes', doc='ServiceParam: bytestream of the image to use')
imageUrl = Param(parent='undefined', name='imageUrl', doc='ServiceParam: the url of the image to use')
initialPollingDelay = Param(parent='undefined', name='initialPollingDelay', doc='number of milliseconds to wait before first poll for result')
language = Param(parent='undefined', name='language', doc='ServiceParam: The BCP-47 language code of the text in the document. Currently, only English (en), Dutch (nl), French (fr), German (de), Italian (it), Portuguese (pt), and Spanish (es) are supported. Read supports automatic language identification and multilingual documents, so provide a language code only if you want to force the document to be processed as that specific language.')
maxPollingRetries = Param(parent='undefined', name='maxPollingRetries', doc='number of times to poll')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
pollingDelay = Param(parent='undefined', name='pollingDelay', doc='number of milliseconds to wait between polling')
classmethod read()[source]

Returns an MLReader instance for this class.

setBackoffs(value)[source]
Parameters

backoffs – array of backoffs to use in the handler
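The reference does not spell out how `backoffs` is applied. A common reading, assumed here, is that each entry is a delay in milliseconds before the next retry of a failed request, so `[100, 500, 1000]` allows up to three retries after the first attempt. The sketch implements that generic retry pattern; `with_backoffs` is hypothetical, `send` is a stand-in returning an HTTP status code, the set of retryable statuses is an assumption, and `sleep` is injectable so the behavior can be checked without waiting.

```python
import time
from typing import Callable, Sequence

RETRYABLE = {429, 500, 502, 503, 504}  # assumption: which statuses count as transient


def with_backoffs(send: Callable[[], int],
                  backoffs: Sequence[int] = (100, 500, 1000),
                  sleep: Callable[[float], None] = time.sleep) -> int:
    """Call send(); while the status is retryable, wait backoffs[i] ms and retry."""
    status = send()
    for delay_ms in backoffs:
        if status not in RETRYABLE:
            break
        sleep(delay_ms / 1000)  # time.sleep takes seconds
        status = send()
    return status


responses = iter([503, 429, 200])
waits = []
print(with_backoffs(lambda: next(responses), sleep=waits.append), waits)  # 200 [0.1, 0.5]
```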

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setImageBytes(value)[source]
Parameters

imageBytes – bytestream of the image to use

setImageBytesCol(value)[source]
Parameters

imageBytes – the name of the column containing the bytestream of the image to use

setImageUrl(value)[source]
Parameters

imageUrl – the url of the image to use

setImageUrlCol(value)[source]
Parameters

imageUrl – the name of the column containing the URL of the image to use

setInitialPollingDelay(value)[source]
Parameters

initialPollingDelay – number of milliseconds to wait before first poll for result

setLanguage(value)[source]
Parameters

language – The BCP-47 language code of the text in the document. Currently, only English (en), Dutch (nl), French (fr), German (de), Italian (it), Portuguese (pt), and Spanish (es) are supported. Read supports automatic language identification and multilingual documents, so provide a language code only if you want to force the document to be processed as that specific language.

setLanguageCol(value)[source]
Parameters

language – the name of the column containing the BCP-47 language code of the text in the document. Currently, only English (en), Dutch (nl), French (fr), German (de), Italian (it), Portuguese (pt), and Spanish (es) are supported. Read supports automatic language identification and multilingual documents, so provide a language code only if you want to force the document to be processed as that specific language.

setLinkedService(value)[source]
setLocation(value)[source]
setMaxPollingRetries(value)[source]
Parameters

maxPollingRetries – number of times to poll

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(backoffs=[100, 500, 1000], concurrency=1, concurrentTimeout=None, errorCol='ReadImage_d7542375d0c7_error', imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, initialPollingDelay=300, language=None, languageCol=None, maxPollingRetries=1000, outputCol='ReadImage_d7542375d0c7_output', pollingDelay=300, subscriptionKey=None, subscriptionKeyCol=None, suppressMaxRetriesExceededException=False, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setPollingDelay(value)[source]
Parameters

pollingDelay – number of milliseconds to wait between polling

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – the name of the column containing the API key to use

setSuppressMaxRetriesExceededException(value)[source]
Parameters

suppressMaxRetriesExceededException – set to true to suppress the maximum retries exception and report it in the error column instead

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
suppressMaxRetriesExceededException = Param(parent='undefined', name='suppressMaxRetriesExceededException', doc='set to true to suppress the maximum retries exception and report it in the error column instead')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.RecognizeDomainSpecificContent module

class synapse.ml.cognitive.RecognizeDomainSpecificContent.RecognizeDomainSpecificContent(java_obj=None, concurrency=1, concurrentTimeout=None, errorCol='RecognizeDomainSpecificContent_924c6e4f205a_error', handler=None, imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, model=None, modelCol=None, outputCol='RecognizeDomainSpecificContent_924c6e4f205a_output', subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (str) – column to hold http errors

  • handler (object) – Which strategy to use when handling requests

  • imageBytes (object) – bytestream of the image to use

  • imageUrl (object) – the url of the image to use

  • model (object) – the domain-specific model to use: celebrities or landmarks

  • outputCol (str) – The name of the output column

  • subscriptionKey (object) – the API key to use

  • timeout (float) – number of seconds to wait before closing the connection

  • url (str) – Url of the service

concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number of seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

getImageBytes()[source]
Returns

bytestream of the image to use

Return type

imageBytes

getImageUrl()[source]
Returns

the url of the image to use

Return type

imageUrl

static getJavaPackage()[source]

Returns package name String.

getModel()[source]
Returns

the domain-specific model to use: celebrities or landmarks

Return type

model

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
imageBytes = Param(parent='undefined', name='imageBytes', doc='ServiceParam: bytestream of the image to use')
imageUrl = Param(parent='undefined', name='imageUrl', doc='ServiceParam: the url of the image to use')
model = Param(parent='undefined', name='model', doc='ServiceParam: the domain-specific model to use: celebrities or landmarks')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setImageBytes(value)[source]
Parameters

imageBytes – bytestream of the image to use

setImageBytesCol(value)[source]
Parameters

imageBytes – the name of the column containing the bytestream of the image to use

setImageUrl(value)[source]
Parameters

imageUrl – the url of the image to use

setImageUrlCol(value)[source]
Parameters

imageUrl – the name of the column containing the URL of the image to use

setLinkedService(value)[source]
setLocation(value)[source]
setModel(value)[source]
Parameters

model – the domain-specific model to use: celebrities or landmarks

setModelCol(value)[source]
Parameters

model – the name of the column containing the domain-specific model to use: celebrities or landmarks
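As the paired setters in this reference suggest, each `ServiceParam` can be supplied in two ways: `setModel` fixes one value for every row, while `setModelCol` names a column whose per-row value is used. The plain-Python model below illustrates that resolution rule; `ServiceParamValue` and `resolve` are hypothetical names for illustration, not SynapseML classes.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class ServiceParamValue:
    """Either a constant (like setModel) or a column reference (like setModelCol)."""
    constant: Optional[Any] = None
    column: Optional[str] = None

    def resolve(self, row: Dict[str, Any]) -> Any:
        if self.column is not None:
            return row[self.column]  # per-row value read from the named column
        return self.constant         # same value for every row


rows = [{"img": "a.jpg", "kind": "celebrities"}, {"img": "b.jpg", "kind": "landmarks"}]
fixed = ServiceParamValue(constant="landmarks")
per_row = ServiceParamValue(column="kind")
print([fixed.resolve(r) for r in rows])    # ['landmarks', 'landmarks']
print([per_row.resolve(r) for r in rows])  # ['celebrities', 'landmarks']
```

On the actual transformer, the corresponding calls would be `setModel("landmarks")` and `setModelCol("kind")`; the same pattern applies to pairs such as `setImageUrl`/`setImageUrlCol`.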

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(concurrency=1, concurrentTimeout=None, errorCol='RecognizeDomainSpecificContent_924c6e4f205a_error', handler=None, imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, model=None, modelCol=None, outputCol='RecognizeDomainSpecificContent_924c6e4f205a_output', subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – the name of the column containing the API key to use

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.RecognizeText module

class synapse.ml.cognitive.RecognizeText.RecognizeText(java_obj=None, backoffs=[100, 500, 1000], concurrency=1, concurrentTimeout=None, errorCol='RecognizeText_b9888bc3c0c3_error', imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, initialPollingDelay=300, maxPollingRetries=1000, mode=None, modeCol=None, outputCol='RecognizeText_b9888bc3c0c3_output', pollingDelay=300, subscriptionKey=None, subscriptionKeyCol=None, suppressMaxRetriesExceededException=False, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • backoffs (list) – array of backoffs to use in the handler

  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (str) – column to hold http errors

  • imageBytes (object) – bytestream of the image to use

  • imageUrl (object) – the url of the image to use

  • initialPollingDelay (int) – number of milliseconds to wait before first poll for result

  • maxPollingRetries (int) – number of times to poll

  • mode (object) – If this parameter is set to ‘Printed’, printed text recognition is performed. If ‘Handwritten’ is specified, handwriting recognition is performed

  • outputCol (str) – The name of the output column

  • pollingDelay (int) – number of milliseconds to wait between polling

  • subscriptionKey (object) – the API key to use

  • suppressMaxRetriesExceededException (bool) – set to true to suppress the maximum retries exception and report it in the error column instead

  • timeout (float) – number of seconds to wait before closing the connection

  • url (str) – Url of the service
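Like ReadImage, RecognizeText polls for its result, and `suppressMaxRetriesExceededException` decides what happens once `maxPollingRetries` is used up: raise, or record the failure in the error column. The minimal sketch below shows that choice under stated assumptions: `poll` and `MaxRetriesExceeded` are hypothetical names, `get_status` is a stand-in returning the hypothetical strings "running" or "succeeded", the returned dict only mimics separate output and error columns, and `sleep` is injectable for testing. It is not SynapseML's code.

```python
import time
from typing import Callable, Dict, Optional


class MaxRetriesExceeded(Exception):
    """Hypothetical error raised when polling never sees a finished job."""


def poll(get_status: Callable[[], str], initial_polling_delay: int = 300,
         polling_delay: int = 300, max_polling_retries: int = 1000,
         suppress: bool = False,
         sleep: Callable[[float], None] = time.sleep) -> Dict[str, Optional[str]]:
    """Poll until 'succeeded'; return output/error fields like the two output columns."""
    sleep(initial_polling_delay / 1000)  # delays are in milliseconds
    for attempt in range(max_polling_retries):
        if attempt > 0:
            sleep(polling_delay / 1000)
        status = get_status()
        if status == "succeeded":
            return {"output": status, "error": None}
    message = f"still running after {max_polling_retries} polls"
    if suppress:
        return {"output": None, "error": message}  # reported instead of raised
    raise MaxRetriesExceeded(message)


statuses = iter(["running", "running", "running"])
print(poll(lambda: next(statuses), max_polling_retries=3, suppress=True, sleep=lambda s: None))
# {'output': None, 'error': 'still running after 3 polls'}
```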

backoffs = Param(parent='undefined', name='backoffs', doc='array of backoffs to use in the handler')
concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number of seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getBackoffs()[source]
Returns

array of backoffs to use in the handler

Return type

backoffs

getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getImageBytes()[source]
Returns

bytestream of the image to use

Return type

imageBytes

getImageUrl()[source]
Returns

the url of the image to use

Return type

imageUrl

getInitialPollingDelay()[source]
Returns

number of milliseconds to wait before first poll for result

Return type

initialPollingDelay

static getJavaPackage()[source]

Returns package name String.

getMaxPollingRetries()[source]
Returns

number of times to poll

Return type

maxPollingRetries

getMode()[source]
Returns

If this parameter is set to ‘Printed’, printed text recognition is performed. If ‘Handwritten’ is specified, handwriting recognition is performed

Return type

mode

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getPollingDelay()[source]
Returns

number of milliseconds to wait between polling

Return type

pollingDelay

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getSuppressMaxRetriesExceededException()[source]
Returns

set to true to suppress the maximum retries exception and report it in the error column instead

Return type

suppressMaxRetriesExceededException

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

imageBytes = Param(parent='undefined', name='imageBytes', doc='ServiceParam: bytestream of the image to use')
imageUrl = Param(parent='undefined', name='imageUrl', doc='ServiceParam: the url of the image to use')
initialPollingDelay = Param(parent='undefined', name='initialPollingDelay', doc='number of milliseconds to wait before first poll for result')
maxPollingRetries = Param(parent='undefined', name='maxPollingRetries', doc='number of times to poll')
mode = Param(parent='undefined', name='mode', doc="ServiceParam: If this parameter is set to 'Printed', printed text recognition is performed. If 'Handwritten' is specified, handwriting recognition is performed")
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
pollingDelay = Param(parent='undefined', name='pollingDelay', doc='number of milliseconds to wait between polling')
classmethod read()[source]

Returns an MLReader instance for this class.

setBackoffs(value)[source]
Parameters

backoffs – array of backoffs to use in the handler

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setImageBytes(value)[source]
Parameters

imageBytes – bytestream of the image to use

setImageBytesCol(value)[source]
Parameters

imageBytes – the name of the column containing the bytestream of the image to use

setImageUrl(value)[source]
Parameters

imageUrl – the url of the image to use

setImageUrlCol(value)[source]
Parameters

imageUrl – the name of the column containing the URL of the image to use

setInitialPollingDelay(value)[source]
Parameters

initialPollingDelay – number of milliseconds to wait before first poll for result

setLinkedService(value)[source]
setLocation(value)[source]
setMaxPollingRetries(value)[source]
Parameters

maxPollingRetries – number of times to poll

setMode(value)[source]
Parameters

mode – If this parameter is set to ‘Printed’, printed text recognition is performed. If ‘Handwritten’ is specified, handwriting recognition is performed

setModeCol(value)[source]
Parameters

mode – the name of the column containing the recognition mode. If the value is ‘Printed’, printed text recognition is performed; if it is ‘Handwritten’, handwriting recognition is performed.

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(backoffs=[100, 500, 1000], concurrency=1, concurrentTimeout=None, errorCol='RecognizeText_b9888bc3c0c3_error', imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, initialPollingDelay=300, maxPollingRetries=1000, mode=None, modeCol=None, outputCol='RecognizeText_b9888bc3c0c3_output', pollingDelay=300, subscriptionKey=None, subscriptionKeyCol=None, suppressMaxRetriesExceededException=False, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setPollingDelay(value)[source]
Parameters

pollingDelay – number of milliseconds to wait between polling

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – the name of the column containing the API key to use

setSuppressMaxRetriesExceededException(value)[source]
Parameters

suppressMaxRetriesExceededException – set to true to suppress the maximum retries exception and report it in the error column instead

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
suppressMaxRetriesExceededException = Param(parent='undefined', name='suppressMaxRetriesExceededException', doc='set to true to suppress the maximum retries exception and report it in the error column instead')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.SimpleDetectAnomalies module

class synapse.ml.cognitive.SimpleDetectAnomalies.SimpleDetectAnomalies(java_obj=None, concurrency=1, concurrentTimeout=None, customInterval=None, customIntervalCol=None, errorCol='SimpleDetectAnomalies_0e210d55647d_error', granularity=None, granularityCol=None, groupbyCol=None, handler=None, imputeFixedValue=None, imputeFixedValueCol=None, imputeMode=None, imputeModeCol=None, maxAnomalyRatio=None, maxAnomalyRatioCol=None, outputCol='SimpleDetectAnomalies_0e210d55647d_output', period=None, periodCol=None, sensitivity=None, sensitivityCol=None, series=None, seriesCol=None, subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, timestampCol='timestamp', url=None, valueCol='value')[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • customInterval (object) – Custom Interval is used to set a non-standard time interval. For example, if the series has a 5-minute interval, the request can set granularity=minutely and customInterval=5.

  • errorCol (str) – column to hold http errors

  • granularity (object) – Can only be one of yearly, monthly, weekly, daily, hourly or minutely. Granularity is used to verify whether the input series is valid.

  • groupbyCol (str) – column that groups the series

  • handler (object) – Which strategy to use when handling requests

  • imputeFixedValue (object) – Optional argument, fixed value to use when imputeMode is set to “fixed”

  • imputeMode (object) – Optional argument, impute mode of a time series. Possible values: auto, previous, linear, fixed, zero, notFill

  • maxAnomalyRatio (object) – Optional argument, advanced model parameter, max anomaly ratio in a time series.

  • outputCol (str) – The name of the output column

  • period (object) – Optional argument, periodic value of a time series. If the value is null or not present, the API will determine the period automatically.

  • sensitivity (object) – Optional argument, advanced model parameter, between 0 and 99. The lower the value, the larger the margin value will be, which means fewer anomalies will be accepted.

  • series (object) – Time series data points. Points should be sorted by timestamp in ascending order to match the anomaly detection result. If the data is not sorted correctly or contains duplicate timestamps, the API will not work and an error message will be returned.

  • subscriptionKey (object) – the API key to use

  • timeout (float) – number of seconds to wait before closing the connection

  • timestampCol (str) – column representing the time of the series

  • url (str) – Url of the service

  • valueCol (str) – column representing the value of the series
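The `series` description requires points sorted by ascending timestamp with no duplicate timestamps, and `groupbyCol` splits the rows into separate series. A plain-Python sketch of that preparation step follows; `prepare_series` and the `(group, timestamp, value)` tuple layout are hypothetical choices for illustration. Rejecting duplicates, rather than silently keeping one of them, is a deliberate design choice so that bad input surfaces before any request is sent.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, datetime, float]  # hypothetical layout: (group, timestamp, value)
Point = Tuple[datetime, float]


def prepare_series(rows: Iterable[Row]) -> Dict[str, List[Point]]:
    """Split rows into one series per group, sorted by ascending timestamp."""
    groups: Dict[str, List[Point]] = defaultdict(list)
    for group, ts, value in rows:
        groups[group].append((ts, value))
    for group, points in groups.items():
        points.sort(key=lambda p: p[0])  # ascending timestamp order
        for earlier, later in zip(points, points[1:]):
            if earlier[0] == later[0]:
                raise ValueError(f"duplicate timestamp {later[0]} in group {group!r}")
    return dict(groups)


def minute(m: int) -> datetime:
    return datetime(2024, 1, 1, 0, m)


rows = [("a", minute(10), 3.0), ("b", minute(0), 1.0), ("a", minute(0), 2.0)]
print([(ts.minute, v) for ts, v in prepare_series(rows)["a"]])  # [(0, 2.0), (10, 3.0)]
```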

concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number of seconds to wait on futures if concurrency >= 1')
customInterval = Param(parent='undefined', name='customInterval', doc='ServiceParam: Custom Interval is used to set a non-standard time interval. For example, if the series has a 5-minute interval, the request can set granularity=minutely and customInterval=5.')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout
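concurrency caps the number of in-flight calls, and concurrentTimeout bounds how long to wait on their futures. The standard-library sketch below only illustrates that pattern in Python; it is an analogy, not how synapse.ml implements it (the requests are issued from the JVM), and `call_all` and `send` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def call_all(requests, send, concurrency=1, concurrent_timeout=None):
    """Apply `send` to every request with at most `concurrency` calls in flight.

    Returns (results, unfinished): results keeps input order, with None for any
    call that had not finished within `concurrent_timeout` seconds.
    """
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(send, r) for r in requests]
        # Block until every future is done or the timeout elapses.
        done, not_done = wait(futures, timeout=concurrent_timeout)
        results = [f.result() if f in done else None for f in futures]
    # Leaving the `with` block still waits for stragglers to finish;
    # their late results are simply discarded here.
    return results, len(not_done)
```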

getCustomInterval()[source]
Returns

Custom Interval is used to set a non-standard time interval; for example, if the series has a 5-minute interval, the request can be set as granularity=minutely, customInterval=5.

Return type

customInterval
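With granularity=minutely and customInterval=5, consecutive points are expected five minutes apart. A small helper makes that arithmetic explicit. It is a hypothetical client-side utility, not a synapse.ml function, and it deliberately skips monthly and yearly because those granularities have no fixed length.

```python
from datetime import timedelta

# Only granularities with a fixed length; "monthly" and "yearly" vary.
_STEP = {
    "weekly": timedelta(weeks=1),
    "daily": timedelta(days=1),
    "hourly": timedelta(hours=1),
    "minutely": timedelta(minutes=1),
}


def expected_step(granularity, custom_interval=1):
    """Expected spacing between consecutive points for granularity x customInterval."""
    if granularity not in _STEP:
        raise ValueError(f"no fixed step for granularity {granularity!r}")
    return _STEP[granularity] * custom_interval
```

For example, `expected_step("minutely", 5)` returns `timedelta(minutes=5)`, which could be compared against the gaps between values in timestampCol.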

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getGranularity()[source]
Returns

Can only be one of yearly, monthly, weekly, daily, hourly or minutely. Granularity is used to verify whether the input series is valid.

Return type

granularity

getGroupbyCol()[source]
Returns

column that groups the series

Return type

groupbyCol

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

getImputeFixedValue()[source]
Returns

Optional argument, fixed value to use when imputeMode is set to “fixed”

Return type

imputeFixedValue

getImputeMode()[source]
Returns

Optional argument, impute mode of a time series. Possible values: auto, previous, linear, fixed, zero, notFill

Return type

imputeMode
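The imputeMode names suggest the following filling rules for missing points. The sketch below is a local approximation under that reading only: the service's actual algorithm (and its auto mode) is not documented here, and `impute` is a hypothetical helper working on an evenly spaced list that uses None for gaps.

```python
MODES = {"previous", "linear", "fixed", "zero", "notFill"}


def impute(values, mode, fixed_value=None):
    """Fill None gaps in an evenly spaced series (local illustration only)."""
    if mode not in MODES:
        raise ValueError(f"mode not covered by this sketch: {mode!r}")
    out = list(values)
    if mode == "notFill":
        return out  # leave gaps untouched
    for i, v in enumerate(values):
        if v is not None:
            continue
        if mode == "zero":
            out[i] = 0.0
        elif mode == "fixed":
            out[i] = fixed_value  # plays the role of imputeFixedValue
        elif mode == "previous":
            # Carry the last filled value forward; a leading gap stays None.
            out[i] = out[i - 1] if i > 0 else None
        else:  # "linear"
            # Interpolate between the nearest known points on each side.
            left = next((j for j in range(i - 1, -1, -1) if values[j] is not None), None)
            right = next((j for j in range(i + 1, len(values)) if values[j] is not None), None)
            if left is not None and right is not None:
                frac = (i - left) / (right - left)
                out[i] = values[left] + frac * (values[right] - values[left])
    return out
```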

static getJavaPackage()[source]

Returns package name String.

getMaxAnomalyRatio()[source]
Returns

Optional argument, advanced model parameter, max anomaly ratio in a time series.

Return type

maxAnomalyRatio

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getPeriod()[source]
Returns

Optional argument, periodic value of a time series. If the value is null or not present, the API will determine the period automatically.

Return type

period

getSensitivity()[source]
Returns

Optional argument, advanced model parameter, between 0 and 99. The lower the value, the larger the margin value, which means fewer anomalies will be accepted.

Return type

sensitivity

getSeries()[source]
Returns

Time series data points. Points should be sorted by timestamp in ascending order to match the anomaly detection result. If the data is not sorted correctly or there are duplicate timestamps, the API will not work and an error message will be returned.

Return type

series

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getTimestampCol()[source]
Returns

column representing the time of the series

Return type

timestampCol

getUrl()[source]
Returns

Url of the service

Return type

url

getValueCol()[source]
Returns

column representing the value of the series

Return type

valueCol

granularity = Param(parent='undefined', name='granularity', doc='ServiceParam:  Can only be one of yearly, monthly, weekly, daily, hourly or minutely. Granularity is used to verify whether the input series is valid.     ')
groupbyCol = Param(parent='undefined', name='groupbyCol', doc='column that groups the series')
handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
imputeFixedValue = Param(parent='undefined', name='imputeFixedValue', doc='ServiceParam:  Optional argument, fixed value to use when imputeMode is set to "fixed"     ')
imputeMode = Param(parent='undefined', name='imputeMode', doc='ServiceParam:  Optional argument, impute mode of a time series. Possible values: auto, previous, linear, fixed, zero, notFill     ')
maxAnomalyRatio = Param(parent='undefined', name='maxAnomalyRatio', doc='ServiceParam:  Optional argument, advanced model parameter, max anomaly ratio in a time series.     ')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
period = Param(parent='undefined', name='period', doc='ServiceParam:  Optional argument, periodic value of a time series. If the value is null or not present, the API will determine the period automatically.     ')
classmethod read()[source]

Returns an MLReader instance for this class.

sensitivity = Param(parent='undefined', name='sensitivity', doc='ServiceParam:  Optional argument, advanced model parameter, between 0 and 99. The lower the value, the larger the margin value, which means fewer anomalies will be accepted.     ')
series = Param(parent='undefined', name='series', doc='ServiceParam:  Time series data points. Points should be sorted by timestamp in ascending order to match the anomaly detection result. If the data is not sorted correctly or there are duplicate timestamps, the API will not work and an error message will be returned.     ')
setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setCustomInterval(value)[source]
Parameters

customInterval – Custom Interval is used to set a non-standard time interval; for example, if the series has a 5-minute interval, the request can be set as granularity=minutely, customInterval=5.

setCustomIntervalCol(value)[source]
Parameters

customInterval – Custom Interval is used to set a non-standard time interval; for example, if the series has a 5-minute interval, the request can be set as granularity=minutely, customInterval=5.

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setGranularity(value)[source]
Parameters

granularity – Can only be one of yearly, monthly, weekly, daily, hourly or minutely. Granularity is used to verify whether the input series is valid.

setGranularityCol(value)[source]
Parameters

granularity – Can only be one of yearly, monthly, weekly, daily, hourly or minutely. Granularity is used to verify whether the input series is valid.

setGroupbyCol(value)[source]
Parameters

groupbyCol – column that groups the series

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setImputeFixedValue(value)[source]
Parameters

imputeFixedValue – Optional argument, fixed value to use when imputeMode is set to “fixed”

setImputeFixedValueCol(value)[source]
Parameters

imputeFixedValue – Optional argument, fixed value to use when imputeMode is set to “fixed”

setImputeMode(value)[source]
Parameters

imputeMode – Optional argument, impute mode of a time series. Possible values: auto, previous, linear, fixed, zero, notFill

setImputeModeCol(value)[source]
Parameters

imputeMode – Optional argument, impute mode of a time series. Possible values: auto, previous, linear, fixed, zero, notFill

setLinkedService(value)[source]
setLocation(value)[source]
setMaxAnomalyRatio(value)[source]
Parameters

maxAnomalyRatio – Optional argument, advanced model parameter, max anomaly ratio in a time series.

setMaxAnomalyRatioCol(value)[source]
Parameters

maxAnomalyRatio – Optional argument, advanced model parameter, max anomaly ratio in a time series.

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(concurrency=1, concurrentTimeout=None, customInterval=None, customIntervalCol=None, errorCol='SimpleDetectAnomalies_0e210d55647d_error', granularity=None, granularityCol=None, groupbyCol=None, handler=None, imputeFixedValue=None, imputeFixedValueCol=None, imputeMode=None, imputeModeCol=None, maxAnomalyRatio=None, maxAnomalyRatioCol=None, outputCol='SimpleDetectAnomalies_0e210d55647d_output', period=None, periodCol=None, sensitivity=None, sensitivityCol=None, series=None, seriesCol=None, subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, timestampCol='timestamp', url=None, valueCol='value')[source]

Set the (keyword only) parameters

setPeriod(value)[source]
Parameters

period – Optional argument, periodic value of a time series. If the value is null or not present, the API will determine the period automatically.

setPeriodCol(value)[source]
Parameters

period – Optional argument, periodic value of a time series. If the value is null or not present, the API will determine the period automatically.

setSensitivity(value)[source]
Parameters

sensitivity – Optional argument, advanced model parameter, between 0 and 99. The lower the value, the larger the margin value, which means fewer anomalies will be accepted.

setSensitivityCol(value)[source]
Parameters

sensitivity – Optional argument, advanced model parameter, between 0 and 99. The lower the value, the larger the margin value, which means fewer anomalies will be accepted.

setSeries(value)[source]
Parameters

series – Time series data points. Points should be sorted by timestamp in ascending order to match the anomaly detection result. If the data is not sorted correctly or there are duplicate timestamps, the API will not work and an error message will be returned.

setSeriesCol(value)[source]
Parameters

series – Time series data points. Points should be sorted by timestamp in ascending order to match the anomaly detection result. If the data is not sorted correctly or there are duplicate timestamps, the API will not work and an error message will be returned.

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – the API key to use

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setTimestampCol(value)[source]
Parameters

timestampCol – column representing the time of the series

setUrl(value)[source]
Parameters

url – Url of the service

setValueCol(value)[source]
Parameters

valueCol – column representing the value of the series

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
timestampCol = Param(parent='undefined', name='timestampCol', doc='column representing the time of the series')
url = Param(parent='undefined', name='url', doc='Url of the service')
valueCol = Param(parent='undefined', name='valueCol', doc='column representing the value of the series')

synapse.ml.cognitive.SpeechToText module

class synapse.ml.cognitive.SpeechToText.SpeechToText(java_obj=None, audioData=None, audioDataCol=None, concurrency=1, concurrentTimeout=None, errorCol='SpeechToText_6c8ebf2ab54f_error', format=None, formatCol=None, handler=None, language=None, languageCol=None, outputCol='SpeechToText_6c8ebf2ab54f_output', profanity=None, profanityCol=None, subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • audioData (object) – The data sent to the service must be a .wav file

  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (str) – column to hold http errors

  • format (object) – Specifies the result format. Accepted values are simple and detailed. Default is simple.

  • handler (object) – Which strategy to use when handling requests

  • language (object) – Identifies the spoken language that is being recognized.

  • outputCol (str) – The name of the output column

  • profanity (object) – Specifies how to handle profanity in recognition results. Accepted values are masked, which replaces profanity with asterisks; removed, which removes all profanity from the result; or raw, which includes the profanity in the result. The default setting is masked.

  • subscriptionKey (object) – the API key to use

  • timeout (float) – number of seconds to wait before closing the connection

  • url (str) – Url of the service
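Because audioData must be a .wav file, it can be worth rejecting other payloads before sending them. The sketch checks for the standard RIFF/WAVE header and uses the standard-library wave module to build a test file. The helper names are hypothetical, and the 16 kHz, mono, 16-bit PCM settings are an assumption rather than a documented requirement of this transformer.

```python
import io
import wave


def looks_like_wav(data: bytes) -> bool:
    """True if `data` starts with a RIFF/WAVE header (a cheap pre-flight check)."""
    return len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE"


def wav_bytes(frames: bytes, rate: int = 16000) -> bytes:
    """Wrap raw 16-bit mono PCM frames in a WAV container using the stdlib."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)   # mono (assumed)
        w.setsampwidth(2)   # 16-bit samples (assumed)
        w.setframerate(rate)
        w.writeframes(frames)
    return buf.getvalue()
```

The header check only inspects the container; it does not validate the encoding inside it.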

audioData = Param(parent='undefined', name='audioData', doc='ServiceParam:  The data sent to the service must be a .wav file     ')
concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number of seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
format = Param(parent='undefined', name='format', doc='ServiceParam:  Specifies the result format. Accepted values are simple and detailed. Default is simple.     ')
getAudioData()[source]
Returns

The data sent to the service must be a .wav file

Return type

audioData

getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getFormat()[source]
Returns

Specifies the result format. Accepted values are simple and detailed. Default is simple.

Return type

format

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

static getJavaPackage()[source]

Returns package name String.

getLanguage()[source]
Returns

Identifies the spoken language that is being recognized.

Return type

language

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getProfanity()[source]
Returns

Specifies how to handle profanity in recognition results. Accepted values are masked, which replaces profanity with asterisks; removed, which removes all profanity from the result; or raw, which includes the profanity in the result. The default setting is masked.

Return type

profanity
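format and profanity each accept a small fixed set of values (simple or detailed; masked, removed or raw). A hypothetical validation helper can catch typos before a job runs. The returned dict's keys match the documented SpeechToText(...) keyword arguments, but passing plain strings that way is an assumption.

```python
FORMATS = {"simple", "detailed"}                 # documented default: simple
PROFANITY_MODES = {"masked", "removed", "raw"}   # documented default: masked


def speech_options(result_format="simple", profanity="masked"):
    """Validate format/profanity values and return them keyed by parameter name."""
    if result_format not in FORMATS:
        raise ValueError(f"format must be one of {sorted(FORMATS)}, got {result_format!r}")
    if profanity not in PROFANITY_MODES:
        raise ValueError(f"profanity must be one of {sorted(PROFANITY_MODES)}, got {profanity!r}")
    return {"format": result_format, "profanity": profanity}
```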

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
language = Param(parent='undefined', name='language', doc='ServiceParam:  Identifies the spoken language that is being recognized.     ')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
profanity = Param(parent='undefined', name='profanity', doc='ServiceParam:  Specifies how to handle profanity in recognition results. Accepted values are masked, which replaces profanity with asterisks; removed, which removes all profanity from the result; or raw, which includes the profanity in the result. The default setting is masked.     ')
classmethod read()[source]

Returns an MLReader instance for this class.

setAudioData(value)[source]
Parameters

audioData – The data sent to the service must be a .wav file

setAudioDataCol(value)[source]
Parameters

audioData – The data sent to the service must be a .wav file

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setFormat(value)[source]
Parameters

format – Specifies the result format. Accepted values are simple and detailed. Default is simple.

setFormatCol(value)[source]
Parameters

format – Specifies the result format. Accepted values are simple and detailed. Default is simple.

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setLanguage(value)[source]
Parameters

language – Identifies the spoken language that is being recognized.

setLanguageCol(value)[source]
Parameters

language – Identifies the spoken language that is being recognized.

setLinkedService(value)[source]
setLocation(value)[source]
setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(audioData=None, audioDataCol=None, concurrency=1, concurrentTimeout=None, errorCol='SpeechToText_6c8ebf2ab54f_error', format=None, formatCol=None, handler=None, language=None, languageCol=None, outputCol='SpeechToText_6c8ebf2ab54f_output', profanity=None, profanityCol=None, subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setProfanity(value)[source]
Parameters

profanity – Specifies how to handle profanity in recognition results. Accepted values are masked, which replaces profanity with asterisks; removed, which removes all profanity from the result; or raw, which includes the profanity in the result. The default setting is masked.

setProfanityCol(value)[source]
Parameters

profanity – Specifies how to handle profanity in recognition results. Accepted values are masked, which replaces profanity with asterisks; removed, which removes all profanity from the result; or raw, which includes the profanity in the result. The default setting is masked.

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – the API key to use

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.SpeechToTextSDK module

class synapse.ml.cognitive.SpeechToTextSDK.SpeechToTextSDK(java_obj=None, audioDataCol=None, endpointId=None, extraFfmpegArgs=[], fileType=None, fileTypeCol=None, format=None, formatCol=None, language=None, languageCol=None, outputCol=None, participantsJson=None, participantsJsonCol=None, profanity=None, profanityCol=None, recordAudioData=False, recordedFileNameCol=None, streamIntermediateResults=True, subscriptionKey=None, subscriptionKeyCol=None, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • audioDataCol (str) – Column holding audio data, must be either ByteArrays or Strings representing file URIs

  • endpointId (str) – endpoint for custom speech models

  • extraFfmpegArgs (list) – extra arguments for ffmpeg output decoding

  • fileType (object) – The file type of the sound files, supported types: wav, ogg, mp3

  • format (object) – Specifies the result format. Accepted values are simple and detailed. Default is simple.

  • language (object) – Identifies the spoken language that is being recognized.

  • outputCol (str) – The name of the output column

  • participantsJson (object) – a json representation of a list of conversation participants (email, language, user)

  • profanity (object) – Specifies how to handle profanity in recognition results. Accepted values are masked, which replaces profanity with asterisks; removed, which removes all profanity from the result; or raw, which includes the profanity in the result. The default setting is masked.

  • recordAudioData (bool) – Whether to record audio data to a file location, for use only with m3u8 streams

  • recordedFileNameCol (str) – Column holding file names to write audio data to if recordAudioData is set to true

  • streamIntermediateResults (bool) – Whether to return intermediate results immediately or group them in a sequence

  • subscriptionKey (object) – the API key to use

  • url (str) – Url of the service
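fileType must be one of wav, ogg or mp3. When audioDataCol holds file URIs, one option is to derive the type from each URI's extension, for example to populate a column passed to setFileTypeCol. `file_type_for` is a hypothetical helper, and trusting the extension (rather than inspecting the file contents) is an assumption.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

SUPPORTED = {"wav", "ogg", "mp3"}


def file_type_for(uri: str) -> str:
    """Map a file URI's extension to one of the documented fileType values."""
    # urlparse drops any query string; PurePosixPath extracts the suffix.
    suffix = PurePosixPath(urlparse(uri).path).suffix.lower().lstrip(".")
    if suffix not in SUPPORTED:
        raise ValueError(f"unsupported audio type {suffix!r} for {uri}")
    return suffix
```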

audioDataCol = Param(parent='undefined', name='audioDataCol', doc='Column holding audio data, must be either ByteArrays or Strings representing file URIs')
endpointId = Param(parent='undefined', name='endpointId', doc='endpoint for custom speech models')
extraFfmpegArgs = Param(parent='undefined', name='extraFfmpegArgs', doc='extra arguments for ffmpeg output decoding')
fileType = Param(parent='undefined', name='fileType', doc='ServiceParam: The file type of the sound files, supported types: wav, ogg, mp3')
format = Param(parent='undefined', name='format', doc='ServiceParam:  Specifies the result format. Accepted values are simple and detailed. Default is simple.     ')
getAudioDataCol()[source]
Returns

Column holding audio data, must be either ByteArrays or Strings representing file URIs

Return type

audioDataCol

getEndpointId()[source]
Returns

endpoint for custom speech models

Return type

endpointId

getExtraFfmpegArgs()[source]
Returns

extra arguments for ffmpeg output decoding

Return type

extraFfmpegArgs

getFileType()[source]
Returns

The file type of the sound files, supported types: wav, ogg, mp3

Return type

fileType

getFormat()[source]
Returns

Specifies the result format. Accepted values are simple and detailed. Default is simple.

Return type

format

static getJavaPackage()[source]

Returns package name String.

getLanguage()[source]
Returns

Identifies the spoken language that is being recognized.

Return type

language

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getParticipantsJson()[source]
Returns

a json representation of a list of conversation participants (email, language, user)

Return type

participantsJson
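participantsJson expects a JSON list of conversation participants with email, language and user fields. The sketch builds such a string with the standard json module; `participants_json` is a hypothetical helper, and the exact key names and value formats the service requires are assumptions based only on that description.

```python
import json


def participants_json(participants):
    """Serialize (email, language, user) triples as a JSON list of objects."""
    return json.dumps(
        [{"email": e, "language": lang, "user": u} for e, lang, u in participants]
    )
```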

getProfanity()[source]
Returns

Specifies how to handle profanity in recognition results. Accepted values are masked, which replaces profanity with asterisks; removed, which removes all profanity from the result; or raw, which includes the profanity in the result. The default setting is masked.

Return type

profanity

getRecordAudioData()[source]
Returns

Whether to record audio data to a file location, for use only with m3u8 streams

Return type

recordAudioData

getRecordedFileNameCol()[source]
Returns

Column holding file names to write audio data to if recordAudioData is set to true

Return type

recordedFileNameCol

getStreamIntermediateResults()[source]
Returns

Whether to return intermediate results immediately or group them in a sequence

Return type

streamIntermediateResults

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getUrl()[source]
Returns

Url of the service

Return type

url

language = Param(parent='undefined', name='language', doc='ServiceParam:  Identifies the spoken language that is being recognized.     ')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
participantsJson = Param(parent='undefined', name='participantsJson', doc='ServiceParam: a json representation of a list of conversation participants (email, language, user)')
profanity = Param(parent='undefined', name='profanity', doc='ServiceParam:  Specifies how to handle profanity in recognition results. Accepted values are masked, which replaces profanity with asterisks; removed, which removes all profanity from the result; or raw, which includes the profanity in the result. The default setting is masked.     ')
classmethod read()[source]

Returns an MLReader instance for this class.

recordAudioData = Param(parent='undefined', name='recordAudioData', doc='Whether to record audio data to a file location, for use only with m3u8 streams')
recordedFileNameCol = Param(parent='undefined', name='recordedFileNameCol', doc='Column holding file names to write audio data to if recordAudioData is set to true')
setAudioDataCol(value)[source]
Parameters

audioDataCol – Column holding audio data, must be either ByteArrays or Strings representing file URIs

setEndpointId(value)[source]
Parameters

endpointId – endpoint for custom speech models

setExtraFfmpegArgs(value)[source]
Parameters

extraFfmpegArgs – extra arguments for ffmpeg output decoding

setFileType(value)[source]
Parameters

fileType – The file type of the sound files, supported types: wav, ogg, mp3

setFileTypeCol(value)[source]
Parameters

fileType – The file type of the sound files, supported types: wav, ogg, mp3

setFormat(value)[source]
Parameters

format – Specifies the result format. Accepted values are simple and detailed. Default is simple.

setFormatCol(value)[source]
Parameters

format – Specifies the result format. Accepted values are simple and detailed. Default is simple.

setLanguage(value)[source]
Parameters

language – Identifies the spoken language that is being recognized.

setLanguageCol(value)[source]
Parameters

language – Identifies the spoken language that is being recognized.

setLinkedService(value)[source]
setLocation(value)[source]
setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(audioDataCol=None, endpointId=None, extraFfmpegArgs=[], fileType=None, fileTypeCol=None, format=None, formatCol=None, language=None, languageCol=None, outputCol=None, participantsJson=None, participantsJsonCol=None, profanity=None, profanityCol=None, recordAudioData=False, recordedFileNameCol=None, streamIntermediateResults=True, subscriptionKey=None, subscriptionKeyCol=None, url=None)[source]

Set the (keyword only) parameters

setParticipantsJson(value)[source]
Parameters

participantsJson – a json representation of a list of conversation participants (email, language, user)

setParticipantsJsonCol(value)[source]
Parameters

participantsJson – a json representation of a list of conversation participants (email, language, user)

setProfanity(value)[source]
Parameters

profanity – Specifies how to handle profanity in recognition results. Accepted values are masked, which replaces profanity with asterisks; removed, which removes all profanity from the result; or raw, which includes the profanity in the result. The default setting is masked.

setProfanityCol(value)[source]
Parameters

profanity – Specifies how to handle profanity in recognition results. Accepted values are masked, which replaces profanity with asterisks; removed, which removes all profanity from the result; or raw, which includes the profanity in the result. The default setting is masked.

setRecordAudioData(value)[source]
Parameters

recordAudioData – Whether to record audio data to a file location, for use only with m3u8 streams

setRecordedFileNameCol(value)[source]
Parameters

recordedFileNameCol – Column holding file names to write audio data to if recordAudioData is set to true

setStreamIntermediateResults(value)[source]
Parameters

streamIntermediateResults – Whether to return intermediate results immediately or group them in a sequence

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – the API key to use

setUrl(value)[source]
Parameters

url – Url of the service

streamIntermediateResults = Param(parent='undefined', name='streamIntermediateResults', doc='Whether to return intermediate results immediately or group them in a sequence')
subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.TagImage module

class synapse.ml.cognitive.TagImage.TagImage(java_obj=None, concurrency=1, concurrentTimeout=None, errorCol='TagImage_58a3c3e64af1_error', handler=None, imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, language=None, languageCol=None, outputCol='TagImage_58a3c3e64af1_output', subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (str) – column to hold http errors

  • handler (object) – Which strategy to use when handling requests

  • imageBytes (object) – bytestream of the image to use

  • imageUrl (object) – the url of the image to use

  • language (object) – The desired language for output generation.

  • outputCol (str) – The name of the output column

  • subscriptionKey (object) – the API key to use

  • timeout (float) – number of seconds to wait before closing the connection

  • url (str) – Url of the service

concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number of seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

getImageBytes()[source]
Returns

bytestream of the image to use

Return type

imageBytes

getImageUrl()[source]
Returns

the url of the image to use

Return type

imageUrl

static getJavaPackage()[source]

Returns package name String.

getLanguage()[source]
Returns

The desired language for output generation.

Return type

language

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
imageBytes = Param(parent='undefined', name='imageBytes', doc='ServiceParam: bytestream of the image to use')
imageUrl = Param(parent='undefined', name='imageUrl', doc='ServiceParam: the url of the image to use')
language = Param(parent='undefined', name='language', doc='ServiceParam: The desired language for output generation.')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setImageBytes(value)[source]
Parameters

imageBytes – bytestream of the image to use

setImageBytesCol(value)[source]
Parameters

imageBytes – bytestream of the image to use

setImageUrl(value)[source]
Parameters

imageUrl – the url of the image to use

setImageUrlCol(value)[source]
Parameters

imageUrl – the url of the image to use

setLanguage(value)[source]
Parameters

language – The desired language for output generation.

setLanguageCol(value)[source]
Parameters

language – The desired language for output generation.

setLinkedService(value)[source]
setLocation(value)[source]
setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(concurrency=1, concurrentTimeout=None, errorCol='TagImage_58a3c3e64af1_error', handler=None, imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, language=None, languageCol=None, outputCol='TagImage_58a3c3e64af1_output', subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – the API key to use

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.TextAnalyze module

class synapse.ml.cognitive.TextAnalyze.TextAnalyze(java_obj=None, backoffs=[100, 500, 1000], concurrency=1, concurrentTimeout=None, entityLinkingTasks=[], entityRecognitionPiiTasks=[], entityRecognitionTasks=[], errorCol='TextAnalyze_1d5d6204eac4_error', initialPollingDelay=300, keyPhraseExtractionTasks=[], language=None, languageCol=None, maxPollingRetries=1000, outputCol='TextAnalyze_1d5d6204eac4_output', pollingDelay=300, sentimentAnalysisTasks=[], subscriptionKey=None, subscriptionKeyCol=None, suppressMaxRetriesExceededException=False, text=None, textCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • backoffs (list) – array of backoffs to use in the handler

  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • entityLinkingTasks (object) – the entity linking tasks to perform on submitted documents

  • entityRecognitionPiiTasks (object) – the entity recognition PII tasks to perform on submitted documents

  • entityRecognitionTasks (object) – the entity recognition tasks to perform on submitted documents

  • errorCol (str) – column to hold http errors

  • initialPollingDelay (int) – number of milliseconds to wait before the first poll for the result

  • keyPhraseExtractionTasks (object) – the key phrase extraction tasks to perform on submitted documents

  • language (object) – the language code of the text (optional for some services)

  • maxPollingRetries (int) – maximum number of times to poll for the result

  • outputCol (str) – The name of the output column

  • pollingDelay (int) – number of milliseconds to wait between polls

  • sentimentAnalysisTasks (object) – the sentiment analysis tasks to perform on submitted documents

  • subscriptionKey (object) – the API key to use

  • suppressMaxRetriesExceededException (bool) – set to true to suppress the maximum-retries exception and report the failure in the error column instead

  • text (object) – the text in the request body

  • timeout (float) – number of seconds to wait before closing the connection

  • url (str) – Url of the service
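TextAnalyze submits an asynchronous job and then polls for the result, so initialPollingDelay, pollingDelay and maxPollingRetries together bound how long a row can wait. The sketch below computes that bound. It assumes one sleep of initialPollingDelay before the first poll and one sleep of pollingDelay before each remaining poll, and it ignores request latency; the exact schedule inside the transformer may differ.

```python
def worst_case_polling_ms(initial_delay_ms: int, polling_delay_ms: int, max_retries: int) -> int:
    """Upper bound on sleep time while polling, ignoring request latency.

    Assumed schedule: sleep initial_delay_ms, poll, then (sleep polling_delay_ms,
    poll) for each of the remaining max_retries - 1 polls.
    """
    if max_retries < 1:
        raise ValueError("max_retries must be at least 1")
    return initial_delay_ms + (max_retries - 1) * polling_delay_ms


# TextAnalyze defaults: initialPollingDelay=300, pollingDelay=300, maxPollingRetries=1000
budget_ms = worst_case_polling_ms(300, 300, 1000)
print(f"{budget_ms} ms = {budget_ms / 60_000:.1f} minutes")  # 300000 ms = 5.0 minutes
```

Under these assumptions the defaults allow about five minutes per job before the retries run out; lowering maxPollingRetries or pollingDelay shrinks that window.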

backoffs = Param(parent='undefined', name='backoffs', doc='array of backoffs to use in the handler')
concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
entityLinkingTasks = Param(parent='undefined', name='entityLinkingTasks', doc='the entity linking tasks to perform on submitted documents')
entityRecognitionPiiTasks = Param(parent='undefined', name='entityRecognitionPiiTasks', doc='the entity recognition pii tasks to perform on submitted documents')
entityRecognitionTasks = Param(parent='undefined', name='entityRecognitionTasks', doc='the entity recognition tasks to perform on submitted documents')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getBackoffs()[source]
Returns

array of backoffs to use in the handler

Return type

backoffs

getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getEntityLinkingTasks()[source]
Returns

the entity linking tasks to perform on submitted documents

Return type

entityLinkingTasks

getEntityRecognitionPiiTasks()[source]
Returns

the entity recognition PII tasks to perform on submitted documents

Return type

entityRecognitionPiiTasks

getEntityRecognitionTasks()[source]
Returns

the entity recognition tasks to perform on submitted documents

Return type

entityRecognitionTasks

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getInitialPollingDelay()[source]
Returns

number of milliseconds to wait before the first poll for the result

Return type

initialPollingDelay

static getJavaPackage()[source]

Returns package name String.

getKeyPhraseExtractionTasks()[source]
Returns

the key phrase extraction tasks to perform on submitted documents

Return type

keyPhraseExtractionTasks

getLanguage()[source]
Returns

the language code of the text (optional for some services)

Return type

language

getMaxPollingRetries()[source]
Returns

maximum number of times to poll for the result

Return type

maxPollingRetries

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getPollingDelay()[source]
Returns

number of milliseconds to wait between polls

Return type

pollingDelay

getSentimentAnalysisTasks()[source]
Returns

the sentiment analysis tasks to perform on submitted documents

Return type

sentimentAnalysisTasks

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getSuppressMaxRetriesExceededException()[source]
Returns

set to true to suppress the maximum-retries exception and report the failure in the error column instead

Return type

suppressMaxRetriesExceededException

getText()[source]
Returns

the text in the request body

Return type

text

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

initialPollingDelay = Param(parent='undefined', name='initialPollingDelay', doc='number of milliseconds to wait before first poll for result')
keyPhraseExtractionTasks = Param(parent='undefined', name='keyPhraseExtractionTasks', doc='the key phrase extraction tasks to perform on submitted documents')
language = Param(parent='undefined', name='language', doc='ServiceParam: the language code of the text (optional for some services)')
maxPollingRetries = Param(parent='undefined', name='maxPollingRetries', doc='number of times to poll')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
pollingDelay = Param(parent='undefined', name='pollingDelay', doc='number of milliseconds to wait between polling')
classmethod read()[source]

Returns an MLReader instance for this class.

sentimentAnalysisTasks = Param(parent='undefined', name='sentimentAnalysisTasks', doc='the sentiment analysis tasks to perform on submitted documents')
setBackoffs(value)[source]
Parameters

backoffs – array of backoffs to use in the handler
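The backoffs list (default [100, 500, 1000]) drives how the request handler retries failed calls. Below is a minimal sketch of that pattern, assuming the values are milliseconds and that each failed attempt waits for the next entry before retrying. call_with_backoffs is a hypothetical helper, not the transformer's actual handler; a production handler would typically retry only on retryable responses such as throttling rather than on every exception.

```python
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def call_with_backoffs(
    request: Callable[[], T],
    backoffs_ms: Sequence[int] = (100, 500, 1000),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `request`, retrying after each failure with the next backoff delay.

    Makes at most len(backoffs_ms) + 1 attempts; if the final attempt also
    fails, its exception propagates to the caller.
    """
    for delay_ms in backoffs_ms:
        try:
            return request()
        except Exception:  # a real handler would retry only on retryable errors
            sleep(delay_ms / 1000)  # assumption: backoffs are in milliseconds
    return request()  # last attempt, no backoffs left
```

Passing sleep as a parameter keeps the sketch testable: a test can record the requested delays instead of actually waiting.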

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setEntityLinkingTasks(value)[source]
Parameters

entityLinkingTasks – the entity linking tasks to perform on submitted documents

setEntityRecognitionPiiTasks(value)[source]
Parameters

entityRecognitionPiiTasks – the entity recognition PII tasks to perform on submitted documents

setEntityRecognitionTasks(value)[source]
Parameters

entityRecognitionTasks – the entity recognition tasks to perform on submitted documents

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setInitialPollingDelay(value)[source]
Parameters

initialPollingDelay – number of milliseconds to wait before the first poll for the result

setKeyPhraseExtractionTasks(value)[source]
Parameters

keyPhraseExtractionTasks – the key phrase extraction tasks to perform on submitted documents

setLanguage(value)[source]
Parameters

language – the language code of the text (optional for some services)

setLanguageCol(value)[source]
Parameters

language – name of the column holding the language code of the text (optional for some services)

setLinkedService(value)[source]
setLocation(value)[source]
setMaxPollingRetries(value)[source]
Parameters

maxPollingRetries – maximum number of times to poll for the result

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(backoffs=[100, 500, 1000], concurrency=1, concurrentTimeout=None, entityLinkingTasks=[], entityRecognitionPiiTasks=[], entityRecognitionTasks=[], errorCol='TextAnalyze_1d5d6204eac4_error', initialPollingDelay=300, keyPhraseExtractionTasks=[], language=None, languageCol=None, maxPollingRetries=1000, outputCol='TextAnalyze_1d5d6204eac4_output', pollingDelay=300, sentimentAnalysisTasks=[], subscriptionKey=None, subscriptionKeyCol=None, suppressMaxRetriesExceededException=False, text=None, textCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setPollingDelay(value)[source]
Parameters

pollingDelay – number of milliseconds to wait between polls

setSentimentAnalysisTasks(value)[source]
Parameters

sentimentAnalysisTasks – the sentiment analysis tasks to perform on submitted documents

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – name of the column holding the API key to use

setSuppressMaxRetriesExceededException(value)[source]
Parameters

suppressMaxRetriesExceededException – set to true to suppress the maximum-retries exception and report the failure in the error column instead
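suppressMaxRetriesExceededException decides what happens once maxPollingRetries is exhausted: either the job fails with an exception, or the failure is reported in errorCol and processing continues. The sketch below mimics that switch with a hypothetical poll_for_result helper and a hypothetical MaxRetriesExceeded exception. The poll callable returns None while the job is still running, and the delays between polls are omitted for brevity.

```python
from typing import Callable, Optional, Tuple


class MaxRetriesExceeded(Exception):
    """Hypothetical stand-in for the exception raised when polling gives up."""


def poll_for_result(
    poll: Callable[[], Optional[dict]],
    max_retries: int,
    suppress: bool = False,
) -> Tuple[Optional[dict], Optional[str]]:
    """Return (result, error); error is only filled when suppress=True."""
    for _ in range(max_retries):
        result = poll()
        if result is not None:  # None means "job still running"
            return result, None
    message = f"result not ready after {max_retries} polls"
    if suppress:
        return None, message  # analogous to writing the failure into errorCol
    raise MaxRetriesExceeded(message)
```

Suppressing the exception keeps one slow document from failing a whole Spark job, at the cost of having to inspect the error column afterwards.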

setText(value)[source]
Parameters

text – the text in the request body

setTextCol(value)[source]
Parameters

text – name of the column holding the text in the request body

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
suppressMaxRetriesExceededException = Param(parent='undefined', name='suppressMaxRetriesExceededException', doc='set true to suppress the maxumimum retries exception and report in the error column')
text = Param(parent='undefined', name='text', doc='ServiceParam: the text in the request body')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.TextSentiment module

class synapse.ml.cognitive.TextSentiment.TextSentiment(java_obj=None, concurrency=1, concurrentTimeout=None, errorCol='TextSentiment_2de9769f5a5b_error', handler=None, language=None, languageCol=None, modelVersion=None, modelVersionCol=None, opinionMining=None, opinionMiningCol=None, outputCol='TextSentiment_2de9769f5a5b_output', showStats=None, showStatsCol=None, stringIndexType=None, stringIndexTypeCol=None, subscriptionKey=None, subscriptionKeyCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (str) – column to hold http errors

  • handler (object) – Which strategy to use when handling requests

  • language (object) – the language code of the text (optional for some services)

  • modelVersion (object) – This value indicates which model will be used for scoring. If a model-version is not specified, the API should default to the latest, non-preview version.

  • opinionMining (object) – if set to true, response will contain not only sentiment prediction but also opinion mining (aspect-based sentiment analysis) results.

  • outputCol (str) – The name of the output column

  • showStats (object) – if set to true, response will contain input and document level statistics.

  • stringIndexType (object) – Specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets

  • subscriptionKey (object) – the API key to use

  • text (object) – the text in the request body

  • timeout (float) – number of seconds to wait before closing the connection

  • url (str) – Url of the service
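stringIndexType matters whenever offsets in the response are mapped back onto the input string, because different conventions count characters differently. Python indexes strings by Unicode code point, while UTF-16 counts characters outside the Basic Multilingual Plane (such as most emoji) as two units. The sketch converts a code point index into a UTF-16 offset using only the standard library; grapheme (text element) counting needs a third-party library and is not shown. The Text Analytics API is understood to accept values such as TextElement_v8, UnicodeCodePoint and Utf16CodeUnit, but verify the exact names at the link in the stringIndexType description.

```python
def utf16_offset(text: str, codepoint_index: int) -> int:
    """Convert a Python (Unicode code point) index into a UTF-16 code unit offset."""
    # Each UTF-16 code unit is 2 bytes in the little-endian encoding (no BOM).
    return len(text[:codepoint_index].encode("utf-16-le")) // 2


text = "I \U0001F600 love it"
start = text.index("love")
print(start, utf16_offset(text, start))  # 4 5: the emoji counts as two UTF-16 units
```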

concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

static getJavaPackage()[source]

Returns package name String.

getLanguage()[source]
Returns

the language code of the text (optional for some services)

Return type

language

getModelVersion()[source]
Returns

This value indicates which model will be used for scoring. If a model-version is not specified, the API should default to the latest, non-preview version.

Return type

modelVersion

getOpinionMining()[source]
Returns

if set to true, response will contain not only sentiment prediction but also opinion mining (aspect-based sentiment analysis) results.

Return type

opinionMining

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getShowStats()[source]
Returns

if set to true, response will contain input and document level statistics.

Return type

showStats

getStringIndexType()[source]
Returns

Specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets

Return type

stringIndexType

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getText()[source]
Returns

the text in the request body

Return type

text

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
language = Param(parent='undefined', name='language', doc='ServiceParam: the language code of the text (optional for some services)')
modelVersion = Param(parent='undefined', name='modelVersion', doc='ServiceParam: This value indicates which model will be used for scoring. If a model-version is not specified, the API should default to the latest, non-preview version.')
opinionMining = Param(parent='undefined', name='opinionMining', doc='ServiceParam: if set to true, response will contain not only sentiment prediction but also opinion mining (aspect-based sentiment analysis) results.')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setLanguage(value)[source]
Parameters

language – the language code of the text (optional for some services)

setLanguageCol(value)[source]
Parameters

language – name of the column holding the language code of the text (optional for some services)

setLinkedService(value)[source]
setLocation(value)[source]
setModelVersion(value)[source]
Parameters

modelVersion – This value indicates which model will be used for scoring. If a model-version is not specified, the API should default to the latest, non-preview version.

setModelVersionCol(value)[source]
Parameters

modelVersion – name of the column holding the model version to use for scoring. If a model version is not specified, the API should default to the latest, non-preview version.

setOpinionMining(value)[source]
Parameters

opinionMining – if set to true, response will contain not only sentiment prediction but also opinion mining (aspect-based sentiment analysis) results.

setOpinionMiningCol(value)[source]
Parameters

opinionMining – name of the column holding the opinionMining flag; if the flag is true, the response contains not only the sentiment prediction but also opinion mining (aspect-based sentiment analysis) results.

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(concurrency=1, concurrentTimeout=None, errorCol='TextSentiment_2de9769f5a5b_error', handler=None, language=None, languageCol=None, modelVersion=None, modelVersionCol=None, opinionMining=None, opinionMiningCol=None, outputCol='TextSentiment_2de9769f5a5b_output', showStats=None, showStatsCol=None, stringIndexType=None, stringIndexTypeCol=None, subscriptionKey=None, subscriptionKeyCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setShowStats(value)[source]
Parameters

showStats – if set to true, response will contain input and document level statistics.

setShowStatsCol(value)[source]
Parameters

showStats – name of the column holding the showStats flag; if the flag is true, the response contains input and document level statistics.

setStringIndexType(value)[source]
Parameters

stringIndexType – Specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets

setStringIndexTypeCol(value)[source]
Parameters

stringIndexType – name of the column holding the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – name of the column holding the API key to use

setText(value)[source]
Parameters

text – the text in the request body

setTextCol(value)[source]
Parameters

text – name of the column holding the text in the request body

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

showStats = Param(parent='undefined', name='showStats', doc='ServiceParam: if set to true, response will contain input and document level statistics.')
stringIndexType = Param(parent='undefined', name='stringIndexType', doc='ServiceParam: Specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets')
subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
text = Param(parent='undefined', name='text', doc='ServiceParam: the text in the request body')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.TextSentimentSDK module

class synapse.ml.cognitive.TextSentimentSDK.TextSentimentSDK(java_obj=None, batchSize=5, concurrency=1, concurrentTimeout=None, disableServiceLogs=None, disableServiceLogsCol=None, errorCol='Error', includeOpinionMining=None, includeOpinionMiningCol=None, includeStatistics=None, includeStatisticsCol=None, language=None, languageCol=None, modelVersion=None, modelVersionCol=None, outputCol=None, subscriptionKey=None, subscriptionKeyCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • batchSize (int) – The max size of the buffer

  • concurrency (int) – max number of concurrent calls

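  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • disableServiceLogs (object) – whether to opt out of service-side logging of the submitted text

  • errorCol (str) – column to hold http errors

  • includeOpinionMining (object) – whether the response should also contain opinion mining (aspect-based sentiment analysis) results

  • includeStatistics (object) – whether the response should contain input and document level statistics

  • language (object) – the language code of the text (optional for some services)

  • modelVersion (object) – which model version to use for scoring; if it is not specified, the API should default to the latest, non-preview version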

  • outputCol (str) – The name of the output column

  • subscriptionKey (object) – the API key to use

  • text (object) – the text in the request body

  • timeout (float) – number of seconds to wait before closing the connection

  • url (str) – Url of the service
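TextSentimentSDK groups rows before calling the service, with batchSize (default 5) as the maximum size of each group. The sketch below shows that grouping on a plain iterable. batched is a hypothetical helper built on itertools.islice (Python 3.12 also ships itertools.batched, which yields tuples), and the sketch assumes batchSize corresponds to documents per request.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int = 5) -> Iterator[List[T]]:
    """Yield consecutive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(items)
    while batch := list(islice(it, batch_size)):
        yield batch


documents = [f"review {i}" for i in range(12)]
for request_number, batch in enumerate(batched(documents, batch_size=5), start=1):
    print(request_number, len(batch))  # prints 1 5, then 2 5, then 3 2
```

Larger batches mean fewer HTTP round trips, but the service caps how many documents one request may carry, so batchSize should stay within that limit.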

batchSize = Param(parent='undefined', name='batchSize', doc='The max size of the buffer')
concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
disableServiceLogs = Param(parent='undefined', name='disableServiceLogs', doc='ServiceParam: disableServiceLogs option')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getBatchSize()[source]
Returns

The max size of the buffer

Return type

batchSize

getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getDisableServiceLogs()[source]
Returns

whether to opt out of service-side logging of the submitted text

Return type

disableServiceLogs

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getIncludeOpinionMining()[source]
Returns

whether the response should also contain opinion mining (aspect-based sentiment analysis) results

Return type

includeOpinionMining

getIncludeStatistics()[source]
Returns

whether the response should contain input and document level statistics

Return type

includeStatistics

static getJavaPackage()[source]

Returns package name String.

getLanguage()[source]
Returns

the language code of the text (optional for some services)

Return type

language

getModelVersion()[source]
Returns

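which model version to use for scoring; if it is not specified, the API should default to the latest, non-preview version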

Return type

modelVersion

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getText()[source]
Returns

the text in the request body

Return type

text

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

includeOpinionMining = Param(parent='undefined', name='includeOpinionMining', doc='ServiceParam: includeOpinionMining option')
includeStatistics = Param(parent='undefined', name='includeStatistics', doc='ServiceParam: includeStatistics option')
language = Param(parent='undefined', name='language', doc='ServiceParam: the language code of the text (optional for some services)')
modelVersion = Param(parent='undefined', name='modelVersion', doc='ServiceParam: modelVersion option')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setBatchSize(value)[source]
Parameters

batchSize – The max size of the buffer

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setDisableServiceLogs(value)[source]
Parameters

disableServiceLogs – whether to opt out of service-side logging of the submitted text

setDisableServiceLogsCol(value)[source]
Parameters

disableServiceLogs – name of the column holding the disableServiceLogs flag (whether to opt out of service-side logging of the submitted text)

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setIncludeOpinionMining(value)[source]
Parameters

includeOpinionMining – whether the response should also contain opinion mining (aspect-based sentiment analysis) results

setIncludeOpinionMiningCol(value)[source]
Parameters

includeOpinionMining – name of the column holding the includeOpinionMining flag (whether the response should also contain opinion mining results)

setIncludeStatistics(value)[source]
Parameters

includeStatistics – whether the response should contain input and document level statistics

setIncludeStatisticsCol(value)[source]
Parameters

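includeStatistics – name of the column holding the includeStatistics flag (whether the response should contain input and document level statistics)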

setLanguage(value)[source]
Parameters

language – the language code of the text (optional for some services)

setLanguageCol(value)[source]
Parameters

language – name of the column holding the language code of the text (optional for some services)

setLocation(value)[source]
setModelVersion(value)[source]
Parameters

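modelVersion – which model version to use for scoring; if it is not specified, the API should default to the latest, non-preview version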

setModelVersionCol(value)[source]
Parameters

modelVersion – name of the column holding the model version to use for scoring

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(batchSize=5, concurrency=1, concurrentTimeout=None, disableServiceLogs=None, disableServiceLogsCol=None, errorCol='Error', includeOpinionMining=None, includeOpinionMiningCol=None, includeStatistics=None, includeStatisticsCol=None, language=None, languageCol=None, modelVersion=None, modelVersionCol=None, outputCol=None, subscriptionKey=None, subscriptionKeyCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – name of the column holding the API key to use

setText(value)[source]
Parameters

text – the text in the request body

setTextCol(value)[source]
Parameters

text – name of the column holding the text in the request body

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
text = Param(parent='undefined', name='text', doc='ServiceParam: the text in the request body')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.TextSentimentV2 module

class synapse.ml.cognitive.TextSentimentV2.TextSentimentV2(java_obj=None, concurrency=1, concurrentTimeout=None, errorCol='TextSentimentV2_a0a158ac81a0_error', handler=None, language=None, languageCol=None, outputCol='TextSentimentV2_a0a158ac81a0_output', subscriptionKey=None, subscriptionKeyCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (str) – column to hold http errors

  • handler (object) – Which strategy to use when handling requests

  • language (object) – the language code of the text (optional for some services)

  • outputCol (str) – The name of the output column

  • subscriptionKey (object) – the API key to use

  • text (object) – the text in the request body

  • timeout (float) – number of seconds to wait before closing the connection

  • url (str) – Url of the service
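concurrency and concurrentTimeout appear on most transformers in this package. A rough standard-library analogue: concurrency caps how many requests are in flight at once, and concurrentTimeout limits how long to wait on each pending future. send_all is a hypothetical helper for illustration, not how SynapseML schedules its HTTP calls internally.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def send_all(
    payloads: Sequence[T],
    send: Callable[[T], R],
    concurrency: int = 1,
    concurrent_timeout: Optional[float] = None,
) -> List[R]:
    """Run send() on every payload with at most `concurrency` calls in flight.

    Results come back in input order. With a numeric concurrent_timeout, a
    future that is not done in time raises concurrent.futures.TimeoutError;
    leaving the with-block still waits for calls that are already running.
    """
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(send, p) for p in payloads]
        return [f.result(timeout=concurrent_timeout) for f in futures]


print(send_all(["a", "b", "c"], str.upper, concurrency=2))  # ['A', 'B', 'C']
```

Raising concurrency increases throughput until the service starts throttling; at that point the retry behaviour of the handler becomes the limiting factor.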

concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

static getJavaPackage()[source]

Returns package name String.

getLanguage()[source]
Returns

the language code of the text (optional for some services)

Return type

language

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getText()[source]
Returns

the text in the request body

Return type

text

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
language = Param(parent='undefined', name='language', doc='ServiceParam: the language code of the text (optional for some services)')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setLanguage(value)[source]
Parameters

language – the language code of the text (optional for some services)

setLanguageCol(value)[source]
Parameters

language – name of the column holding the language code of the text (optional for some services)

setLinkedService(value)[source]
setLocation(value)[source]
setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(concurrency=1, concurrentTimeout=None, errorCol='TextSentimentV2_a0a158ac81a0_error', handler=None, language=None, languageCol=None, outputCol='TextSentimentV2_a0a158ac81a0_output', subscriptionKey=None, subscriptionKeyCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – name of the column holding the API key to use

setText(value)[source]
Parameters

text – the text in the request body

setTextCol(value)[source]
Parameters

text – name of the column holding the text in the request body

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
text = Param(parent='undefined', name='text', doc='ServiceParam: the text in the request body')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.TextToSpeech module

class synapse.ml.cognitive.TextToSpeech.TextToSpeech(java_obj=None, errorCol='TextToSpeech_d11ce307e91e_errors', language=None, languageCol=None, locale=None, localeCol=None, outputFileCol=None, outputFormat=None, outputFormatCol=None, subscriptionKey=None, subscriptionKeyCol=None, text=None, textCol=None, url=None, voiceName=None, voiceNameCol=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • errorCol (str) – column to hold HTTP errors

  • language (object) – The name of the language used for synthesis

  • locale (object) – The locale of the input text

  • outputFileCol (str) – The location of the saved file as an HDFS-compliant URI

  • outputFormat (object) – The format of the output audio. Supported values: Raw8Khz8BitMonoMULaw, Riff16Khz16KbpsMonoSiren, Audio16Khz16KbpsMonoSiren, Audio16Khz32KBitRateMonoMp3, Audio16Khz128KBitRateMonoMp3, Audio16Khz64KBitRateMonoMp3, Audio24Khz48KBitRateMonoMp3, Audio24Khz96KBitRateMonoMp3, Audio24Khz160KBitRateMonoMp3, Raw16Khz16BitMonoTrueSilk, Riff16Khz16BitMonoPcm, Riff8Khz16BitMonoPcm, Riff24Khz16BitMonoPcm, Riff8Khz8BitMonoMULaw, Raw16Khz16BitMonoPcm, Raw24Khz16BitMonoPcm, Raw8Khz16BitMonoPcm, Ogg16Khz16BitMonoOpus, Ogg24Khz16BitMonoOpus

  • subscriptionKey (object) – the API key to use

  • text (object) – The text to synthesize

  • url (str) – URL of the service

  • voiceName (object) – The name of the voice used for synthesis
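
The outputFormat value must be one of the exact names listed above, and a casing slip is an easy mistake. As a client-side guard before calling setOutputFormat, here is a minimal sketch; the helper check_output_format and the constant SUPPORTED_OUTPUT_FORMATS are hypothetical names introduced for illustration, not part of SynapseML, and the list is copied from these docs, so it may drift from what the service accepts.

```python
# Output formats copied from the TextToSpeech parameter docs above
# (hypothetical helper, not part of SynapseML).
SUPPORTED_OUTPUT_FORMATS = frozenset({
    "Raw8Khz8BitMonoMULaw", "Riff16Khz16KbpsMonoSiren", "Audio16Khz16KbpsMonoSiren",
    "Audio16Khz32KBitRateMonoMp3", "Audio16Khz128KBitRateMonoMp3",
    "Audio16Khz64KBitRateMonoMp3", "Audio24Khz48KBitRateMonoMp3",
    "Audio24Khz96KBitRateMonoMp3", "Audio24Khz160KBitRateMonoMp3",
    "Raw16Khz16BitMonoTrueSilk", "Riff16Khz16BitMonoPcm", "Riff8Khz16BitMonoPcm",
    "Riff24Khz16BitMonoPcm", "Riff8Khz8BitMonoMULaw", "Raw16Khz16BitMonoPcm",
    "Raw24Khz16BitMonoPcm", "Raw8Khz16BitMonoPcm", "Ogg16Khz16BitMonoOpus",
    "Ogg24Khz16BitMonoOpus",
})

# Map lower-cased names back to their documented spelling for friendlier errors.
_BY_LOWER = {name.lower(): name for name in SUPPORTED_OUTPUT_FORMATS}


def check_output_format(fmt: str) -> str:
    """Return fmt unchanged if it is a documented format, else raise ValueError."""
    if fmt in SUPPORTED_OUTPUT_FORMATS:
        return fmt
    msg = f"unsupported outputFormat {fmt!r}"
    hint = _BY_LOWER.get(fmt.lower())
    if hint is not None:
        msg += f"; did you mean {hint!r}?"
    raise ValueError(msg)
```

The checked string can then be passed to setOutputFormat. Matching is exact, which mirrors the case-sensitive names in the list; the lower-case lookup only improves the error message.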

errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getErrorCol()[source]
Returns

column to hold HTTP errors

Return type

errorCol

static getJavaPackage()[source]

Returns package name String.

getLanguage()[source]
Returns

The name of the language used for synthesis

Return type

language

getLocale()[source]
Returns

The locale of the input text

Return type

locale

getOutputFileCol()[source]
Returns

The location of the saved file as an HDFS-compliant URI

Return type

outputFileCol

getOutputFormat()[source]
Returns

The format of the output audio. Supported values: Raw8Khz8BitMonoMULaw, Riff16Khz16KbpsMonoSiren, Audio16Khz16KbpsMonoSiren, Audio16Khz32KBitRateMonoMp3, Audio16Khz128KBitRateMonoMp3, Audio16Khz64KBitRateMonoMp3, Audio24Khz48KBitRateMonoMp3, Audio24Khz96KBitRateMonoMp3, Audio24Khz160KBitRateMonoMp3, Raw16Khz16BitMonoTrueSilk, Riff16Khz16BitMonoPcm, Riff8Khz16BitMonoPcm, Riff24Khz16BitMonoPcm, Riff8Khz8BitMonoMULaw, Raw16Khz16BitMonoPcm, Raw24Khz16BitMonoPcm, Raw8Khz16BitMonoPcm, Ogg16Khz16BitMonoOpus, Ogg24Khz16BitMonoOpus

Return type

outputFormat

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getText()[source]
Returns

The text to synthesize

Return type

text

getUrl()[source]
Returns

URL of the service

Return type

url

getVoiceName()[source]
Returns

The name of the voice used for synthesis

Return type

voiceName

language = Param(parent='undefined', name='language', doc='ServiceParam: The name of the language used for synthesis')
locale = Param(parent='undefined', name='locale', doc='ServiceParam: The locale of the input text')
outputFileCol = Param(parent='undefined', name='outputFileCol', doc='The location of the saved file as an HDFS compliant URI')
outputFormat = Param(parent='undefined', name='outputFormat', doc='ServiceParam: The format for the output audio can be one of ArraySeq(Raw8Khz8BitMonoMULaw, Riff16Khz16KbpsMonoSiren, Audio16Khz16KbpsMonoSiren, Audio16Khz32KBitRateMonoMp3, Audio16Khz128KBitRateMonoMp3, Audio16Khz64KBitRateMonoMp3, Audio24Khz48KBitRateMonoMp3, Audio24Khz96KBitRateMonoMp3, Audio24Khz160KBitRateMonoMp3, Raw16Khz16BitMonoTrueSilk, Riff16Khz16BitMonoPcm, Riff8Khz16BitMonoPcm, Riff24Khz16BitMonoPcm, Riff8Khz8BitMonoMULaw, Raw16Khz16BitMonoPcm, Raw24Khz16BitMonoPcm, Raw8Khz16BitMonoPcm, Ogg16Khz16BitMonoOpus, Ogg24Khz16BitMonoOpus)')
classmethod read()[source]

Returns an MLReader instance for this class.

setErrorCol(value)[source]
Parameters

errorCol – column to hold HTTP errors

setLanguage(value)[source]
Parameters

language – The name of the language used for synthesis

setLanguageCol(value)[source]
Parameters

language – name of the column containing the language used for synthesis

setLinkedService(value)[source]
setLocale(value)[source]
Parameters

locale – The locale of the input text

setLocaleCol(value)[source]
Parameters

locale – name of the column containing the locale of the input text

setLocation(value)[source]
setOutputFileCol(value)[source]
Parameters

outputFileCol – The location of the saved file as an HDFS-compliant URI

setOutputFormat(value)[source]
Parameters

outputFormat – The format of the output audio. Supported values: Raw8Khz8BitMonoMULaw, Riff16Khz16KbpsMonoSiren, Audio16Khz16KbpsMonoSiren, Audio16Khz32KBitRateMonoMp3, Audio16Khz128KBitRateMonoMp3, Audio16Khz64KBitRateMonoMp3, Audio24Khz48KBitRateMonoMp3, Audio24Khz96KBitRateMonoMp3, Audio24Khz160KBitRateMonoMp3, Raw16Khz16BitMonoTrueSilk, Riff16Khz16BitMonoPcm, Riff8Khz16BitMonoPcm, Riff24Khz16BitMonoPcm, Riff8Khz8BitMonoMULaw, Raw16Khz16BitMonoPcm, Raw24Khz16BitMonoPcm, Raw8Khz16BitMonoPcm, Ogg16Khz16BitMonoOpus, Ogg24Khz16BitMonoOpus

setOutputFormatCol(value)[source]
Parameters

outputFormat – name of the column containing the output audio format (see setOutputFormat for the supported values)

setParams(errorCol='TextToSpeech_d11ce307e91e_errors', language=None, languageCol=None, locale=None, localeCol=None, outputFileCol=None, outputFormat=None, outputFormatCol=None, subscriptionKey=None, subscriptionKeyCol=None, text=None, textCol=None, url=None, voiceName=None, voiceNameCol=None)[source]

Set the (keyword only) parameters

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – name of the column containing the API key to use

setText(value)[source]
Parameters

text – The text to synthesize

setTextCol(value)[source]
Parameters

text – name of the column containing the text to synthesize

setUrl(value)[source]
Parameters

url – URL of the service

setVoiceName(value)[source]
Parameters

voiceName – The name of the voice used for synthesis

setVoiceNameCol(value)[source]
Parameters

voiceName – name of the column containing the name of the voice used for synthesis

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
text = Param(parent='undefined', name='text', doc='ServiceParam: The text to synthesize')
url = Param(parent='undefined', name='url', doc='Url of the service')
voiceName = Param(parent='undefined', name='voiceName', doc='ServiceParam: The name of the voice used for synthesis')
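
Most service parameters above come in setter pairs such as setText/setTextCol or setVoiceName/setVoiceNameCol: the first fixes one value for every row, the second names a DataFrame column that supplies a value per row. The sketch below models that choice in plain Python. It is an assumption about the behaviour, not SynapseML's implementation; the class ServiceParamSketch and its methods are hypothetical. It assumes one parameter holds either a constant or a column name, so whichever setter ran last wins.

```python
from typing import Any, Mapping, Optional


class ServiceParamSketch:
    """Hypothetical model of a service parameter: a constant OR a column name."""

    def __init__(self) -> None:
        self._kind: Optional[str] = None  # "value", "col", or None if unset
        self._payload: Any = None

    def set_value(self, value: Any) -> "ServiceParamSketch":
        # Analogue of setX(value): the same value is used for every row.
        self._kind, self._payload = "value", value
        return self

    def set_col(self, column: str) -> "ServiceParamSketch":
        # Analogue of setXCol(column): the value is read from each row.
        self._kind, self._payload = "col", column
        return self

    def resolve(self, row: Mapping[str, Any]) -> Any:
        """Return the value this parameter takes for one input row."""
        if self._kind == "col":
            return row[self._payload]
        return self._payload  # the constant, or None if never set
```

Calling set_col after set_value replaces the constant, so a row-dependent value such as a per-row voice or locale only needs the column variant.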

synapse.ml.cognitive.Translate module

class synapse.ml.cognitive.Translate.Translate(java_obj=None, allowFallback=None, allowFallbackCol=None, category=None, categoryCol=None, concurrency=1, concurrentTimeout=None, errorCol='Translate_b910fdcd035e_error', fromLanguage=None, fromLanguageCol=None, fromScript=None, fromScriptCol=None, handler=None, includeAlignment=None, includeAlignmentCol=None, includeSentenceLength=None, includeSentenceLengthCol=None, outputCol='Translate_b910fdcd035e_output', profanityAction=None, profanityActionCol=None, profanityMarker=None, profanityMarkerCol=None, subscriptionKey=None, subscriptionKeyCol=None, subscriptionRegion=None, subscriptionRegionCol=None, suggestedFrom=None, suggestedFromCol=None, text=None, textCol=None, textType=None, textTypeCol=None, timeout=60.0, toLanguage=None, toLanguageCol=None, toScript=None, toScriptCol=None, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • allowFallback (object) – Specifies whether the service is allowed to fall back to a general system when a custom system does not exist.

  • category (object) – A string specifying the category (domain) of the translation. This parameter is used to get translations from a customized system built with Custom Translator. Add the Category ID from your Custom Translator project details to this parameter to use your deployed customized system. Default value is: general.

  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (str) – column to hold HTTP errors

  • fromLanguage (object) – Specifies the language of the input text. Find which languages are available to translate from by looking up supported languages using the translation scope. If the from parameter is not specified, automatic language detection is applied to determine the source language. You must use the from parameter rather than autodetection when using the dynamic dictionary feature.

  • fromScript (object) – Specifies the script of the input text.

  • handler (object) – Which strategy to use when handling requests

  • includeAlignment (object) – Specifies whether to include alignment projection from source text to translated text.

  • includeSentenceLength (object) – Specifies whether to include sentence boundaries for the input text and the translated text.

  • outputCol (str) – The name of the output column

  • profanityAction (object) – Specifies how profanities should be treated in translations. Possible values are: NoAction (default), Marked or Deleted.

  • profanityMarker (object) – Specifies how profanities should be marked in translations. Possible values are: Asterisk (default) or Tag.

  • subscriptionKey (object) – the API key to use

  • subscriptionRegion (object) – the API region to use

  • suggestedFrom (object) – Specifies a fallback language if the language of the input text can’t be identified. Language autodetection is applied when the from parameter is omitted. If detection fails, the suggestedFrom language will be assumed.

  • text (object) – the string to translate

  • textType (object) – Defines whether the text being translated is plain text or HTML text. Any HTML needs to be a well-formed, complete element. Possible values are: plain (default) or html.

  • timeout (float) – number of seconds to wait before closing the connection

  • toLanguage (object) – Specifies the language of the output text. The target language must be one of the supported languages included in the translation scope. For example, use to=de to translate to German. It’s possible to translate to multiple languages simultaneously by repeating the parameter in the query string. For example, use to=de and to=it to translate to German and Italian.

  • toScript (object) – Specifies the script of the translated text.

  • url (str) – URL of the service
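
The toLanguage description refers to the service's query string, where several target languages are requested by repeating `to`. The sketch below shows that repeated-parameter form with the standard library. It assumes the Translator v3 REST conventions (`api-version=3.0`, `from`, `to`, `textType`); the helper build_translate_query is hypothetical, and it does not claim to reproduce how SynapseML builds its requests internally.

```python
from typing import Optional, Sequence
from urllib.parse import urlencode


def build_translate_query(
    to_languages: Sequence[str],
    from_language: Optional[str] = None,
    text_type: str = "plain",
) -> str:
    """Build a Translator v3 /translate query string (hypothetical sketch)."""
    if not to_languages:
        raise ValueError("at least one target language is required")
    params = [("api-version", "3.0")]
    if from_language is not None:
        # Omitting `from` lets the service auto-detect the source language.
        params.append(("from", from_language))
    # One ("to", code) pair per target: the repeated-parameter form.
    params.extend(("to", lang) for lang in to_languages)
    params.append(("textType", text_type))
    return urlencode(params)
```

For example, build_translate_query(["de", "it"], from_language="en") yields `api-version=3.0&from=en&to=de&to=it&textType=plain`, which asks for German and Italian in one call.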

allowFallback = Param(parent='undefined', name='allowFallback', doc='ServiceParam: Specifies that the service is allowed to fall back to a general system when a custom system does not exist. ')
category = Param(parent='undefined', name='category', doc='ServiceParam: A string specifying the category (domain) of the translation. This parameter is used to get translations from a customized system built with Custom Translator. Add the Category ID from your Custom Translator project details to this parameter to use your deployed customized system. Default value is: general.')
concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
fromLanguage = Param(parent='undefined', name='fromLanguage', doc='ServiceParam: Specifies the language of the input text. Find which languages are available to translate from by looking up supported languages using the translation scope. If the from parameter is not specified, automatic language detection is applied to determine the source language. You must use the from parameter rather than autodetection when using the dynamic dictionary feature.')
fromScript = Param(parent='undefined', name='fromScript', doc='ServiceParam: Specifies the script of the input text.')
getAllowFallback()[source]
Returns

Specifies whether the service is allowed to fall back to a general system when a custom system does not exist.

Return type

allowFallback

getCategory()[source]
Returns

A string specifying the category (domain) of the translation. This parameter is used to get translations from a customized system built with Custom Translator. Add the Category ID from your Custom Translator project details to this parameter to use your deployed customized system. Default value is: general.

Return type

category

getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout
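
concurrency caps how many requests are in flight, and concurrentTimeout bounds how long to wait on the resulting futures. As a rough Python analogue of those two knobs, here is a sketch built on concurrent.futures. SynapseML does this on the JVM side, so this is an illustration, not its implementation; send_all and the call argument are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Any, Callable, List, Optional, Sequence, Tuple


def send_all(
    payloads: Sequence[Any],
    call: Callable[[Any], Any],
    concurrency: int = 1,
    concurrent_timeout: Optional[float] = None,
) -> List[Tuple[Any, Optional[BaseException]]]:
    """Apply `call` to each payload with at most `concurrency` calls in flight.

    Returns (result, error) pairs in input order. Futures not finished within
    `concurrent_timeout` seconds (None = wait forever) are reported as
    TimeoutError; queued ones are cancelled before they start.
    """
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(call, p) for p in payloads]
        done, _ = wait(futures, timeout=concurrent_timeout)
        results: List[Tuple[Any, Optional[BaseException]]] = []
        for f in futures:
            if f not in done:
                f.cancel()  # only succeeds if the call has not started yet
                results.append((None, TimeoutError("no response within concurrent_timeout")))
            elif f.exception() is not None:
                results.append((None, f.exception()))
            else:
                results.append((f.result(), None))
    # Leaving the `with` block still waits for calls that were already running.
    return results
```

Returning errors alongside results, rather than raising, loosely mirrors the errorCol idea: one failed row does not abort the whole batch.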

getErrorCol()[source]
Returns

column to hold HTTP errors

Return type

errorCol

getFromLanguage()[source]
Returns

Specifies the language of the input text. Find which languages are available to translate from by looking up supported languages using the translation scope. If the from parameter is not specified, automatic language detection is applied to determine the source language. You must use the from parameter rather than autodetection when using the dynamic dictionary feature.

Return type

fromLanguage

getFromScript()[source]
Returns

Specifies the script of the input text.

Return type

fromScript

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

getIncludeAlignment()[source]
Returns

Specifies whether to include alignment projection from source text to translated text.

Return type

includeAlignment

getIncludeSentenceLength()[source]
Returns

Specifies whether to include sentence boundaries for the input text and the translated text.

Return type

includeSentenceLength

static getJavaPackage()[source]

Returns package name String.

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getProfanityAction()[source]
Returns

Specifies how profanities should be treated in translations. Possible values are: NoAction (default), Marked or Deleted.

Return type

profanityAction

getProfanityMarker()[source]
Returns

Specifies how profanities should be marked in translations. Possible values are: Asterisk (default) or Tag.

Return type

profanityMarker

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getSubscriptionRegion()[source]
Returns

the API region to use

Return type

subscriptionRegion

getSuggestedFrom()[source]
Returns

Specifies a fallback language if the language of the input text can’t be identified. Language autodetection is applied when the from parameter is omitted. If detection fails, the suggestedFrom language will be assumed.

Return type

suggestedFrom
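
The fromLanguage and suggestedFrom descriptions together define a precedence for the source language: an explicit fromLanguage wins, otherwise the service auto-detects, and suggestedFrom is used only when detection fails. The sketch below spells out that order. Detection really happens server-side; the detect callable here is a hypothetical stand-in, and choose_source_language is an illustrative name, not a SynapseML function.

```python
from typing import Callable, Optional


def choose_source_language(
    text: str,
    from_language: Optional[str] = None,
    suggested_from: Optional[str] = None,
    detect: Optional[Callable[[str], Optional[str]]] = None,
) -> Optional[str]:
    """Resolve the source language following the documented precedence."""
    # 1. An explicit fromLanguage always wins; no detection is attempted.
    if from_language is not None:
        return from_language
    # 2. Otherwise try auto-detection (stand-in for the service's detector).
    detected = detect(text) if detect is not None else None
    if detected is not None:
        return detected
    # 3. Detection failed: fall back to suggestedFrom (may itself be None).
    return suggested_from
```

Step 1 is also why the docs require fromLanguage with the dynamic dictionary feature: setting it removes detection from the path entirely.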

getText()[source]
Returns

the string to translate

Return type

text

getTextType()[source]
Returns

Defines whether the text being translated is plain text or HTML text. Any HTML needs to be a well-formed, complete element. Possible values are: plain (default) or html.

Return type

textType

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getToLanguage()[source]
Returns

Specifies the language of the output text. The target language must be one of the supported languages included in the translation scope. For example, use to=de to translate to German. It’s possible to translate to multiple languages simultaneously by repeating the parameter in the query string. For example, use to=de and to=it to translate to German and Italian.

Return type

toLanguage

getToScript()[source]
Returns

Specifies the script of the translated text.

Return type

toScript

getUrl()[source]
Returns

URL of the service

Return type

url

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
includeAlignment = Param(parent='undefined', name='includeAlignment', doc='ServiceParam: Specifies whether to include alignment projection from source text to translated text.')
includeSentenceLength = Param(parent='undefined', name='includeSentenceLength', doc='ServiceParam: Specifies whether to include sentence boundaries for the input text and the translated text. ')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
profanityAction = Param(parent='undefined', name='profanityAction', doc='ServiceParam: Specifies how profanities should be treated in translations. Possible values are: NoAction (default), Marked or Deleted. ')
profanityMarker = Param(parent='undefined', name='profanityMarker', doc='ServiceParam: Specifies how profanities should be marked in translations. Possible values are: Asterisk (default) or Tag.')
classmethod read()[source]

Returns an MLReader instance for this class.

setAllowFallback(value)[source]
Parameters

allowFallback – Specifies whether the service is allowed to fall back to a general system when a custom system does not exist.

setAllowFallbackCol(value)[source]
Parameters

allowFallback – name of the column specifying whether the service is allowed to fall back to a general system when a custom system does not exist.

setCategory(value)[source]
Parameters

category – A string specifying the category (domain) of the translation. This parameter is used to get translations from a customized system built with Custom Translator. Add the Category ID from your Custom Translator project details to this parameter to use your deployed customized system. Default value is: general.

setCategoryCol(value)[source]
Parameters

category – name of the column containing the category (domain) of the translation (see setCategory).

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setErrorCol(value)[source]
Parameters

errorCol – column to hold HTTP errors

setFromLanguage(value)[source]
Parameters

fromLanguage – Specifies the language of the input text. Find which languages are available to translate from by looking up supported languages using the translation scope. If the from parameter is not specified, automatic language detection is applied to determine the source language. You must use the from parameter rather than autodetection when using the dynamic dictionary feature.

setFromLanguageCol(value)[source]
Parameters

fromLanguage – name of the column containing the language of the input text (see setFromLanguage).

setFromScript(value)[source]
Parameters

fromScript – Specifies the script of the input text.

setFromScriptCol(value)[source]
Parameters

fromScript – name of the column containing the script of the input text.

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setIncludeAlignment(value)[source]
Parameters

includeAlignment – Specifies whether to include alignment projection from source text to translated text.

setIncludeAlignmentCol(value)[source]
Parameters

includeAlignment – name of the column specifying whether to include alignment projection from source text to translated text.

setIncludeSentenceLength(value)[source]
Parameters

includeSentenceLength – Specifies whether to include sentence boundaries for the input text and the translated text.

setIncludeSentenceLengthCol(value)[source]
Parameters

includeSentenceLength – name of the column specifying whether to include sentence boundaries for the input text and the translated text.

setLinkedService(value)[source]
setLocation(value)[source]
setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(allowFallback=None, allowFallbackCol=None, category=None, categoryCol=None, concurrency=1, concurrentTimeout=None, errorCol='Translate_b910fdcd035e_error', fromLanguage=None, fromLanguageCol=None, fromScript=None, fromScriptCol=None, handler=None, includeAlignment=None, includeAlignmentCol=None, includeSentenceLength=None, includeSentenceLengthCol=None, outputCol='Translate_b910fdcd035e_output', profanityAction=None, profanityActionCol=None, profanityMarker=None, profanityMarkerCol=None, subscriptionKey=None, subscriptionKeyCol=None, subscriptionRegion=None, subscriptionRegionCol=None, suggestedFrom=None, suggestedFromCol=None, text=None, textCol=None, textType=None, textTypeCol=None, timeout=60.0, toLanguage=None, toLanguageCol=None, toScript=None, toScriptCol=None, url=None)[source]

Set the (keyword only) parameters

setProfanityAction(value)[source]
Parameters

profanityAction – Specifies how profanities should be treated in translations. Possible values are: NoAction (default), Marked or Deleted.

setProfanityActionCol(value)[source]
Parameters

profanityAction – name of the column specifying how profanities should be treated in translations. Possible values are: NoAction (default), Marked or Deleted.

setProfanityMarker(value)[source]
Parameters

profanityMarker – Specifies how profanities should be marked in translations. Possible values are: Asterisk (default) or Tag.

setProfanityMarkerCol(value)[source]
Parameters

profanityMarker – name of the column specifying how profanities should be marked in translations. Possible values are: Asterisk (default) or Tag.

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – name of the column containing the API key to use

setSubscriptionRegion(value)[source]
Parameters

subscriptionRegion – the API region to use

setSubscriptionRegionCol(value)[source]
Parameters

subscriptionRegion – name of the column containing the API region to use

setSuggestedFrom(value)[source]
Parameters

suggestedFrom – Specifies a fallback language if the language of the input text can’t be identified. Language autodetection is applied when the from parameter is omitted. If detection fails, the suggestedFrom language will be assumed.

setSuggestedFromCol(value)[source]
Parameters

suggestedFrom – name of the column containing the fallback language to assume if the language of the input text can’t be identified (see setSuggestedFrom).

setText(value)[source]
Parameters

text – the string to translate

setTextCol(value)[source]
Parameters

text – name of the column containing the string to translate

setTextType(value)[source]
Parameters

textType – Defines whether the text being translated is plain text or HTML text. Any HTML needs to be a well-formed, complete element. Possible values are: plain (default) or html.

setTextTypeCol(value)[source]
Parameters

textType – name of the column defining whether the text being translated is plain text or HTML text. Possible values are: plain (default) or html.

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setToLanguage(value)[source]
Parameters

toLanguage – Specifies the language of the output text. The target language must be one of the supported languages included in the translation scope. For example, use to=de to translate to German. It’s possible to translate to multiple languages simultaneously by repeating the parameter in the query string. For example, use to=de and to=it to translate to German and Italian.

setToLanguageCol(value)[source]
Parameters

toLanguage – name of the column containing the target language(s) of the output text (see setToLanguage).

setToScript(value)[source]
Parameters

toScript – Specifies the script of the translated text.

setToScriptCol(value)[source]
Parameters

toScript – name of the column containing the script of the translated text.

setUrl(value)[source]
Parameters

url – URL of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
subscriptionRegion = Param(parent='undefined', name='subscriptionRegion', doc='ServiceParam: the API region to use')
suggestedFrom = Param(parent='undefined', name='suggestedFrom', doc="ServiceParam: Specifies a fallback language if the language of the input text can't be identified. Language autodetection is applied when the from parameter is omitted. If detection fails, the suggestedFrom language will be assumed.")
text = Param(parent='undefined', name='text', doc='ServiceParam: the string to translate')
textType = Param(parent='undefined', name='textType', doc='ServiceParam: Defines whether the text being translated is plain text or HTML text. Any HTML needs to be a well-formed, complete element. Possible values are: plain (default) or html.')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
toLanguage = Param(parent='undefined', name='toLanguage', doc="ServiceParam: Specifies the language of the output text. The target language must be one of the supported languages included in the translation scope. For example, use to=de to translate to German. It's possible to translate to multiple languages simultaneously by repeating the parameter in the query string. For example, use to=de and to=it to translate to German and Italian.")
toScript = Param(parent='undefined', name='toScript', doc='ServiceParam: Specifies the script of the translated text.')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.Transliterate module

class synapse.ml.cognitive.Transliterate.Transliterate(java_obj=None, concurrency=1, concurrentTimeout=None, errorCol='Transliterate_0f6748921d49_error', fromScript=None, fromScriptCol=None, handler=None, language=None, languageCol=None, outputCol='Transliterate_0f6748921d49_output', subscriptionKey=None, subscriptionKeyCol=None, subscriptionRegion=None, subscriptionRegionCol=None, text=None, textCol=None, timeout=60.0, toScript=None, toScriptCol=None, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (str) – column to hold HTTP errors

  • fromScript (object) – Specifies the script of the input text.

  • handler (object) – Which strategy to use when handling requests

  • language (object) – Language tag identifying the language of the input text. If a code is not specified, automatic language detection will be applied.

  • outputCol (str) – The name of the output column

  • subscriptionKey (object) – the API key to use

  • subscriptionRegion (object) – the API region to use

  • text (object) – the string to transliterate

  • timeout (float) – number of seconds to wait before closing the connection

  • toScript (object) – Specifies the script of the transliterated text.

  • url (str) – URL of the service
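
Transliteration converts text between writing systems, so fromScript and toScript carry script identifiers. The underlying Translator service expects ISO 15924 four-letter codes such as Jpan or Latn. The sketch below assembles a Translator v3 /transliterate query string and checks the script-code shape. The REST parameter names and `api-version=3.0` are assumptions about the service, not about SynapseML internals, and build_transliterate_query is a hypothetical helper.

```python
import re
from urllib.parse import urlencode

# ISO 15924 script codes are four letters, conventionally title-cased
# (for example Latn, Jpan, Cyrl).
_SCRIPT_CODE = re.compile(r"[A-Z][a-z]{3}")


def build_transliterate_query(language: str, from_script: str, to_script: str) -> str:
    """Build a Translator v3 /transliterate query string (hypothetical sketch)."""
    for code in (from_script, to_script):
        if not _SCRIPT_CODE.fullmatch(code):
            raise ValueError(f"not an ISO 15924-style script code: {code!r}")
    return urlencode([
        ("api-version", "3.0"),
        ("language", language),
        ("fromScript", from_script),
        ("toScript", to_script),
    ])
```

For Japanese text written in Japanese script and rendered in Latin letters, build_transliterate_query("ja", "Jpan", "Latn") gives `api-version=3.0&language=ja&fromScript=Jpan&toScript=Latn`.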

concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
fromScript = Param(parent='undefined', name='fromScript', doc='ServiceParam: Specifies the script of the input text.')
getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getErrorCol()[source]
Returns

column to hold HTTP errors

Return type

errorCol

getFromScript()[source]
Returns

Specifies the script of the input text.

Return type

fromScript

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

static getJavaPackage()[source]

Returns package name String.

getLanguage()[source]
Returns

Language tag identifying the language of the input text. If a code is not specified, automatic language detection will be applied.

Return type

language

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getSubscriptionRegion()[source]
Returns

the API region to use

Return type

subscriptionRegion

getText()[source]
Returns

the string to transliterate

Return type

text

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getToScript()[source]
Returns

Specifies the script of the transliterated text.

Return type

toScript

getUrl()[source]
Returns

URL of the service

Return type

url

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
language = Param(parent='undefined', name='language', doc='ServiceParam: Language tag identifying the language of the input text. If a code is not specified, automatic language detection will be applied.')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setErrorCol(value)[source]
Parameters

errorCol – column to hold HTTP errors

setFromScript(value)[source]
Parameters

fromScript – Specifies the script of the input text.

setFromScriptCol(value)[source]
Parameters

fromScript – name of the column containing the script of the input text.

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setLanguage(value)[source]
Parameters

language – Language tag identifying the language of the input text. If a code is not specified, automatic language detection will be applied.

setLanguageCol(value)[source]
Parameters

language – name of the column containing the language tag of the input text (see setLanguage).

setLinkedService(value)[source]
setLocation(value)[source]
setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(concurrency=1, concurrentTimeout=None, errorCol='Transliterate_0f6748921d49_error', fromScript=None, fromScriptCol=None, handler=None, language=None, languageCol=None, outputCol='Transliterate_0f6748921d49_output', subscriptionKey=None, subscriptionKeyCol=None, subscriptionRegion=None, subscriptionRegionCol=None, text=None, textCol=None, timeout=60.0, toScript=None, toScriptCol=None, url=None)[source]

Set the (keyword only) parameters

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – name of the column containing the API key to use

setSubscriptionRegion(value)[source]
Parameters

subscriptionRegion – the API region to use

setSubscriptionRegionCol(value)[source]
Parameters

subscriptionRegion – name of the column containing the API region to use

setText(value)[source]
Parameters

text – the string to transliterate

setTextCol(value)[source]
Parameters

textCol – name of the column that holds text for each row: the string to translate.
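Each service parameter above comes as a pair of setters, such as setText and setTextCol. A minimal sketch of the assumed semantics: the plain setter fixes one value for every row, the Col setter names a column that is read row by row, and whichever setter was called last takes effect. The class and method names are hypothetical; this is not SynapseML code.

```python
class ServiceParamSketch:
    """Hypothetical model of a service parameter such as `text`.

    It holds either a constant (like setText) or a column name
    (like setTextCol); under this assumption the last setter wins."""

    def __init__(self):
        self._kind = None
        self._payload = None

    def set_value(self, value):
        self._kind, self._payload = "value", value
        return self

    def set_col(self, col_name):
        self._kind, self._payload = "col", col_name
        return self

    def resolve(self, row):
        # A column-backed parameter reads the current row; a constant ignores it.
        if self._kind == "col":
            return row[self._payload]
        return self._payload


rows = [{"sentence": "hola"}, {"sentence": "adiós"}]
text = ServiceParamSketch().set_value("hello")
print([text.resolve(r) for r in rows])
text.set_col("sentence")
print([text.resolve(r) for r in rows])
```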

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setToScript(value)[source]
Parameters

toScript – Specifies the script of the translated text.

setToScriptCol(value)[source]
Parameters

toScriptCol – name of the column that holds toScript for each row: the script of the translated text.

setUrl(value)[source]
Parameters

url – URL of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
subscriptionRegion = Param(parent='undefined', name='subscriptionRegion', doc='ServiceParam: the API region to use')
text = Param(parent='undefined', name='text', doc='ServiceParam: the string to translate')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
toScript = Param(parent='undefined', name='toScript', doc='ServiceParam: Specifies the script of the translated text.')
url = Param(parent='undefined', name='url', doc='Url of the service')
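The Transliterate parameters above feed a single REST call. A minimal sketch of that request, assuming the public Translator Text API v3.0 contract: the endpoint URL, the `Ocp-Apim-Subscription-Key` and `Ocp-Apim-Subscription-Region` headers, and the `[{"Text": ...}]` body are assumptions not stated in this reference. SynapseML assembles the request itself, and `url` may point elsewhere. `build_transliterate_request` is a hypothetical helper that only builds the request and sends nothing.

```python
import json
from urllib.parse import urlencode

# Assumption: the public Translator Text API v3.0 transliterate endpoint.
BASE_URL = "https://api.cognitive.microsofttranslator.com/transliterate"


def build_transliterate_request(text, language, from_script, to_script,
                                subscription_key, subscription_region):
    """Return (url, headers, body) for one transliteration call; nothing is sent."""
    query = urlencode({
        "api-version": "3.0",
        "language": language,       # corresponds to the language param
        "fromScript": from_script,  # corresponds to the fromScript param
        "toScript": to_script,      # corresponds to the toScript param
    })
    headers = {
        "Ocp-Apim-Subscription-Key": subscription_key,        # subscriptionKey
        "Ocp-Apim-Subscription-Region": subscription_region,  # subscriptionRegion
        "Content-Type": "application/json",
    }
    body = json.dumps([{"Text": text}])  # corresponds to the text param
    return f"{BASE_URL}?{query}", headers, body


url, headers, body = build_transliterate_request(
    "こんにちは", "ja", "Jpan", "Latn", "<your-key>", "<your-region>")
print(url)
```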

synapse.ml.cognitive.VerifyFaces module

class synapse.ml.cognitive.VerifyFaces.VerifyFaces(java_obj=None, concurrency=1, concurrentTimeout=None, errorCol='VerifyFaces_e076a02a2b51_error', faceId=None, faceIdCol=None, faceId1=None, faceId1Col=None, faceId2=None, faceId2Col=None, handler=None, largePersonGroupId=None, largePersonGroupIdCol=None, outputCol='VerifyFaces_e076a02a2b51_output', personGroupId=None, personGroupIdCol=None, personId=None, personIdCol=None, subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – maximum number of seconds to wait on futures if concurrency >= 1

  • errorCol (str) – column to hold HTTP errors

  • faceId (object) – faceId of the face, as returned by Face - Detect.

  • faceId1 (object) – faceId of one face, as returned by Face - Detect.

  • faceId2 (object) – faceId of another face, as returned by Face - Detect.

  • handler (object) – Which strategy to use when handling requests

  • largePersonGroupId (object) – An existing largePersonGroupId, used together with personId to load a specified person quickly. largePersonGroupId is created in LargePersonGroup - Create. personGroupId and largePersonGroupId should not both be provided.

  • outputCol (str) – The name of the output column

  • personGroupId (object) – An existing personGroupId, used together with personId to load a specified person quickly. personGroupId is created in PersonGroup - Create. personGroupId and largePersonGroupId should not both be provided.

  • personId (object) – Identifies a specific person in a person group or a large person group. personId is created in PersonGroup Person - Create or LargePersonGroup Person - Create.

  • subscriptionKey (object) – the API key to use

  • timeout (float) – number of seconds to wait before closing the connection

  • url (str) – URL of the service
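VerifyFaces accepts two groups of parameters. A sketch of how they might combine into a request body, assuming the Face - Verify REST contract: faceId1 with faceId2 compares two detected faces, while faceId with personId compares a face against a stored person, in which case exactly one of personGroupId and largePersonGroupId is supplied. `build_verify_body` is a hypothetical helper, not part of SynapseML.

```python
def build_verify_body(face_id1=None, face_id2=None, face_id=None,
                      person_id=None, person_group_id=None,
                      large_person_group_id=None):
    """Build a Face - Verify request body (sketch under the stated assumptions)."""
    if face_id1 is not None or face_id2 is not None:
        # Face-to-face mode: compare two faces returned by Face - Detect.
        if face_id1 is None or face_id2 is None:
            raise ValueError("face-to-face verification needs faceId1 and faceId2")
        return {"faceId1": face_id1, "faceId2": face_id2}
    # Face-to-person mode: compare one face against a stored person.
    if face_id is None or person_id is None:
        raise ValueError("face-to-person verification needs faceId and personId")
    if (person_group_id is None) == (large_person_group_id is None):
        raise ValueError("provide exactly one of personGroupId and largePersonGroupId")
    body = {"faceId": face_id, "personId": person_id}
    if person_group_id is not None:
        body["personGroupId"] = person_group_id
    else:
        body["largePersonGroupId"] = large_person_group_id
    return body


print(build_verify_body(face_id1="<face-a>", face_id2="<face-b>"))
print(build_verify_body(face_id="<face>", person_id="<person>",
                        large_person_group_id="<group>"))
```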

concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
faceId = Param(parent='undefined', name='faceId', doc='ServiceParam: faceId of the face, comes from Face - Detect.')
faceId1 = Param(parent='undefined', name='faceId1', doc='ServiceParam: faceId of one face, comes from Face - Detect.')
faceId2 = Param(parent='undefined', name='faceId2', doc='ServiceParam: faceId of another face, comes from Face - Detect.')
getConcurrency()[source]
Returns

max number of concurrent calls

Return type

int

getConcurrentTimeout()[source]
Returns

maximum number of seconds to wait on futures if concurrency >= 1

Return type

float

getErrorCol()[source]
Returns

column to hold HTTP errors

Return type

str
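Failed calls are reported through the column named by errorCol. A plain-Python sketch of separating the two outcomes, assuming a row failed exactly when its error column is non-null. On a DataFrame the equivalent would be filtering with `F.col(error_col).isNull()` or `.isNotNull()` from `pyspark.sql.functions`. The row layout and error payload below are hypothetical.

```python
def split_by_error(rows, error_col):
    """Partition result rows into (succeeded, failed) using the error column.

    Assumes a row failed exactly when its error column is non-null."""
    succeeded = [r for r in rows if r.get(error_col) is None]
    failed = [r for r in rows if r.get(error_col) is not None]
    return succeeded, failed


# Hypothetical rows; the error payload shape is illustrative only.
rows = [
    {"output": {"isIdentical": True, "confidence": 0.9}, "error": None},
    {"output": None, "error": {"statusCode": 401}},
]
ok, bad = split_by_error(rows, "error")
print(len(ok), len(bad))
```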

getFaceId()[source]
Returns

faceId of the face, as returned by Face - Detect.

Return type

object

getFaceId1()[source]
Returns

faceId of one face, as returned by Face - Detect.

Return type

object

getFaceId2()[source]
Returns

faceId of another face, as returned by Face - Detect.

Return type

object

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

object

static getJavaPackage()[source]

Returns the package name as a string.

getLargePersonGroupId()[source]
Returns

An existing largePersonGroupId, used together with personId to load a specified person quickly. largePersonGroupId is created in LargePersonGroup - Create. personGroupId and largePersonGroupId should not both be provided.

Return type

object

getOutputCol()[source]
Returns

The name of the output column

Return type

str
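A sketch of interpreting one verification result, assuming the output column holds a struct that mirrors the Face - Verify REST response, with fields `isIdentical` and `confidence`. `is_same_person` and its `min_confidence` threshold are hypothetical. Because it only indexes by field name, the same function should also accept a PySpark `Row`.

```python
def is_same_person(result, min_confidence=0.5):
    """Interpret one Face - Verify result.

    Assumes `result` carries isIdentical (bool) and confidence (float in [0, 1]).
    min_confidence is a hypothetical extra threshold applied on top of the
    service's own isIdentical decision."""
    return bool(result["isIdentical"]) and result["confidence"] >= min_confidence


print(is_same_person({"isIdentical": True, "confidence": 0.72}))
print(is_same_person({"isIdentical": True, "confidence": 0.72}, min_confidence=0.8))
```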

getPersonGroupId()[source]
Returns

An existing personGroupId, used together with personId to load a specified person quickly. personGroupId is created in PersonGroup - Create. personGroupId and largePersonGroupId should not both be provided.

Return type

object

getPersonId()[source]
Returns

Identifies a specific person in a person group or a large person group. personId is created in PersonGroup Person - Create or LargePersonGroup Person - Create.

Return type

object

getSubscriptionKey()[source]
Returns

the API key to use

Return type

object

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

float

getUrl()[source]
Returns

URL of the service

Return type

str

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
largePersonGroupId = Param(parent='undefined', name='largePersonGroupId', doc='ServiceParam: Using existing largePersonGroupId and personId for fast adding a specified person. largePersonGroupId is created in LargePersonGroup - Create. Parameter personGroupId and largePersonGroupId should not be provided at the same time.')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
personGroupId = Param(parent='undefined', name='personGroupId', doc='ServiceParam: Using existing personGroupId and personId for fast loading a specified person. personGroupId is created in PersonGroup - Create. Parameter personGroupId and largePersonGroupId should not be provided at the same time.')
personId = Param(parent='undefined', name='personId', doc='ServiceParam: Specify a certain person in a person group or a large person group. personId is created in PersonGroup Person - Create or LargePersonGroup Person - Create.')
classmethod read()[source]

Returns an MLReader instance for this class.

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – maximum number of seconds to wait on futures if concurrency >= 1

setErrorCol(value)[source]
Parameters

errorCol – column to hold HTTP errors

setFaceId(value)[source]
Parameters

faceId – faceId of the face, as returned by Face - Detect.

setFaceId1(value)[source]
Parameters

faceId1 – faceId of one face, as returned by Face - Detect.

setFaceId1Col(value)[source]
Parameters

faceId1Col – name of the column that holds faceId1 for each row: the faceId of one face, as returned by Face - Detect.

setFaceId2(value)[source]
Parameters

faceId2 – faceId of another face, as returned by Face - Detect.

setFaceId2Col(value)[source]
Parameters

faceId2Col – name of the column that holds faceId2 for each row: the faceId of another face, as returned by Face - Detect.

setFaceIdCol(value)[source]
Parameters

faceIdCol – name of the column that holds faceId for each row: the faceId of the face, as returned by Face - Detect.

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setLargePersonGroupId(value)[source]
Parameters

largePersonGroupId – An existing largePersonGroupId, used together with personId to load a specified person quickly. largePersonGroupId is created in LargePersonGroup - Create. personGroupId and largePersonGroupId should not both be provided.

setLargePersonGroupIdCol(value)[source]
Parameters

largePersonGroupIdCol – name of the column that holds largePersonGroupId for each row: an existing largePersonGroupId, used together with personId to load a specified person quickly. largePersonGroupId is created in LargePersonGroup - Create. personGroupId and largePersonGroupId should not both be provided.

setLinkedService(value)[source]
setLocation(value)[source]
setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(concurrency=1, concurrentTimeout=None, errorCol='VerifyFaces_e076a02a2b51_error', faceId=None, faceIdCol=None, faceId1=None, faceId1Col=None, faceId2=None, faceId2Col=None, handler=None, largePersonGroupId=None, largePersonGroupIdCol=None, outputCol='VerifyFaces_e076a02a2b51_output', personGroupId=None, personGroupIdCol=None, personId=None, personIdCol=None, subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Set the (keyword-only) parameters.

setPersonGroupId(value)[source]
Parameters

personGroupId – An existing personGroupId, used together with personId to load a specified person quickly. personGroupId is created in PersonGroup - Create. personGroupId and largePersonGroupId should not both be provided.

setPersonGroupIdCol(value)[source]
Parameters

personGroupIdCol – name of the column that holds personGroupId for each row: an existing personGroupId, used together with personId to load a specified person quickly. personGroupId is created in PersonGroup - Create. personGroupId and largePersonGroupId should not both be provided.

setPersonId(value)[source]
Parameters

personId – Identifies a specific person in a person group or a large person group. personId is created in PersonGroup Person - Create or LargePersonGroup Person - Create.

setPersonIdCol(value)[source]
Parameters

personIdCol – name of the column that holds personId for each row: identifies a specific person in a person group or a large person group. personId is created in PersonGroup Person - Create or LargePersonGroup Person - Create.

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKeyCol – name of the column that holds the API key to use for each row

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – URL of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

Module contents

SynapseML is an ecosystem of tools aimed at expanding the distributed computing framework Apache Spark in several new directions. SynapseML adds many deep learning and data science tools to the Spark ecosystem, including seamless integration of Spark Machine Learning pipelines with Microsoft Cognitive Toolkit (CNTK), LightGBM and OpenCV. These tools enable powerful and highly scalable predictive and analytical models for a variety of data sources.

SynapseML also brings new networking capabilities to the Spark ecosystem. With the HTTP on Spark project, users can embed any web service into their SparkML models. In this vein, SynapseML provides easy-to-use SparkML transformers for a wide variety of Microsoft Cognitive Services. For production-grade deployment, the Spark Serving project enables high-throughput web services with sub-millisecond latency, backed by your Spark cluster.

SynapseML requires Scala 2.12, Spark 3.0+, and Python 3.6+.
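The version requirements above can be checked up front. A minimal sketch, with hypothetical helper names, that compares version strings against the stated minimums. In practice the Spark version could come from `pyspark.__version__`, while the Scala version has to be taken from your cluster's configuration.

```python
import re
import sys


def major_minor(version):
    """Extract (major, minor) from strings such as '3.1.2' or '2.12.15'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def meets_requirements(spark_version, scala_version,
                       python_version=sys.version_info[:2]):
    """Check the stated minimums: Scala 2.12, Spark 3.0+, Python 3.6+."""
    return (major_minor(scala_version) == (2, 12)
            and major_minor(spark_version) >= (3, 0)
            and tuple(python_version) >= (3, 6))


print(meets_requirements("3.1.2", "2.12.15"))
```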