synapse.ml.cognitive package

Submodules

synapse.ml.cognitive.AddDocuments module

class synapse.ml.cognitive.AddDocuments.AddDocuments(java_obj=None, actionCol='@search.action', batchSize=100, concurrency=1, concurrentTimeout=None, errorCol='AddDocuments_563f8874fe32_error', handler=None, indexName=None, outputCol='AddDocuments_563f8874fe32_output', serviceName=None, subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • actionCol (object) – You can combine actions, such as an upload and a delete, in the same batch. The supported actions are:
      upload: An upload action is similar to an ‘upsert’ where the document will be inserted if it is new and updated/replaced if it exists. Note that all fields are replaced in the update case.
      merge: Merge updates an existing document with the specified fields. If the document doesn’t exist, the merge will fail. Any field you specify in a merge will replace the existing field in the document. This includes fields of type Collection(Edm.String). For example, if the document contains a field ‘tags’ with value [‘budget’] and you execute a merge with value [‘economy’, ‘pool’] for ‘tags’, the final value of the ‘tags’ field will be [‘economy’, ‘pool’]. It will not be [‘budget’, ‘economy’, ‘pool’].
      mergeOrUpload: This action behaves like merge if a document with the given key already exists in the index. If the document does not exist, it behaves like upload with a new document.
      delete: Delete removes the specified document from the index. Note that any field you specify in a delete operation, other than the key field, will be ignored. If you want to remove an individual field from a document, use merge instead and simply set the field explicitly to null.

  • batchSize (int) – The max size of the buffer

  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (object) – column to hold HTTP errors

  • handler (object) – Which strategy to use when handling requests

  • indexName (object) –

  • outputCol (object) – The name of the output column

  • serviceName (object) –

  • subscriptionKey (object) – the API key to use

  • timeout (float) – number of seconds to wait before closing the connection

  • url (object) – URL of the service
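The four actions described for actionCol can be illustrated with a small in-memory model. This is only a sketch of the documented semantics, not the service's or SynapseML's implementation: the `apply_action` helper, the dict-based `index`, and the `id` key field are hypothetical names chosen for the example, and treating a delete of a missing document as a no-op is an assumption.

```python
def apply_action(index, doc, key_field="id", action_field="@search.action"):
    """Apply one document's action to a dict that stands in for a search index."""
    action = doc[action_field]
    key = doc[key_field]
    # Every field except the action marker belongs to the document itself.
    fields = {k: v for k, v in doc.items() if k != action_field}

    if action == "upload":
        # Upsert: insert if new, otherwise replace all fields.
        index[key] = fields
    elif action == "merge":
        if key not in index:
            raise KeyError(f"merge failed: no document with key {key!r}")
        # Specified fields replace the existing ones, collections included.
        index[key] = {**index[key], **fields}
    elif action == "mergeOrUpload":
        # Merge when the key exists, otherwise behave like upload.
        index[key] = {**index.get(key, {}), **fields}
    elif action == "delete":
        # Only the key matters; other fields are ignored.
        # Assumption: deleting a missing document is a no-op.
        index.pop(key, None)
    else:
        raise ValueError(f"unknown action: {action!r}")
    return index


index = {"1": {"id": "1", "tags": ["budget"], "name": "Inn"}}
apply_action(index, {"@search.action": "merge", "id": "1", "tags": ["economy", "pool"]})
print(index["1"]["tags"])  # ['economy', 'pool']
```

The merge replaces the whole ‘tags’ collection and leaves the untouched ‘name’ field in place, which matches the example given for merge above.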

actionCol = Param(parent='undefined', name='actionCol', doc=" You can combine actions, such as an upload and a delete, in the same batch.  upload: An upload action is similar to an 'upsert' where the document will be inserted if it is new and updated/replaced if it exists. Note that all fields are replaced in the update case.  merge: Merge updates an existing document with the specified fields. If the document doesn't exist, the merge will fail. Any field you specify in a merge will replace the existing field in the document. This includes fields of type Collection(Edm.String). For example, if the document contains a field 'tags' with value ['budget'] and you execute a merge with value ['economy', 'pool'] for 'tags', the final value of the 'tags' field will be ['economy', 'pool'].  It will not be ['budget', 'economy', 'pool'].  mergeOrUpload: This action behaves like merge if a document  with the given key already exists in the index.  If the document does not exist, it behaves like upload with a new document.  delete: Delete removes the specified document from the index.  Note that any field you specify in a delete operation,  other than the key field, will be ignored. If you want to   remove an individual field from a document, use merge   instead and simply set the field explicitly to null.     ")
batchSize = Param(parent='undefined', name='batchSize', doc='The max size of the buffer')
concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getActionCol()[source]
Returns

You can combine actions, such as an upload and a delete, in the same batch. The supported actions are:
    upload: An upload action is similar to an ‘upsert’ where the document will be inserted if it is new and updated/replaced if it exists. Note that all fields are replaced in the update case.
    merge: Merge updates an existing document with the specified fields. If the document doesn’t exist, the merge will fail. Any field you specify in a merge will replace the existing field in the document. This includes fields of type Collection(Edm.String). For example, if the document contains a field ‘tags’ with value [‘budget’] and you execute a merge with value [‘economy’, ‘pool’] for ‘tags’, the final value of the ‘tags’ field will be [‘economy’, ‘pool’]. It will not be [‘budget’, ‘economy’, ‘pool’].
    mergeOrUpload: This action behaves like merge if a document with the given key already exists in the index. If the document does not exist, it behaves like upload with a new document.
    delete: Delete removes the specified document from the index. Note that any field you specify in a delete operation, other than the key field, will be ignored. If you want to remove an individual field from a document, use merge instead and simply set the field explicitly to null.

Return type

actionCol

getBatchSize()[source]
Returns

The max size of the buffer

Return type

batchSize
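As a rough illustration of what a buffer limit such as batchSize implies, the sketch below groups rows into batches of at most a given size. It assumes that batchSize caps the number of documents sent per request, which the description above does not spell out; `batched` is a hypothetical helper, not part of SynapseML.

```python
from itertools import islice


def batched(rows, batch_size=100):
    """Yield lists of at most batch_size rows, preserving order."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


sizes = [len(b) for b in batched(range(250), batch_size=100)]
print(sizes)  # [100, 100, 50]
```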

getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout
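The interplay of concurrency and concurrentTimeout can be pictured with Python's standard thread pool. This is a generic sketch, not the mechanism SynapseML uses internally; `call_all` and `send` are hypothetical names. If a result is not ready in time, `Future.result` raises `concurrent.futures.TimeoutError`.

```python
from concurrent.futures import ThreadPoolExecutor


def call_all(requests, send, concurrency=1, concurrent_timeout=None):
    """Run send(request) for each request with at most `concurrency` calls in flight."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(send, r) for r in requests]
        # timeout=None waits indefinitely; a number bounds the wait in seconds.
        return [f.result(timeout=concurrent_timeout) for f in futures]


results = call_all([1, 2, 3], send=lambda x: x * 10, concurrency=2, concurrent_timeout=5.0)
print(results)  # [10, 20, 30]
```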

getErrorCol()[source]
Returns

column to hold HTTP errors

Return type

errorCol

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

getIndexName()[source]
Returns

Return type

indexName

static getJavaPackage()[source]

Returns package name String.

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getServiceName()[source]
Returns

Return type

serviceName

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

URL of the service

Return type

url

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
indexName = Param(parent='undefined', name='indexName', doc='')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

serviceName = Param(parent='undefined', name='serviceName', doc='')
setActionCol(value)[source]
Parameters

actionCol – You can combine actions, such as an upload and a delete, in the same batch. The supported actions are:
    upload: An upload action is similar to an ‘upsert’ where the document will be inserted if it is new and updated/replaced if it exists. Note that all fields are replaced in the update case.
    merge: Merge updates an existing document with the specified fields. If the document doesn’t exist, the merge will fail. Any field you specify in a merge will replace the existing field in the document. This includes fields of type Collection(Edm.String). For example, if the document contains a field ‘tags’ with value [‘budget’] and you execute a merge with value [‘economy’, ‘pool’] for ‘tags’, the final value of the ‘tags’ field will be [‘economy’, ‘pool’]. It will not be [‘budget’, ‘economy’, ‘pool’].
    mergeOrUpload: This action behaves like merge if a document with the given key already exists in the index. If the document does not exist, it behaves like upload with a new document.
    delete: Delete removes the specified document from the index. Note that any field you specify in a delete operation, other than the key field, will be ignored. If you want to remove an individual field from a document, use merge instead and simply set the field explicitly to null.

setBatchSize(value)[source]
Parameters

batchSize – The max size of the buffer

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setErrorCol(value)[source]
Parameters

errorCol – column to hold HTTP errors

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setIndexName(value)[source]
Parameters

indexName

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(actionCol='@search.action', batchSize=100, concurrency=1, concurrentTimeout=None, errorCol='AddDocuments_563f8874fe32_error', handler=None, indexName=None, outputCol='AddDocuments_563f8874fe32_output', serviceName=None, subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setServiceName(value)[source]
Parameters

serviceName

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKeyCol – name of the column holding the API key to use

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – URL of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='the API key to use')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')
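A minimal sketch of configuring an AddDocuments stage through the setters listed above. The helper names and the argument values are hypothetical. The setters are called one at a time so the sketch does not rely on any particular return value, and the SynapseML import is deferred so the snippet loads even where SynapseML and Spark are not installed.

```python
def configure_add_documents(stage, service_name, index_name, subscription_key):
    """Apply common settings to an AddDocuments-like object via its setters."""
    stage.setSubscriptionKey(subscription_key)
    stage.setServiceName(service_name)
    stage.setIndexName(index_name)
    stage.setActionCol("@search.action")  # documented default
    stage.setBatchSize(100)               # documented default
    return stage


def make_add_documents(service_name, index_name, subscription_key):
    # Imported lazily: this line needs SynapseML and a Spark session to succeed.
    from synapse.ml.cognitive.AddDocuments import AddDocuments

    return configure_add_documents(
        AddDocuments(), service_name, index_name, subscription_key
    )
```

Because the class derives from pyspark's JavaTransformer, the configured stage would then be applied to a DataFrame with `transform`.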

synapse.ml.cognitive.AnalyzeBusinessCards module

class synapse.ml.cognitive.AnalyzeBusinessCards.AnalyzeBusinessCards(java_obj=None, backoffs=[100, 500, 1000], concurrency=1, concurrentTimeout=None, errorCol='AnalyzeBusinessCards_05788d7888b8_error', imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, includeTextDetails=None, includeTextDetailsCol=None, locale=None, localeCol=None, maxPollingRetries=1000, outputCol='AnalyzeBusinessCards_05788d7888b8_output', pages=None, pagesCol=None, pollingDelay=300, subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • backoffs (list) – array of backoffs to use in the handler

  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (object) – column to hold HTTP errors

  • imageBytes (object) – bytestream of the image to use

  • imageUrl (object) – the url of the image to use

  • includeTextDetails (object) – Include text lines and element references in the result.

  • locale (object) – Locale of the business card. Supported locales: en-AU, en-CA, en-GB, en-IN, en-US.

  • maxPollingRetries (int) – number of times to poll

  • outputCol (object) – The name of the output column

  • pages (object) – The page selection, only leveraged for multi-page PDF and TIFF documents. Accepted inputs include single pages (e.g. ‘1, 2’ -> pages 1 and 2 will be processed), finite ranges (e.g. ‘2-5’ -> pages 2 to 5 will be processed), and open-ended ranges (e.g. ‘5-’ -> all pages from page 5 onward will be processed; ‘-10’ -> pages 1 to 10 will be processed). All of these can be mixed together, and ranges are allowed to overlap (e.g. ‘-5, 1, 3, 5-10’ -> pages 1 to 10 will be processed). The service will accept the request if it can process at least one page of the document (e.g. using ‘5-100’ on a 5-page document is valid input, and page 5 will be processed). If no page range is provided, the entire document will be processed.

  • pollingDelay (int) – number of milliseconds to wait between polling

  • subscriptionKey (object) – the API key to use

  • timeout (float) – number of seconds to wait before closing the connection

  • url (object) – URL of the service
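The page-selection syntax described for pages can be made concrete with a small parser. This is an illustrative sketch of the rules as stated above, not the service's own parsing code; `parse_pages` and its `page_count` argument are hypothetical.

```python
def parse_pages(selection, page_count):
    """Expand a page selection such as '-5, 1, 3, 5-10' into sorted page numbers.

    Pages beyond page_count are dropped; at least one page must remain.
    """
    if selection is None or not selection.strip():
        # No range provided: the entire document is processed.
        return list(range(1, page_count + 1))

    pages = set()
    for part in selection.split(","):
        part = part.strip()
        if "-" in part:
            start, _, end = part.partition("-")
            first = int(start) if start.strip() else 1          # '-10' starts at page 1
            last = int(end) if end.strip() else page_count      # '5-' runs to the last page
        else:
            first = last = int(part)
        # Clamp to the pages that actually exist; overlaps collapse in the set.
        pages.update(range(max(first, 1), min(last, page_count) + 1))

    if not pages:
        raise ValueError("selection does not cover any page of the document")
    return sorted(pages)


print(parse_pages("-5, 1, 3, 5-10", page_count=20))  # pages 1 through 10
print(parse_pages("5-100", page_count=5))            # [5]
```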

backoffs = Param(parent='undefined', name='backoffs', doc='array of backoffs to use in the handler')
concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getBackoffs()[source]
Returns

array of backoffs to use in the handler

Return type

backoffs
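One plausible reading of backoffs is a list of waits applied between successive retries. The sketch below shows that pattern only; the handler's actual strategy is not described here, the millisecond unit is an assumption, and `call_with_backoffs` and `send` are hypothetical names.

```python
import time


def call_with_backoffs(send, backoffs=(100, 500, 1000), sleep=time.sleep):
    """Call send() once, then retry after each backoff (assumed milliseconds) on failure."""
    last_error = None
    for delay_ms in (0, *backoffs):
        if delay_ms:
            sleep(delay_ms / 1000.0)
        try:
            return send()
        except Exception as err:  # a real handler would likely inspect status codes
            last_error = err
    # Every attempt failed: surface the most recent error.
    raise last_error
```

With the default list this makes at most four attempts: the initial call plus one retry per backoff entry.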

getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getErrorCol()[source]
Returns

column to hold HTTP errors

Return type

errorCol

getImageBytes()[source]
Returns

bytestream of the image to use

Return type

imageBytes

getImageUrl()[source]
Returns

the url of the image to use

Return type

imageUrl

getIncludeTextDetails()[source]
Returns

Include text lines and element references in the result.

Return type

includeTextDetails

static getJavaPackage()[source]

Returns package name String.

getLocale()[source]
Returns

Locale of the business card. Supported locales: en-AU, en-CA, en-GB, en-IN, en-US.

Return type

locale

getMaxPollingRetries()[source]
Returns

number of times to poll

Return type

maxPollingRetries

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getPages()[source]
Returns

The page selection, only leveraged for multi-page PDF and TIFF documents. Accepted inputs include single pages (e.g. ‘1, 2’ -> pages 1 and 2 will be processed), finite ranges (e.g. ‘2-5’ -> pages 2 to 5 will be processed), and open-ended ranges (e.g. ‘5-’ -> all pages from page 5 onward will be processed; ‘-10’ -> pages 1 to 10 will be processed). All of these can be mixed together, and ranges are allowed to overlap (e.g. ‘-5, 1, 3, 5-10’ -> pages 1 to 10 will be processed). The service will accept the request if it can process at least one page of the document (e.g. using ‘5-100’ on a 5-page document is valid input, and page 5 will be processed). If no page range is provided, the entire document will be processed.

Return type

pages

getPollingDelay()[source]
Returns

number of milliseconds to wait between polling

Return type

pollingDelay
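maxPollingRetries and pollingDelay together bound how long a long-running analysis is polled. The loop below is a generic sketch of that pattern rather than SynapseML's implementation; `poll_until_done`, `get_status`, and the status strings are hypothetical.

```python
import time


def poll_until_done(get_status, max_polling_retries=1000, polling_delay_ms=300,
                    sleep=time.sleep):
    """Poll get_status() until it reports a terminal state or retries run out."""
    for _ in range(max_polling_retries):
        status = get_status()
        if status in ("succeeded", "failed"):  # hypothetical terminal states
            return status
        sleep(polling_delay_ms / 1000.0)
    raise TimeoutError("operation did not finish within the polling budget")


# Upper bound on time spent sleeping between polls with the documented defaults:
# 1000 polls * 300 ms = 300 seconds.
max_wait_seconds = 1000 * 300 / 1000.0
```

Under this reading, the defaults allow roughly five minutes of waiting, not counting the time each status request itself takes.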

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

URL of the service

Return type

url

imageBytes = Param(parent='undefined', name='imageBytes', doc='bytestream of the image to use')
imageUrl = Param(parent='undefined', name='imageUrl', doc='the url of the image to use')
includeTextDetails = Param(parent='undefined', name='includeTextDetails', doc='Include text lines and element references in the result.')
locale = Param(parent='undefined', name='locale', doc='Locale of the receipt. Supported locales: en-AU, en-CA, en-GB, en-IN, en-US.')
maxPollingRetries = Param(parent='undefined', name='maxPollingRetries', doc='number of times to poll')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
pages = Param(parent='undefined', name='pages', doc="The page selection only leveraged for multi-page PDF and TIFF documents. Accepted input include single pages (e.g.'1, 2' -> pages 1 and 2 will be processed), finite (e.g. '2-5' -> pages 2 to 5 will be processed) and open-ended ranges (e.g. '5-' -> all the pages from page 5 will be processed & e.g. '-10' -> pages 1 to 10 will be processed). All of these can be mixed together and ranges are allowed to overlap (eg. '-5, 1, 3, 5-10' - pages 1 to 10 will be processed). The service will accept the request if it can process at least one page of the document (e.g. using '5-100' on a 5 page document is a valid input where page 5 will be processed). If no page range is provided, the entire document will be processed.")
pollingDelay = Param(parent='undefined', name='pollingDelay', doc='number of milliseconds to wait between polling')
classmethod read()[source]

Returns an MLReader instance for this class.

setBackoffs(value)[source]
Parameters

backoffs – array of backoffs to use in the handler

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setErrorCol(value)[source]
Parameters

errorCol – column to hold HTTP errors

setImageBytes(value)[source]
Parameters

imageBytes – bytestream of the image to use

setImageBytesCol(value)[source]
Parameters

imageBytesCol – name of the column holding the bytestream of the image to use

setImageUrl(value)[source]
Parameters

imageUrl – the url of the image to use

setImageUrlCol(value)[source]
Parameters

imageUrlCol – name of the column holding the url of the image to use

setIncludeTextDetails(value)[source]
Parameters

includeTextDetails – Include text lines and element references in the result.

setIncludeTextDetailsCol(value)[source]
Parameters

includeTextDetailsCol – name of the column indicating whether to include text lines and element references in the result

setLinkedService(value)[source]
setLocale(value)[source]
Parameters

locale – Locale of the business card. Supported locales: en-AU, en-CA, en-GB, en-IN, en-US.

setLocaleCol(value)[source]
Parameters

localeCol – name of the column holding the locale. Supported locales: en-AU, en-CA, en-GB, en-IN, en-US.

setLocation(value)[source]
setMaxPollingRetries(value)[source]
Parameters

maxPollingRetries – number of times to poll

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setPages(value)[source]
Parameters

pages – The page selection, only leveraged for multi-page PDF and TIFF documents. Accepted inputs include single pages (e.g. ‘1, 2’ -> pages 1 and 2 will be processed), finite ranges (e.g. ‘2-5’ -> pages 2 to 5 will be processed), and open-ended ranges (e.g. ‘5-’ -> all pages from page 5 onward will be processed; ‘-10’ -> pages 1 to 10 will be processed). All of these can be mixed together, and ranges are allowed to overlap (e.g. ‘-5, 1, 3, 5-10’ -> pages 1 to 10 will be processed). The service will accept the request if it can process at least one page of the document (e.g. using ‘5-100’ on a 5-page document is valid input, and page 5 will be processed). If no page range is provided, the entire document will be processed.

setPagesCol(value)[source]
Parameters

pagesCol – name of the column holding the page selection, which is only leveraged for multi-page PDF and TIFF documents. Accepted inputs include single pages (e.g. ‘1, 2’ -> pages 1 and 2 will be processed), finite ranges (e.g. ‘2-5’ -> pages 2 to 5 will be processed), and open-ended ranges (e.g. ‘5-’ -> all pages from page 5 onward will be processed; ‘-10’ -> pages 1 to 10 will be processed). All of these can be mixed together, and ranges are allowed to overlap (e.g. ‘-5, 1, 3, 5-10’ -> pages 1 to 10 will be processed). The service will accept the request if it can process at least one page of the document (e.g. using ‘5-100’ on a 5-page document is valid input, and page 5 will be processed). If no page range is provided, the entire document will be processed.

setParams(backoffs=[100, 500, 1000], concurrency=1, concurrentTimeout=None, errorCol='AnalyzeBusinessCards_05788d7888b8_error', imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, includeTextDetails=None, includeTextDetailsCol=None, locale=None, localeCol=None, maxPollingRetries=1000, outputCol='AnalyzeBusinessCards_05788d7888b8_output', pages=None, pagesCol=None, pollingDelay=300, subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setPollingDelay(value)[source]
Parameters

pollingDelay – number of milliseconds to wait between polling

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKeyCol – name of the column holding the API key to use

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – URL of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='the API key to use')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.AnalyzeCustomModel module

class synapse.ml.cognitive.AnalyzeCustomModel.AnalyzeCustomModel(java_obj=None, backoffs=[100, 500, 1000], concurrency=1, concurrentTimeout=None, errorCol='AnalyzeCustomModel_693f2f3c57bc_error', imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, includeTextDetails=None, includeTextDetailsCol=None, maxPollingRetries=1000, modelId=None, modelIdCol=None, outputCol='AnalyzeCustomModel_693f2f3c57bc_output', pollingDelay=300, subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • backoffs (list) – array of backoffs to use in the handler

  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (object) – column to hold HTTP errors

  • imageBytes (object) – bytestream of the image to use

  • imageUrl (object) – the url of the image to use

  • includeTextDetails (object) – Include text lines and element references in the result.

  • maxPollingRetries (int) – number of times to poll

  • modelId (object) – Model identifier.

  • outputCol (object) – The name of the output column

  • pollingDelay (int) – number of milliseconds to wait between polling

  • subscriptionKey (object) – the API key to use

  • timeout (float) – number of seconds to wait before closing the connection

  • url (object) – URL of the service
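A minimal sketch of wiring up AnalyzeCustomModel with the setters documented for this class. It assumes that setters ending in Col take the name of a DataFrame column while the others take a constant applied to every row; the helper names and the column names `url`, `form`, and `form_error` are hypothetical. The SynapseML import is deferred so the snippet loads without SynapseML and Spark installed.

```python
def configure_custom_model(stage, model_id, subscription_key):
    """Apply settings to an AnalyzeCustomModel-like object via its setters."""
    stage.setSubscriptionKey(subscription_key)  # constant for every row
    stage.setModelId(model_id)                  # constant for every row
    stage.setImageUrlCol("url")                 # assumed: per-row value read from this column
    stage.setOutputCol("form")
    stage.setErrorCol("form_error")
    return stage


def make_custom_model(model_id, subscription_key):
    # Imported lazily: this line needs SynapseML and a Spark session to succeed.
    from synapse.ml.cognitive.AnalyzeCustomModel import AnalyzeCustomModel

    return configure_custom_model(AnalyzeCustomModel(), model_id, subscription_key)
```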

backoffs = Param(parent='undefined', name='backoffs', doc='array of backoffs to use in the handler')
concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getBackoffs()[source]
Returns

array of backoffs to use in the handler

Return type

backoffs

getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getErrorCol()[source]
Returns

column to hold HTTP errors

Return type

errorCol

getImageBytes()[source]
Returns

bytestream of the image to use

Return type

imageBytes

getImageUrl()[source]
Returns

the url of the image to use

Return type

imageUrl

getIncludeTextDetails()[source]
Returns

Include text lines and element references in the result.

Return type

includeTextDetails

static getJavaPackage()[source]

Returns package name String.

getMaxPollingRetries()[source]
Returns

number of times to poll

Return type

maxPollingRetries

getModelId()[source]
Returns

Model identifier.

Return type

modelId

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getPollingDelay()[source]
Returns

number of milliseconds to wait between polling

Return type

pollingDelay

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

URL of the service

Return type

url

imageBytes = Param(parent='undefined', name='imageBytes', doc='bytestream of the image to use')
imageUrl = Param(parent='undefined', name='imageUrl', doc='the url of the image to use')
includeTextDetails = Param(parent='undefined', name='includeTextDetails', doc='Include text lines and element references in the result.')
maxPollingRetries = Param(parent='undefined', name='maxPollingRetries', doc='number of times to poll')
modelId = Param(parent='undefined', name='modelId', doc='Model identifier.')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
pollingDelay = Param(parent='undefined', name='pollingDelay', doc='number of milliseconds to wait between polling')
classmethod read()[source]

Returns an MLReader instance for this class.

setBackoffs(value)[source]
Parameters

backoffs – array of backoffs to use in the handler

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setErrorCol(value)[source]
Parameters

errorCol – column to hold HTTP errors

setImageBytes(value)[source]
Parameters

imageBytes – bytestream of the image to use

setImageBytesCol(value)[source]
Parameters

imageBytesCol – name of the column holding the bytestream of the image to use

setImageUrl(value)[source]
Parameters

imageUrl – the url of the image to use

setImageUrlCol(value)[source]
Parameters

imageUrlCol – name of the column holding the url of the image to use

setIncludeTextDetails(value)[source]
Parameters

includeTextDetails – Include text lines and element references in the result.

setIncludeTextDetailsCol(value)[source]
Parameters

includeTextDetailsCol – name of the column indicating whether to include text lines and element references in the result

setLinkedService(value)[source]
setLocation(value)[source]
setMaxPollingRetries(value)[source]
Parameters

maxPollingRetries – number of times to poll

setModelId(value)[source]
Parameters

modelId – Model identifier.

setModelIdCol(value)[source]
Parameters

modelIdCol – name of the column holding the model identifier

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(backoffs=[100, 500, 1000], concurrency=1, concurrentTimeout=None, errorCol='AnalyzeCustomModel_693f2f3c57bc_error', imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, includeTextDetails=None, includeTextDetailsCol=None, maxPollingRetries=1000, modelId=None, modelIdCol=None, outputCol='AnalyzeCustomModel_693f2f3c57bc_output', pollingDelay=300, subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setPollingDelay(value)[source]
Parameters

pollingDelay – number of milliseconds to wait between polling

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKeyCol – name of the column holding the API key to use

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – URL of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='the API key to use')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.AnalyzeIDDocuments module

class synapse.ml.cognitive.AnalyzeIDDocuments.AnalyzeIDDocuments(java_obj=None, backoffs=[100, 500, 1000], concurrency=1, concurrentTimeout=None, errorCol='AnalyzeIDDocuments_ff604c3cf01c_error', imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, includeTextDetails=None, includeTextDetailsCol=None, maxPollingRetries=1000, outputCol='AnalyzeIDDocuments_ff604c3cf01c_output', pages=None, pagesCol=None, pollingDelay=300, subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • backoffs (list) – array of backoffs to use in the handler

  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (object) – column to hold HTTP errors

  • imageBytes (object) – bytestream of the image to use

  • imageUrl (object) – the url of the image to use

  • includeTextDetails (object) – Include text lines and element references in the result.

  • maxPollingRetries (int) – number of times to poll

  • outputCol (object) – The name of the output column

  • pages (object) – The page selection, only leveraged for multi-page PDF and TIFF documents. Accepted inputs include single pages (e.g. ‘1, 2’ -> pages 1 and 2 will be processed), finite ranges (e.g. ‘2-5’ -> pages 2 to 5 will be processed), and open-ended ranges (e.g. ‘5-’ -> all pages from page 5 onward will be processed; ‘-10’ -> pages 1 to 10 will be processed). All of these can be mixed together, and ranges are allowed to overlap (e.g. ‘-5, 1, 3, 5-10’ -> pages 1 to 10 will be processed). The service will accept the request if it can process at least one page of the document (e.g. using ‘5-100’ on a 5-page document is valid input, and page 5 will be processed). If no page range is provided, the entire document will be processed.

  • pollingDelay (int) – number of milliseconds to wait between polling

  • subscriptionKey (object) – the API key to use

  • timeout (float) – number of seconds to wait before closing the connection

  • url (object) – URL of the service
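outputCol and errorCol name the columns that receive results and HTTP errors. The sketch below separates the two after a run. It assumes, which is not stated above, that the error column is null for rows whose request succeeded; rows are modelled as plain dicts, and `split_by_error` and the column names are hypothetical.

```python
def split_by_error(rows, error_col="error"):
    """Separate rows with a populated error column from rows that succeeded."""
    ok, failed = [], []
    for row in rows:
        # Assumption: a null error column means the request succeeded.
        (failed if row.get(error_col) is not None else ok).append(row)
    return ok, failed


rows = [
    {"id": 1, "output": {"status": "succeeded"}, "error": None},
    {"id": 2, "output": None, "error": {"statusCode": 429}},
]
ok, failed = split_by_error(rows)
print([r["id"] for r in ok], [r["id"] for r in failed])  # [1] [2]
```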

backoffs = Param(parent='undefined', name='backoffs', doc='array of backoffs to use in the handler')
concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getBackoffs()[source]
Returns

array of backoffs to use in the handler

Return type

backoffs

getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getErrorCol()[source]
Returns

column to hold HTTP errors

Return type

errorCol

getImageBytes()[source]
Returns

bytestream of the image to use

Return type

imageBytes

getImageUrl()[source]
Returns

the url of the image to use

Return type

imageUrl

getIncludeTextDetails()[source]
Returns

Include text lines and element references in the result.

Return type

includeTextDetails

static getJavaPackage()[source]

Returns package name String.

getMaxPollingRetries()[source]
Returns

number of times to poll

Return type

maxPollingRetries

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getPages()[source]
Returns

The page selection, only leveraged for multi-page PDF and TIFF documents. Accepted inputs include single pages (e.g. ‘1, 2’ -> pages 1 and 2 will be processed), finite ranges (e.g. ‘2-5’ -> pages 2 to 5 will be processed), and open-ended ranges (e.g. ‘5-’ -> all pages from page 5 onward will be processed; ‘-10’ -> pages 1 to 10 will be processed). All of these can be mixed together, and ranges are allowed to overlap (e.g. ‘-5, 1, 3, 5-10’ -> pages 1 to 10 will be processed). The service will accept the request if it can process at least one page of the document (e.g. using ‘5-100’ on a 5-page document is valid input, and page 5 will be processed). If no page range is provided, the entire document will be processed.

Return type

pages

getPollingDelay()[source]
Returns

number of milliseconds to wait between polling

Return type

pollingDelay

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

imageBytes = Param(parent='undefined', name='imageBytes', doc='bytestream of the image to use')
imageUrl = Param(parent='undefined', name='imageUrl', doc='the url of the image to use')
includeTextDetails = Param(parent='undefined', name='includeTextDetails', doc='Include text lines and element references in the result.')
maxPollingRetries = Param(parent='undefined', name='maxPollingRetries', doc='number of times to poll')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
pages = Param(parent='undefined', name='pages', doc="The page selection only leveraged for multi-page PDF and TIFF documents. Accepted input include single pages (e.g.'1, 2' -> pages 1 and 2 will be processed), finite (e.g. '2-5' -> pages 2 to 5 will be processed) and open-ended ranges (e.g. '5-' -> all the pages from page 5 will be processed & e.g. '-10' -> pages 1 to 10 will be processed). All of these can be mixed together and ranges are allowed to overlap (eg. '-5, 1, 3, 5-10' - pages 1 to 10 will be processed). The service will accept the request if it can process at least one page of the document (e.g. using '5-100' on a 5 page document is a valid input where page 5 will be processed). If no page range is provided, the entire document will be processed.")
pollingDelay = Param(parent='undefined', name='pollingDelay', doc='number of milliseconds to wait between polling')
classmethod read()[source]

Returns an MLReader instance for this class.

setBackoffs(value)[source]
Parameters

backoffs – array of backoffs to use in the handler

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setImageBytes(value)[source]
Parameters

imageBytes – bytestream of the image to use

setImageBytesCol(value)[source]
Parameters

imageBytes – bytestream of the image to use

setImageUrl(value)[source]
Parameters

imageUrl – the url of the image to use

setImageUrlCol(value)[source]
Parameters

imageUrl – the url of the image to use

setIncludeTextDetails(value)[source]
Parameters

includeTextDetails – Include text lines and element references in the result.

setIncludeTextDetailsCol(value)[source]
Parameters

includeTextDetails – Include text lines and element references in the result.

setLinkedService(value)[source]
setLocation(value)[source]
setMaxPollingRetries(value)[source]
Parameters

maxPollingRetries – number of times to poll

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setPages(value)[source]
Parameters

pages – Page selection, used only for multi-page PDF and TIFF documents. Accepted input includes single pages (e.g. ‘1, 2’ -> pages 1 and 2 will be processed), finite ranges (e.g. ‘2-5’ -> pages 2 to 5 will be processed) and open-ended ranges (e.g. ‘5-’ -> all pages from page 5 onward will be processed; ‘-10’ -> pages 1 to 10 will be processed). These can be mixed together and ranges are allowed to overlap (e.g. ‘-5, 1, 3, 5-10’ -> pages 1 to 10 will be processed). The service accepts the request if it can process at least one page of the document (e.g. ‘5-100’ on a 5-page document is valid input, and page 5 will be processed). If no page range is provided, the entire document is processed.

setPagesCol(value)[source]
Parameters

pages – Page selection, used only for multi-page PDF and TIFF documents. Accepted input includes single pages (e.g. ‘1, 2’ -> pages 1 and 2 will be processed), finite ranges (e.g. ‘2-5’ -> pages 2 to 5 will be processed) and open-ended ranges (e.g. ‘5-’ -> all pages from page 5 onward will be processed; ‘-10’ -> pages 1 to 10 will be processed). These can be mixed together and ranges are allowed to overlap (e.g. ‘-5, 1, 3, 5-10’ -> pages 1 to 10 will be processed). The service accepts the request if it can process at least one page of the document (e.g. ‘5-100’ on a 5-page document is valid input, and page 5 will be processed). If no page range is provided, the entire document is processed.

setParams(backoffs=[100, 500, 1000], concurrency=1, concurrentTimeout=None, errorCol='AnalyzeIDDocuments_ff604c3cf01c_error', imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, includeTextDetails=None, includeTextDetailsCol=None, maxPollingRetries=1000, outputCol='AnalyzeIDDocuments_ff604c3cf01c_output', pages=None, pagesCol=None, pollingDelay=300, subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setPollingDelay(value)[source]
Parameters

pollingDelay – number of milliseconds to wait between polling

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – the API key to use

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='the API key to use')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.AnalyzeImage module

class synapse.ml.cognitive.AnalyzeImage.AnalyzeImage(java_obj=None, concurrency=1, concurrentTimeout=None, details=None, detailsCol=None, errorCol='AnalyzeImage_eb41c458b46d_error', handler=None, imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, language=None, languageCol=None, outputCol='AnalyzeImage_eb41c458b46d_output', subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None, visualFeatures=None, visualFeaturesCol=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • details (object) – what visual feature types to return

  • errorCol (object) – column to hold http errors

  • handler (object) – Which strategy to use when handling requests

  • imageBytes (object) – bytestream of the image to use

  • imageUrl (object) – the url of the image to use

  • language (object) – the language of the response (en if none given)

  • outputCol (object) – The name of the output column

  • subscriptionKey (object) – the API key to use

  • timeout (float) – number of seconds to wait before closing the connection

  • url (object) – Url of the service

  • visualFeatures (object) – what visual feature types to return
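
A minimal configuration sketch using the setters documented for this class. It assumes pyspark and SynapseML are installed and a SparkSession is active when the function is called. The column names `url`, `analysis`, and `errors` are hypothetical, and the visual feature names are taken from the Azure Computer Vision service rather than from this reference.

```python
def build_analyze_image(subscription_key: str, location: str):
    """Configure an AnalyzeImage transformer that reads image URLs from a column."""
    # Imported lazily: constructing the transformer needs pyspark, SynapseML,
    # and an active SparkSession.
    from synapse.ml.cognitive import AnalyzeImage

    return (
        AnalyzeImage()
        .setSubscriptionKey(subscription_key)
        .setLocation(location)          # Azure region of the resource
        .setImageUrlCol("url")          # hypothetical input column name
        .setVisualFeatures(["Categories", "Tags", "Description"])
        .setOutputCol("analysis")       # hypothetical output column name
        .setErrorCol("errors")          # hypothetical error column name
    )
```

Calling `build_analyze_image(key, region).transform(df)` on a DataFrame with a `url` column would then add the `analysis` and `errors` columns, under the assumptions above.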

concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
details = Param(parent='undefined', name='details', doc='what visual feature types to return')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getDetails()[source]
Returns

what visual feature types to return

Return type

details

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

getImageBytes()[source]
Returns

bytestream of the image to use

Return type

imageBytes

getImageUrl()[source]
Returns

the url of the image to use

Return type

imageUrl

static getJavaPackage()[source]

Returns package name String.

getLanguage()[source]
Returns

the language of the response (en if none given)

Return type

language

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

getVisualFeatures()[source]
Returns

what visual feature types to return

Return type

visualFeatures

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
imageBytes = Param(parent='undefined', name='imageBytes', doc='bytestream of the image to use')
imageUrl = Param(parent='undefined', name='imageUrl', doc='the url of the image to use')
language = Param(parent='undefined', name='language', doc='the language of the response (en if none given)')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setDetails(value)[source]
Parameters

details – what visual feature types to return

setDetailsCol(value)[source]
Parameters

details – what visual feature types to return

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setImageBytes(value)[source]
Parameters

imageBytes – bytestream of the image to use

setImageBytesCol(value)[source]
Parameters

imageBytes – bytestream of the image to use

setImageUrl(value)[source]
Parameters

imageUrl – the url of the image to use

setImageUrlCol(value)[source]
Parameters

imageUrl – the url of the image to use

setLanguage(value)[source]
Parameters

language – the language of the response (en if none given)

setLanguageCol(value)[source]
Parameters

language – the language of the response (en if none given)

setLinkedService(value)[source]
setLocation(value)[source]
setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(concurrency=1, concurrentTimeout=None, details=None, detailsCol=None, errorCol='AnalyzeImage_eb41c458b46d_error', handler=None, imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, language=None, languageCol=None, outputCol='AnalyzeImage_eb41c458b46d_output', subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None, visualFeatures=None, visualFeaturesCol=None)[source]

Set the (keyword only) parameters

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – the API key to use

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

setVisualFeatures(value)[source]
Parameters

visualFeatures – what visual feature types to return

setVisualFeaturesCol(value)[source]
Parameters

visualFeatures – what visual feature types to return

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='the API key to use')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')
visualFeatures = Param(parent='undefined', name='visualFeatures', doc='what visual feature types to return')

synapse.ml.cognitive.AnalyzeInvoices module

class synapse.ml.cognitive.AnalyzeInvoices.AnalyzeInvoices(java_obj=None, backoffs=[100, 500, 1000], concurrency=1, concurrentTimeout=None, errorCol='AnalyzeInvoices_9951e78e002e_error', imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, includeTextDetails=None, includeTextDetailsCol=None, locale=None, localeCol=None, maxPollingRetries=1000, outputCol='AnalyzeInvoices_9951e78e002e_output', pages=None, pagesCol=None, pollingDelay=300, subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • backoffs (list) – array of backoffs to use in the handler

  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (object) – column to hold http errors

  • imageBytes (object) – bytestream of the image to use

  • imageUrl (object) – the url of the image to use

  • includeTextDetails (object) – Include text lines and element references in the result.

  • locale (object) – Locale of the receipt. Supported locales: en-AU, en-CA, en-GB, en-IN, en-US.

  • maxPollingRetries (int) – number of times to poll

  • outputCol (object) – The name of the output column

  • pages (object) – Page selection, used only for multi-page PDF and TIFF documents. Accepted input includes single pages (e.g. ‘1, 2’ -> pages 1 and 2 will be processed), finite ranges (e.g. ‘2-5’ -> pages 2 to 5 will be processed) and open-ended ranges (e.g. ‘5-’ -> all pages from page 5 onward will be processed; ‘-10’ -> pages 1 to 10 will be processed). These can be mixed together and ranges are allowed to overlap (e.g. ‘-5, 1, 3, 5-10’ -> pages 1 to 10 will be processed). The service accepts the request if it can process at least one page of the document (e.g. ‘5-100’ on a 5-page document is valid input, and page 5 will be processed). If no page range is provided, the entire document is processed.

  • pollingDelay (int) – number of milliseconds to wait between polling

  • subscriptionKey (object) – the API key to use

  • timeout (float) – number of seconds to wait before closing the connection

  • url (object) – Url of the service

backoffs = Param(parent='undefined', name='backoffs', doc='array of backoffs to use in the handler')
concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getBackoffs()[source]
Returns

array of backoffs to use in the handler

Return type

backoffs

getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getImageBytes()[source]
Returns

bytestream of the image to use

Return type

imageBytes

getImageUrl()[source]
Returns

the url of the image to use

Return type

imageUrl

getIncludeTextDetails()[source]
Returns

Include text lines and element references in the result.

Return type

includeTextDetails

static getJavaPackage()[source]

Returns package name String.

getLocale()[source]
Returns

Locale of the receipt. Supported locales: en-AU, en-CA, en-GB, en-IN, en-US.

Return type

locale
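
Since the locale must be one of the listed values, a client-side check before building the transformer can catch typos early. The helper below is a hypothetical sketch. The locale list is copied from the description above, and exact, case-sensitive matching is an assumption.

```python
from typing import Optional

# Locales copied from the parameter description above.
SUPPORTED_LOCALES = ("en-AU", "en-CA", "en-GB", "en-IN", "en-US")


def check_locale(locale: Optional[str]) -> Optional[str]:
    """Return `locale` unchanged if it is None or a listed locale, else raise."""
    if locale is not None and locale not in SUPPORTED_LOCALES:
        raise ValueError(
            f"unsupported locale {locale!r}; expected one of {SUPPORTED_LOCALES}"
        )
    return locale
```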

getMaxPollingRetries()[source]
Returns

number of times to poll

Return type

maxPollingRetries

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getPages()[source]
Returns

Page selection, used only for multi-page PDF and TIFF documents. Accepted input includes single pages (e.g. ‘1, 2’ -> pages 1 and 2 will be processed), finite ranges (e.g. ‘2-5’ -> pages 2 to 5 will be processed) and open-ended ranges (e.g. ‘5-’ -> all pages from page 5 onward will be processed; ‘-10’ -> pages 1 to 10 will be processed). These can be mixed together and ranges are allowed to overlap (e.g. ‘-5, 1, 3, 5-10’ -> pages 1 to 10 will be processed). The service accepts the request if it can process at least one page of the document (e.g. ‘5-100’ on a 5-page document is valid input, and page 5 will be processed). If no page range is provided, the entire document is processed.

Return type

pages

getPollingDelay()[source]
Returns

number of milliseconds to wait between polling

Return type

pollingDelay
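
Taken together, maxPollingRetries and pollingDelay bound how long a single document can be polled. The arithmetic below is a rough sketch that assumes one fixed delay per poll and ignores the time spent on each request. The function name is hypothetical.

```python
def max_polling_wait_seconds(
    max_polling_retries: int = 1000, polling_delay_ms: int = 300
) -> float:
    """Upper bound on time spent waiting between polls, in seconds."""
    # pollingDelay is documented in milliseconds, so convert to seconds.
    return max_polling_retries * polling_delay_ms / 1000.0
```

With the defaults shown in the class signature (1000 retries, 300 ms delay), this gives 300 seconds of waiting at most.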

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

imageBytes = Param(parent='undefined', name='imageBytes', doc='bytestream of the image to use')
imageUrl = Param(parent='undefined', name='imageUrl', doc='the url of the image to use')
includeTextDetails = Param(parent='undefined', name='includeTextDetails', doc='Include text lines and element references in the result.')
locale = Param(parent='undefined', name='locale', doc='Locale of the receipt. Supported locales: en-AU, en-CA, en-GB, en-IN, en-US.')
maxPollingRetries = Param(parent='undefined', name='maxPollingRetries', doc='number of times to poll')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
pages = Param(parent='undefined', name='pages', doc="The page selection only leveraged for multi-page PDF and TIFF documents. Accepted input include single pages (e.g.'1, 2' -> pages 1 and 2 will be processed), finite (e.g. '2-5' -> pages 2 to 5 will be processed) and open-ended ranges (e.g. '5-' -> all the pages from page 5 will be processed & e.g. '-10' -> pages 1 to 10 will be processed). All of these can be mixed together and ranges are allowed to overlap (eg. '-5, 1, 3, 5-10' - pages 1 to 10 will be processed). The service will accept the request if it can process at least one page of the document (e.g. using '5-100' on a 5 page document is a valid input where page 5 will be processed). If no page range is provided, the entire document will be processed.")
pollingDelay = Param(parent='undefined', name='pollingDelay', doc='number of milliseconds to wait between polling')
classmethod read()[source]

Returns an MLReader instance for this class.

setBackoffs(value)[source]
Parameters

backoffs – array of backoffs to use in the handler

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setImageBytes(value)[source]
Parameters

imageBytes – bytestream of the image to use

setImageBytesCol(value)[source]
Parameters

imageBytes – bytestream of the image to use

setImageUrl(value)[source]
Parameters

imageUrl – the url of the image to use

setImageUrlCol(value)[source]
Parameters

imageUrl – the url of the image to use

setIncludeTextDetails(value)[source]
Parameters

includeTextDetails – Include text lines and element references in the result.

setIncludeTextDetailsCol(value)[source]
Parameters

includeTextDetails – Include text lines and element references in the result.

setLinkedService(value)[source]
setLocale(value)[source]
Parameters

locale – Locale of the receipt. Supported locales: en-AU, en-CA, en-GB, en-IN, en-US.

setLocaleCol(value)[source]
Parameters

locale – Locale of the receipt. Supported locales: en-AU, en-CA, en-GB, en-IN, en-US.

setLocation(value)[source]
setMaxPollingRetries(value)[source]
Parameters

maxPollingRetries – number of times to poll

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setPages(value)[source]
Parameters

pages – Page selection, used only for multi-page PDF and TIFF documents. Accepted input includes single pages (e.g. ‘1, 2’ -> pages 1 and 2 will be processed), finite ranges (e.g. ‘2-5’ -> pages 2 to 5 will be processed) and open-ended ranges (e.g. ‘5-’ -> all pages from page 5 onward will be processed; ‘-10’ -> pages 1 to 10 will be processed). These can be mixed together and ranges are allowed to overlap (e.g. ‘-5, 1, 3, 5-10’ -> pages 1 to 10 will be processed). The service accepts the request if it can process at least one page of the document (e.g. ‘5-100’ on a 5-page document is valid input, and page 5 will be processed). If no page range is provided, the entire document is processed.

setPagesCol(value)[source]
Parameters

pages – Page selection, used only for multi-page PDF and TIFF documents. Accepted input includes single pages (e.g. ‘1, 2’ -> pages 1 and 2 will be processed), finite ranges (e.g. ‘2-5’ -> pages 2 to 5 will be processed) and open-ended ranges (e.g. ‘5-’ -> all pages from page 5 onward will be processed; ‘-10’ -> pages 1 to 10 will be processed). These can be mixed together and ranges are allowed to overlap (e.g. ‘-5, 1, 3, 5-10’ -> pages 1 to 10 will be processed). The service accepts the request if it can process at least one page of the document (e.g. ‘5-100’ on a 5-page document is valid input, and page 5 will be processed). If no page range is provided, the entire document is processed.

setParams(backoffs=[100, 500, 1000], concurrency=1, concurrentTimeout=None, errorCol='AnalyzeInvoices_9951e78e002e_error', imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, includeTextDetails=None, includeTextDetailsCol=None, locale=None, localeCol=None, maxPollingRetries=1000, outputCol='AnalyzeInvoices_9951e78e002e_output', pages=None, pagesCol=None, pollingDelay=300, subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setPollingDelay(value)[source]
Parameters

pollingDelay – number of milliseconds to wait between polling

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – the API key to use

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='the API key to use')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.AnalyzeLayout module

class synapse.ml.cognitive.AnalyzeLayout.AnalyzeLayout(java_obj=None, backoffs=[100, 500, 1000], concurrency=1, concurrentTimeout=None, errorCol='AnalyzeLayout_dd2187eeebe3_error', imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, language=None, languageCol=None, maxPollingRetries=1000, outputCol='AnalyzeLayout_dd2187eeebe3_output', pages=None, pagesCol=None, pollingDelay=300, readingOrder=None, readingOrderCol=None, subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • backoffs (list) – array of backoffs to use in the handler

  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (object) – column to hold http errors

  • imageBytes (object) – bytestream of the image to use

  • imageUrl (object) – the url of the image to use

  • language (object) – The BCP-47 language code of the text in the document. Layout supports automatic language identification and multilanguage documents, so only provide a language code if you want to force the document to be processed as that specific language.

  • maxPollingRetries (int) – number of times to poll

  • outputCol (object) – The name of the output column

  • pages (object) – Page selection, used only for multi-page PDF and TIFF documents. Accepted input includes single pages (e.g. ‘1, 2’ -> pages 1 and 2 will be processed), finite ranges (e.g. ‘2-5’ -> pages 2 to 5 will be processed) and open-ended ranges (e.g. ‘5-’ -> all pages from page 5 onward will be processed; ‘-10’ -> pages 1 to 10 will be processed). These can be mixed together and ranges are allowed to overlap (e.g. ‘-5, 1, 3, 5-10’ -> pages 1 to 10 will be processed). The service accepts the request if it can process at least one page of the document (e.g. ‘5-100’ on a 5-page document is valid input, and page 5 will be processed). If no page range is provided, the entire document is processed.

  • pollingDelay (int) – number of milliseconds to wait between polling

  • readingOrder (object) – Optional parameter specifying which reading order algorithm is applied when ordering the extracted text elements. Can be either ‘basic’ or ‘natural’. Defaults to ‘basic’ if not specified.

  • subscriptionKey (object) – the API key to use

  • timeout (float) – number of seconds to wait before closing the connection

  • url (object) – Url of the service

backoffs = Param(parent='undefined', name='backoffs', doc='array of backoffs to use in the handler')
concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getBackoffs()[source]
Returns

array of backoffs to use in the handler

Return type

backoffs

getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getImageBytes()[source]
Returns

bytestream of the image to use

Return type

imageBytes

getImageUrl()[source]
Returns

the url of the image to use

Return type

imageUrl

static getJavaPackage()[source]

Returns package name String.

getLanguage()[source]
Returns

The BCP-47 language code of the text in the document. Layout supports automatic language identification and multilanguage documents, so only provide a language code if you want to force the document to be processed as that specific language.

Return type

language

getMaxPollingRetries()[source]
Returns

number of times to poll

Return type

maxPollingRetries

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getPages()[source]
Returns

Page selection, used only for multi-page PDF and TIFF documents. Accepted input includes single pages (e.g. ‘1, 2’ -> pages 1 and 2 will be processed), finite ranges (e.g. ‘2-5’ -> pages 2 to 5 will be processed) and open-ended ranges (e.g. ‘5-’ -> all pages from page 5 onward will be processed; ‘-10’ -> pages 1 to 10 will be processed). These can be mixed together and ranges are allowed to overlap (e.g. ‘-5, 1, 3, 5-10’ -> pages 1 to 10 will be processed). The service accepts the request if it can process at least one page of the document (e.g. ‘5-100’ on a 5-page document is valid input, and page 5 will be processed). If no page range is provided, the entire document is processed.

Return type

pages

getPollingDelay()[source]
Returns

number of milliseconds to wait between polling

Return type

pollingDelay

getReadingOrder()[source]
Returns

Optional parameter specifying which reading order algorithm is applied when ordering the extracted text elements. Can be either ‘basic’ or ‘natural’. Defaults to ‘basic’ if not specified.

Return type

readingOrder
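
The defaulting rule described above can be mirrored on the client side. This hypothetical helper resolves an unset value to 'basic' and rejects anything other than the two documented options. It is a sketch of the stated behaviour, not code from the library.

```python
from typing import Optional

# The two options named in the parameter description.
READING_ORDERS = ("basic", "natural")


def resolve_reading_order(value: Optional[str] = None) -> str:
    """Return the effective reading order, falling back to 'basic' when unset."""
    if value is None:
        return "basic"
    if value not in READING_ORDERS:
        raise ValueError(
            f"readingOrder must be one of {READING_ORDERS}, got {value!r}"
        )
    return value
```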

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

imageBytes = Param(parent='undefined', name='imageBytes', doc='bytestream of the image to use')
imageUrl = Param(parent='undefined', name='imageUrl', doc='the url of the image to use')
language = Param(parent='undefined', name='language', doc='The BCP-47 language code of the text in the document. Layout supports auto language identification and multilanguage documents, so only provide a language code if you would like to force the documented to be processed as that specific language.')
maxPollingRetries = Param(parent='undefined', name='maxPollingRetries', doc='number of times to poll')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
pages = Param(parent='undefined', name='pages', doc="The page selection only leveraged for multi-page PDF and TIFF documents. Accepted input include single pages (e.g.'1, 2' -> pages 1 and 2 will be processed), finite (e.g. '2-5' -> pages 2 to 5 will be processed) and open-ended ranges (e.g. '5-' -> all the pages from page 5 will be processed & e.g. '-10' -> pages 1 to 10 will be processed). All of these can be mixed together and ranges are allowed to overlap (eg. '-5, 1, 3, 5-10' - pages 1 to 10 will be processed). The service will accept the request if it can process at least one page of the document (e.g. using '5-100' on a 5 page document is a valid input where page 5 will be processed). If no page range is provided, the entire document will be processed.")
pollingDelay = Param(parent='undefined', name='pollingDelay', doc='number of milliseconds to wait between polling')
classmethod read()[source]

Returns an MLReader instance for this class.

readingOrder = Param(parent='undefined', name='readingOrder', doc="Optional parameter to specify which reading order algorithm should be applied when ordering the extract text elements. Can be either 'basic' or 'natural'. Will default to basic if not specified")
setBackoffs(value)[source]
Parameters

backoffs – array of backoffs to use in the handler

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setImageBytes(value)[source]
Parameters

imageBytes – bytestream of the image to use

setImageBytesCol(value)[source]
Parameters

imageBytes – bytestream of the image to use

setImageUrl(value)[source]
Parameters

imageUrl – the url of the image to use

setImageUrlCol(value)[source]
Parameters

imageUrl – the url of the image to use

setLanguage(value)[source]
Parameters

language – The BCP-47 language code of the text in the document. Layout supports automatic language identification and multilanguage documents, so only provide a language code if you want to force the document to be processed as that specific language.

setLanguageCol(value)[source]
Parameters

language – The BCP-47 language code of the text in the document. Layout supports automatic language identification and multilanguage documents, so only provide a language code if you want to force the document to be processed as that specific language.

setLinkedService(value)[source]
setLocation(value)[source]
setMaxPollingRetries(value)[source]
Parameters

maxPollingRetries – number of times to poll

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setPages(value)[source]
Parameters

pages – Page selection, used only for multi-page PDF and TIFF documents. Accepted inputs include single pages (e.g. ‘1, 2’ -> pages 1 and 2 will be processed), finite ranges (e.g. ‘2-5’ -> pages 2 to 5 will be processed), and open-ended ranges (e.g. ‘5-’ -> all pages from page 5 onward will be processed; ‘-10’ -> pages 1 to 10 will be processed). These can be mixed, and ranges may overlap (e.g. ‘-5, 1, 3, 5-10’ -> pages 1 to 10 will be processed). The service accepts the request if it can process at least one page of the document (e.g. ‘5-100’ on a 5-page document is valid input; page 5 will be processed). If no page range is provided, the entire document is processed.
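
The page-selection syntax can be checked locally before sending a request. The function below is a minimal sketch of the rules as described for this parameter, not the service's own parser; expand_pages and page_count are hypothetical names.

```python
def expand_pages(spec, page_count):
    """Return the sorted page numbers that `spec` selects.

    `spec` follows the syntax described above: comma-separated single
    pages ('3'), finite ranges ('2-5'), and open-ended ranges ('5-',
    '-10'). Pages outside 1..page_count are ignored; a selection that
    matches no page at all is rejected.
    """
    if spec is None or not spec.strip():
        return list(range(1, page_count + 1))  # no range: whole document
    selected = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start = int(start_text) if start_text.strip() else 1
            end = int(end_text) if end_text.strip() else page_count
        else:
            start = end = int(part)
        # Clip the range to pages that exist; overlaps collapse in the set.
        selected.update(range(max(start, 1), min(end, page_count) + 1))
    if not selected:
        raise ValueError("page selection %r matches no page" % (spec,))
    return sorted(selected)
```

For instance, expand_pages('5-100', 5) selects only page 5, matching the example in the description.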

setPagesCol(value)[source]
Parameters

pages – Page selection, used only for multi-page PDF and TIFF documents. Accepted inputs include single pages (e.g. ‘1, 2’ -> pages 1 and 2 will be processed), finite ranges (e.g. ‘2-5’ -> pages 2 to 5 will be processed), and open-ended ranges (e.g. ‘5-’ -> all pages from page 5 onward will be processed; ‘-10’ -> pages 1 to 10 will be processed). These can be mixed, and ranges may overlap (e.g. ‘-5, 1, 3, 5-10’ -> pages 1 to 10 will be processed). The service accepts the request if it can process at least one page of the document (e.g. ‘5-100’ on a 5-page document is valid input; page 5 will be processed). If no page range is provided, the entire document is processed.

setParams(backoffs=[100, 500, 1000], concurrency=1, concurrentTimeout=None, errorCol='AnalyzeLayout_dd2187eeebe3_error', imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, language=None, languageCol=None, maxPollingRetries=1000, outputCol='AnalyzeLayout_dd2187eeebe3_output', pages=None, pagesCol=None, pollingDelay=300, readingOrder=None, readingOrderCol=None, subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setPollingDelay(value)[source]
Parameters

pollingDelay – number of milliseconds to wait between polling
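
pollingDelay and maxPollingRetries together bound how long a long-running analysis is waited on. The loop below is a generic sketch of that interaction, assuming one status check per poll and a pause of pollingDelay milliseconds after each unsuccessful check; poll_until_done and check are hypothetical names, and this is not the transformer's actual code.

```python
import time


def poll_until_done(check, max_polling_retries=1000, polling_delay_ms=300,
                    sleep=time.sleep):
    """Call check() until it returns a non-None result.

    Pauses polling_delay_ms milliseconds after every unsuccessful check
    and raises TimeoutError after max_polling_retries checks.
    """
    for _ in range(max_polling_retries):
        result = check()
        if result is not None:
            return result
        sleep(polling_delay_ms / 1000.0)  # milliseconds -> seconds
    raise TimeoutError("result not ready after %d polls" % max_polling_retries)
```

With the defaults shown in setParams (maxPollingRetries=1000, pollingDelay=300), such a loop would spend at most about 300 seconds sleeping before giving up.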

setReadingOrder(value)[source]
Parameters

readingOrder – Optional parameter specifying which reading order algorithm is applied when ordering the extracted text elements. Can be either ‘basic’ or ‘natural’. Defaults to ‘basic’ if not specified.

setReadingOrderCol(value)[source]
Parameters

readingOrder – Optional parameter specifying which reading order algorithm is applied when ordering the extracted text elements. Can be either ‘basic’ or ‘natural’. Defaults to ‘basic’ if not specified.

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – the API key to use

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='the API key to use')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.AnalyzeReceipts module

class synapse.ml.cognitive.AnalyzeReceipts.AnalyzeReceipts(java_obj=None, backoffs=[100, 500, 1000], concurrency=1, concurrentTimeout=None, errorCol='AnalyzeReceipts_89a2a29b1819_error', imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, includeTextDetails=None, includeTextDetailsCol=None, locale=None, localeCol=None, maxPollingRetries=1000, outputCol='AnalyzeReceipts_89a2a29b1819_output', pages=None, pagesCol=None, pollingDelay=300, subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • backoffs (list) – array of backoffs to use in the handler

  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (object) – column to hold http errors

  • imageBytes (object) – bytestream of the image to use

  • imageUrl (object) – the url of the image to use

  • includeTextDetails (object) – Include text lines and element references in the result.

  • locale (object) – Locale of the receipt. Supported locales: en-AU, en-CA, en-GB, en-IN, en-US.

  • maxPollingRetries (int) – number of times to poll

  • outputCol (object) – The name of the output column

  • pages (object) – Page selection, used only for multi-page PDF and TIFF documents. Accepted inputs include single pages (e.g. ‘1, 2’ -> pages 1 and 2 will be processed), finite ranges (e.g. ‘2-5’ -> pages 2 to 5 will be processed), and open-ended ranges (e.g. ‘5-’ -> all pages from page 5 onward will be processed; ‘-10’ -> pages 1 to 10 will be processed). These can be mixed, and ranges may overlap (e.g. ‘-5, 1, 3, 5-10’ -> pages 1 to 10 will be processed). The service accepts the request if it can process at least one page of the document (e.g. ‘5-100’ on a 5-page document is valid input; page 5 will be processed). If no page range is provided, the entire document is processed.

  • pollingDelay (int) – number of milliseconds to wait between polling

  • subscriptionKey (object) – the API key to use

  • timeout (float) – number of seconds to wait before closing the connection

  • url (object) – Url of the service
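
A configuration for this transformer might be assembled as follows. This is a hedged sketch: receipt_settings and build_analyze_receipts are hypothetical helpers, the meaning of the location argument (presumably the Azure region of the resource) is an assumption because setLocation is undocumented here, and the second function needs pyspark, SynapseML, and an active Spark session.

```python
SUPPORTED_LOCALES = {"en-AU", "en-CA", "en-GB", "en-IN", "en-US"}


def receipt_settings(image_url_col, output_col, locale="en-US", pages=None,
                     include_text_details=False):
    """Return {parameter name: value} for the parameters listed above."""
    if locale not in SUPPORTED_LOCALES:
        raise ValueError("unsupported receipt locale: %r" % (locale,))
    settings = {
        "imageUrlCol": image_url_col,
        "outputCol": output_col,
        "locale": locale,
        "includeTextDetails": include_text_details,
    }
    if pages is not None:
        settings["pages"] = pages
    return settings


def build_analyze_receipts(subscription_key, location, settings):
    """Apply the settings through the documented set<Name>(value) methods.

    The import is deferred so that receipt_settings stays usable on a
    machine without Spark installed.
    """
    from synapse.ml.cognitive import AnalyzeReceipts

    transformer = AnalyzeReceipts()
    transformer.setSubscriptionKey(subscription_key)
    transformer.setLocation(location)  # assumption: an Azure region name
    for name, value in settings.items():
        setter = getattr(transformer, "set" + name[0].upper() + name[1:])
        setter(value)
    return transformer
```

Validating the locale locally gives an early error instead of a failed service call; the supported set is copied from the locale description and may change with the service version.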

backoffs = Param(parent='undefined', name='backoffs', doc='array of backoffs to use in the handler')
concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number of seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getBackoffs()[source]
Returns

array of backoffs to use in the handler

Return type

backoffs

getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getImageBytes()[source]
Returns

bytestream of the image to use

Return type

imageBytes

getImageUrl()[source]
Returns

the url of the image to use

Return type

imageUrl

getIncludeTextDetails()[source]
Returns

Include text lines and element references in the result.

Return type

includeTextDetails

static getJavaPackage()[source]

Returns package name String.

getLocale()[source]
Returns

Locale of the receipt. Supported locales: en-AU, en-CA, en-GB, en-IN, en-US.

Return type

locale

getMaxPollingRetries()[source]
Returns

number of times to poll

Return type

maxPollingRetries

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getPages()[source]
Returns

Page selection, used only for multi-page PDF and TIFF documents. Accepted inputs include single pages (e.g. ‘1, 2’ -> pages 1 and 2 will be processed), finite ranges (e.g. ‘2-5’ -> pages 2 to 5 will be processed), and open-ended ranges (e.g. ‘5-’ -> all pages from page 5 onward will be processed; ‘-10’ -> pages 1 to 10 will be processed). These can be mixed, and ranges may overlap (e.g. ‘-5, 1, 3, 5-10’ -> pages 1 to 10 will be processed). The service accepts the request if it can process at least one page of the document (e.g. ‘5-100’ on a 5-page document is valid input; page 5 will be processed). If no page range is provided, the entire document is processed.

Return type

pages

getPollingDelay()[source]
Returns

number of milliseconds to wait between polling

Return type

pollingDelay

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

imageBytes = Param(parent='undefined', name='imageBytes', doc='bytestream of the image to use')
imageUrl = Param(parent='undefined', name='imageUrl', doc='the url of the image to use')
includeTextDetails = Param(parent='undefined', name='includeTextDetails', doc='Include text lines and element references in the result.')
locale = Param(parent='undefined', name='locale', doc='Locale of the receipt. Supported locales: en-AU, en-CA, en-GB, en-IN, en-US.')
maxPollingRetries = Param(parent='undefined', name='maxPollingRetries', doc='number of times to poll')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
pages = Param(parent='undefined', name='pages', doc="Page selection, used only for multi-page PDF and TIFF documents. Accepted inputs include single pages (e.g. '1, 2' -> pages 1 and 2 will be processed), finite ranges (e.g. '2-5' -> pages 2 to 5 will be processed), and open-ended ranges (e.g. '5-' -> all pages from page 5 onward will be processed; '-10' -> pages 1 to 10 will be processed). These can be mixed, and ranges may overlap (e.g. '-5, 1, 3, 5-10' -> pages 1 to 10 will be processed). The service accepts the request if it can process at least one page of the document (e.g. '5-100' on a 5-page document is valid input; page 5 will be processed). If no page range is provided, the entire document is processed.")
pollingDelay = Param(parent='undefined', name='pollingDelay', doc='number of milliseconds to wait between polling')
classmethod read()[source]

Returns an MLReader instance for this class.

setBackoffs(value)[source]
Parameters

backoffs – array of backoffs to use in the handler

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setImageBytes(value)[source]
Parameters

imageBytes – bytestream of the image to use

setImageBytesCol(value)[source]
Parameters

imageBytes – bytestream of the image to use

setImageUrl(value)[source]
Parameters

imageUrl – the url of the image to use

setImageUrlCol(value)[source]
Parameters

imageUrl – the url of the image to use

setIncludeTextDetails(value)[source]
Parameters

includeTextDetails – Include text lines and element references in the result.

setIncludeTextDetailsCol(value)[source]
Parameters

includeTextDetails – Include text lines and element references in the result.

setLinkedService(value)[source]
setLocale(value)[source]
Parameters

locale – Locale of the receipt. Supported locales: en-AU, en-CA, en-GB, en-IN, en-US.

setLocaleCol(value)[source]
Parameters

locale – Locale of the receipt. Supported locales: en-AU, en-CA, en-GB, en-IN, en-US.

setLocation(value)[source]
setMaxPollingRetries(value)[source]
Parameters

maxPollingRetries – number of times to poll

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setPages(value)[source]
Parameters

pages – Page selection, used only for multi-page PDF and TIFF documents. Accepted inputs include single pages (e.g. ‘1, 2’ -> pages 1 and 2 will be processed), finite ranges (e.g. ‘2-5’ -> pages 2 to 5 will be processed), and open-ended ranges (e.g. ‘5-’ -> all pages from page 5 onward will be processed; ‘-10’ -> pages 1 to 10 will be processed). These can be mixed, and ranges may overlap (e.g. ‘-5, 1, 3, 5-10’ -> pages 1 to 10 will be processed). The service accepts the request if it can process at least one page of the document (e.g. ‘5-100’ on a 5-page document is valid input; page 5 will be processed). If no page range is provided, the entire document is processed.

setPagesCol(value)[source]
Parameters

pages – Page selection, used only for multi-page PDF and TIFF documents. Accepted inputs include single pages (e.g. ‘1, 2’ -> pages 1 and 2 will be processed), finite ranges (e.g. ‘2-5’ -> pages 2 to 5 will be processed), and open-ended ranges (e.g. ‘5-’ -> all pages from page 5 onward will be processed; ‘-10’ -> pages 1 to 10 will be processed). These can be mixed, and ranges may overlap (e.g. ‘-5, 1, 3, 5-10’ -> pages 1 to 10 will be processed). The service accepts the request if it can process at least one page of the document (e.g. ‘5-100’ on a 5-page document is valid input; page 5 will be processed). If no page range is provided, the entire document is processed.

setParams(backoffs=[100, 500, 1000], concurrency=1, concurrentTimeout=None, errorCol='AnalyzeReceipts_89a2a29b1819_error', imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, includeTextDetails=None, includeTextDetailsCol=None, locale=None, localeCol=None, maxPollingRetries=1000, outputCol='AnalyzeReceipts_89a2a29b1819_output', pages=None, pagesCol=None, pollingDelay=300, subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setPollingDelay(value)[source]
Parameters

pollingDelay – number of milliseconds to wait between polling

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – the API key to use

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='the API key to use')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.AzureSearchWriter module

synapse.ml.cognitive.AzureSearchWriter.streamToAzureSearch(df, **options)[source]
synapse.ml.cognitive.AzureSearchWriter.writeToAzureSearch(df, **options)[source]

synapse.ml.cognitive.BingImageSearch module

class synapse.ml.cognitive.BingImageSearch.BingImageSearch(java_obj=None, aspect=None, aspectCol=None, color=None, colorCol=None, concurrency=1, concurrentTimeout=None, count=None, countCol=None, errorCol='BingImageSearch_f4e87eb5757a_error', freshness=None, freshnessCol=None, handler=None, height=None, heightCol=None, imageContent=None, imageContentCol=None, imageType=None, imageTypeCol=None, license=None, licenseCol=None, maxFileSize=None, maxFileSizeCol=None, maxHeight=None, maxHeightCol=None, maxWidth=None, maxWidthCol=None, minFileSize=None, minFileSizeCol=None, minHeight=None, minHeightCol=None, minWidth=None, minWidthCol=None, mkt=None, mktCol=None, offset=None, offsetCol=None, outputCol='BingImageSearch_f4e87eb5757a_output', q=None, qCol=None, size=None, sizeCol=None, subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url='https://api.bing.microsoft.com/v7.0/images/search', width=None, widthCol=None)[source]

Bases: synapse.ml.cognitive._BingImageSearch._BingImageSearch

static downloadFromUrls(pathCol, bytesCol, concurrency, timeout)[source]
static getUrlTransformer(imageCol, urlCol)[source]
setMarket(value)[source]
setMarketCol(value)[source]
setQuery(value)[source]
setQueryCol(value)[source]

synapse.ml.cognitive.BreakSentence module

class synapse.ml.cognitive.BreakSentence.BreakSentence(java_obj=None, concurrency=1, concurrentTimeout=None, errorCol='BreakSentence_c8ee7040de78_error', handler=None, language=None, languageCol=None, outputCol='BreakSentence_c8ee7040de78_output', script=None, scriptCol=None, subscriptionKey=None, subscriptionKeyCol=None, subscriptionRegion=None, subscriptionRegionCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (object) – column to hold http errors

  • handler (object) – Which strategy to use when handling requests

  • language (object) – Language tag identifying the language of the input text. If a code is not specified, automatic language detection will be applied.

  • outputCol (object) – The name of the output column

  • script (object) – Script tag identifying the script used by the input text. If a script is not specified, the default script of the language will be assumed.

  • subscriptionKey (object) – the API key to use

  • subscriptionRegion (object) – the API region to use

  • text (object) – the string to translate

  • timeout (float) – number of seconds to wait before closing the connection

  • url (object) – Url of the service

concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number of seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

static getJavaPackage()[source]

Returns package name String.

getLanguage()[source]
Returns

Language tag identifying the language of the input text. If a code is not specified, automatic language detection will be applied.

Return type

language

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getScript()[source]
Returns

Script tag identifying the script used by the input text. If a script is not specified, the default script of the language will be assumed.

Return type

script

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getSubscriptionRegion()[source]
Returns

the API region to use

Return type

subscriptionRegion

getText()[source]
Returns

the string to translate

Return type

text

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
language = Param(parent='undefined', name='language', doc='Language tag identifying the language of the input text. If a code is not specified, automatic language detection will be applied.')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

script = Param(parent='undefined', name='script', doc='Script tag identifying the script used by the input text. If a script is not specified, the default script of the language will be assumed.')
setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setLanguage(value)[source]
Parameters

language – Language tag identifying the language of the input text. If a code is not specified, automatic language detection will be applied.

setLanguageCol(value)[source]
Parameters

language – Language tag identifying the language of the input text. If a code is not specified, automatic language detection will be applied.
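
Service parameters in this package come in pairs such as setLanguage and setLanguageCol, with identical descriptions. In SynapseML the plain setter applies one constant value to every row, while the Col variant names an input column that supplies the value row by row. The sketch below models that choice with plain dictionaries standing in for DataFrame rows; resolve_param is a hypothetical helper, not a library function.

```python
def resolve_param(row, value=None, col=None):
    """Return the effective parameter value for one row.

    `value` models setLanguage(value): the same constant for all rows.
    `col` models setLanguageCol(value): the name of a column to read.
    Returns None when neither is set, in which case the service falls
    back to its own default (automatic detection for `language`).
    """
    if value is not None and col is not None:
        raise ValueError("set either a constant value or a column, not both")
    if col is not None:
        return row[col]
    return value
```

Given rows with a ‘lang’ column, resolve_param(row, col='lang') yields a per-row language, whereas resolve_param(row, value='en') yields ‘en’ for every row.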

setLinkedService(value)[source]
setLocation(value)[source]
setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(concurrency=1, concurrentTimeout=None, errorCol='BreakSentence_c8ee7040de78_error', handler=None, language=None, languageCol=None, outputCol='BreakSentence_c8ee7040de78_output', script=None, scriptCol=None, subscriptionKey=None, subscriptionKeyCol=None, subscriptionRegion=None, subscriptionRegionCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setScript(value)[source]
Parameters

script – Script tag identifying the script used by the input text. If a script is not specified, the default script of the language will be assumed.

setScriptCol(value)[source]
Parameters

script – Script tag identifying the script used by the input text. If a script is not specified, the default script of the language will be assumed.

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionRegion(value)[source]
Parameters

subscriptionRegion – the API region to use

setSubscriptionRegionCol(value)[source]
Parameters

subscriptionRegion – the API region to use

setText(value)[source]
Parameters

text – the string to translate

setTextCol(value)[source]
Parameters

text – the string to translate

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='the API key to use')
subscriptionRegion = Param(parent='undefined', name='subscriptionRegion', doc='the API region to use')
text = Param(parent='undefined', name='text', doc='the string to translate')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.ConversationTranscription module

class synapse.ml.cognitive.ConversationTranscription.ConversationTranscription(java_obj=None, audioDataCol=None, endpointId=None, extraFfmpegArgs=[], fileType=None, fileTypeCol=None, format=None, formatCol=None, language=None, languageCol=None, outputCol=None, participantsJson=None, participantsJsonCol=None, profanity=None, profanityCol=None, recordAudioData=False, recordedFileNameCol=None, streamIntermediateResults=True, subscriptionKey=None, subscriptionKeyCol=None, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • audioDataCol (object) – Column holding audio data, must be either ByteArrays or Strings representing file URIs

  • endpointId (object) – endpoint for custom speech models

  • extraFfmpegArgs (list) – extra arguments for ffmpeg output decoding

  • fileType (object) – The file type of the sound files, supported types: wav, ogg, mp3

  • format (object) – Specifies the result format. Accepted values are simple and detailed. Default is simple.

  • language (object) – Identifies the spoken language that is being recognized.

  • outputCol (object) – The name of the output column

  • participantsJson (object) – a json representation of a list of conversation participants (email, language, user)

  • profanity (object) – Specifies how to handle profanity in recognition results. Accepted values are masked, which replaces profanity with asterisks; removed, which removes all profanity from the result; and raw, which includes the profanity in the result. The default setting is masked.

  • recordAudioData (bool) – Whether to record audio data to a file location, for use only with m3u8 streams

  • recordedFileNameCol (object) – Column holding file names to write audio data to if ‘recordAudioData’ is set to true

  • streamIntermediateResults (bool) – Whether to return intermediate results immediately or group them in a sequence

  • subscriptionKey (object) – the API key to use

  • url (object) – Url of the service

audioDataCol = Param(parent='undefined', name='audioDataCol', doc='Column holding audio data, must be either ByteArrays or Strings representing file URIs')
endpointId = Param(parent='undefined', name='endpointId', doc='endpoint for custom speech models')
extraFfmpegArgs = Param(parent='undefined', name='extraFfmpegArgs', doc='extra arguments for ffmpeg output decoding')
fileType = Param(parent='undefined', name='fileType', doc='The file type of the sound files, supported types: wav, ogg, mp3')
format = Param(parent='undefined', name='format', doc=' Specifies the result format. Accepted values are simple and detailed. Default is simple.     ')
getAudioDataCol()[source]
Returns

Column holding audio data, must be either ByteArrays or Strings representing file URIs

Return type

audioDataCol

getEndpointId()[source]
Returns

endpoint for custom speech models

Return type

endpointId

getExtraFfmpegArgs()[source]
Returns

extra arguments for ffmpeg output decoding

Return type

extraFfmpegArgs

getFileType()[source]
Returns

The file type of the sound files, supported types: wav, ogg, mp3

Return type

fileType

getFormat()[source]
Returns

Specifies the result format. Accepted values are simple and detailed. Default is simple.

Return type

format

static getJavaPackage()[source]

Returns package name String.

getLanguage()[source]
Returns

Identifies the spoken language that is being recognized.

Return type

language

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getParticipantsJson()[source]
Returns

a json representation of a list of conversation participants (email, language, user)

Return type

participantsJson

getProfanity()[source]
Returns

Specifies how to handle profanity in recognition results. Accepted values are masked, which replaces profanity with asterisks; removed, which removes all profanity from the result; and raw, which includes the profanity in the result. The default setting is masked.

Return type

profanity

getRecordAudioData()[source]
Returns

Whether to record audio data to a file location, for use only with m3u8 streams

Return type

recordAudioData

getRecordedFileNameCol()[source]
Returns

Column holding file names to write audio data to if ‘recordAudioData’ is set to true

Return type

recordedFileNameCol

getStreamIntermediateResults()[source]
Returns

Whether to return intermediate results immediately or group them in a sequence

Return type

streamIntermediateResults

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getUrl()[source]
Returns

Url of the service

Return type

url

language = Param(parent='undefined', name='language', doc=' Identifies the spoken language that is being recognized.     ')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
participantsJson = Param(parent='undefined', name='participantsJson', doc='a json representation of a list of conversation participants (email, language, user)')
profanity = Param(parent='undefined', name='profanity', doc='Specifies how to handle profanity in recognition results. Accepted values are masked, which replaces profanity with asterisks; removed, which removes all profanity from the result; and raw, which includes the profanity in the result. The default setting is masked.')
classmethod read()[source]

Returns an MLReader instance for this class.

recordAudioData = Param(parent='undefined', name='recordAudioData', doc='Whether to record audio data to a file location, for use only with m3u8 streams')
recordedFileNameCol = Param(parent='undefined', name='recordedFileNameCol', doc="Column holding file names to write audio data to if 'recordAudioData' is set to true")
setAudioDataCol(value)[source]
Parameters

audioDataCol – Column holding audio data, must be either ByteArrays or Strings representing file URIs

setEndpointId(value)[source]
Parameters

endpointId – endpoint for custom speech models

setExtraFfmpegArgs(value)[source]
Parameters

extraFfmpegArgs – extra arguments for ffmpeg output decoding

setFileType(value)[source]
Parameters

fileType – The file type of the sound files, supported types: wav, ogg, mp3

setFileTypeCol(value)[source]
Parameters

fileType – The file type of the sound files, supported types: wav, ogg, mp3

setFormat(value)[source]
Parameters

format – Specifies the result format. Accepted values are simple and detailed. Default is simple.

setFormatCol(value)[source]
Parameters

format – Specifies the result format. Accepted values are simple and detailed. Default is simple.

setLanguage(value)[source]
Parameters

language – Identifies the spoken language that is being recognized.

setLanguageCol(value)[source]
Parameters

language – Identifies the spoken language that is being recognized.

setLinkedService(value)[source]
setLocation(value)[source]
setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(audioDataCol=None, endpointId=None, extraFfmpegArgs=[], fileType=None, fileTypeCol=None, format=None, formatCol=None, language=None, languageCol=None, outputCol=None, participantsJson=None, participantsJsonCol=None, profanity=None, profanityCol=None, recordAudioData=False, recordedFileNameCol=None, streamIntermediateResults=True, subscriptionKey=None, subscriptionKeyCol=None, url=None)[source]

Set the (keyword only) parameters

setParticipantsJson(value)[source]
Parameters

participantsJson – a json representation of a list of conversation participants (email, language, user)

setParticipantsJsonCol(value)[source]
Parameters

participantsJson – a json representation of a list of conversation participants (email, language, user)

setProfanity(value)[source]
Parameters

profanity – Specifies how to handle profanity in recognition results. Accepted values are masked, which replaces profanity with asterisks; removed, which removes all profanity from the result; and raw, which includes the profanity in the result. The default setting is masked.
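
The three profanity modes can be illustrated locally. The sketch below only imitates the described behaviour on a plain string; the flagged word list is hypothetical, masking one asterisk per character is an assumption, and the speech service's actual rules are not documented here.

```python
import re


def apply_profanity_mode(text, mode="masked", flagged=("darn",)):
    """Imitate the 'masked', 'removed', and 'raw' settings on `text`."""
    pattern = re.compile(
        r"\b(?:%s)\b" % "|".join(re.escape(word) for word in flagged),
        re.IGNORECASE,
    )
    if mode == "raw":
        return text  # leave flagged words in place
    if mode == "masked":
        # Assumption: one asterisk per masked character.
        return pattern.sub(lambda match: "*" * len(match.group(0)), text)
    if mode == "removed":
        # Drop flagged words, then collapse the leftover whitespace.
        return re.sub(r"\s+", " ", pattern.sub("", text)).strip()
    raise ValueError("mode must be 'masked', 'removed', or 'raw'")
```

The default argument mirrors the documented default setting, masked.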

setProfanityCol(value)[source]
Parameters

profanity – Specifies how to handle profanity in recognition results. Accepted values are masked, which replaces profanity with asterisks; removed, which removes all profanity from the result; and raw, which includes the profanity in the result. The default setting is masked.

setRecordAudioData(value)[source]
Parameters

recordAudioData – Whether to record audio data to a file location, for use only with m3u8 streams

setRecordedFileNameCol(value)[source]
Parameters

recordedFileNameCol – Column holding file names to write audio data to if ‘recordAudioData’ is set to true

setStreamIntermediateResults(value)[source]
Parameters

streamIntermediateResults – Whether to return intermediate results immediately or group them in a sequence

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – the API key to use

setUrl(value)[source]
Parameters

url – Url of the service

streamIntermediateResults = Param(parent='undefined', name='streamIntermediateResults', doc='Whether to return intermediate results immediately or group them in a sequence')
subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='the API key to use')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.DescribeImage module

class synapse.ml.cognitive.DescribeImage.DescribeImage(java_obj=None, concurrency=1, concurrentTimeout=None, errorCol='DescribeImage_6418dfc660e0_error', handler=None, imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, language=None, languageCol=None, maxCandidates=None, maxCandidatesCol=None, outputCol='DescribeImage_6418dfc660e0_output', subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (object) – column to hold http errors

  • handler (object) – Which strategy to use when handling requests

  • imageBytes (object) – bytestream of the image to use

  • imageUrl (object) – the url of the image to use

  • language (object) – Language of image description

  • maxCandidates (object) – Maximum candidate descriptions to return

  • outputCol (object) – The name of the output column

  • subscriptionKey (object) – the API key to use

  • timeout (float) – number of seconds to wait before closing the connection

  • url (object) – Url of the service
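Several parameters above have two setters, a plain one such as setMaxCandidates and a column one such as setMaxCandidatesCol. The sketch below is a hypothetical helper (apply_service_param is not part of synapse.ml) that calls exactly one setter of such a pair. It assumes the plain setter supplies one value for every row and the Col setter names a DataFrame column holding per-row values.

```python
def apply_service_param(stage, name, value=None, col=None):
    """Call set<Name>(value) or set<Name>Col(col) on `stage`, never both.

    `stage` is any object exposing the paired setters listed in this
    reference, for example a DescribeImage instance.
    """
    if (value is None) == (col is None):
        raise ValueError("pass exactly one of value or col")
    base = "set" + name[0].upper() + name[1:]
    if col is not None:
        getattr(stage, base + "Col")(col)   # per-row values read from a column
    else:
        getattr(stage, base)(value)         # one constant value for all rows
    return stage


# Usage sketch (needs synapse.ml and a Spark session, so it is not run here):
#   from synapse.ml.cognitive.DescribeImage import DescribeImage
#   stage = DescribeImage()
#   apply_service_param(stage, "imageUrl", col="url")
#   apply_service_param(stage, "maxCandidates", value=3)
```

Routing both forms through one helper makes it harder to set a value and a column for the same parameter by accident.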

concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number of seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

getImageBytes()[source]
Returns

bytestream of the image to use

Return type

imageBytes

getImageUrl()[source]
Returns

the url of the image to use

Return type

imageUrl

static getJavaPackage()[source]

Returns package name String.

getLanguage()[source]
Returns

Language of image description

Return type

language

getMaxCandidates()[source]
Returns

Maximum candidate descriptions to return

Return type

maxCandidates

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
imageBytes = Param(parent='undefined', name='imageBytes', doc='bytestream of the image to use')
imageUrl = Param(parent='undefined', name='imageUrl', doc='the url of the image to use')
language = Param(parent='undefined', name='language', doc='Language of image description')
maxCandidates = Param(parent='undefined', name='maxCandidates', doc='Maximum candidate descriptions to return')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setImageBytes(value)[source]
Parameters

imageBytes – bytestream of the image to use

setImageBytesCol(value)[source]
Parameters

imageBytes – bytestream of the image to use

setImageUrl(value)[source]
Parameters

imageUrl – the url of the image to use

setImageUrlCol(value)[source]
Parameters

imageUrl – the url of the image to use

setLanguage(value)[source]
Parameters

language – Language of image description

setLanguageCol(value)[source]
Parameters

language – Language of image description

setLinkedService(value)[source]
setLocation(value)[source]
setMaxCandidates(value)[source]
Parameters

maxCandidates – Maximum candidate descriptions to return

setMaxCandidatesCol(value)[source]
Parameters

maxCandidates – Maximum candidate descriptions to return

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(concurrency=1, concurrentTimeout=None, errorCol='DescribeImage_6418dfc660e0_error', handler=None, imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, language=None, languageCol=None, maxCandidates=None, maxCandidatesCol=None, outputCol='DescribeImage_6418dfc660e0_output', subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – the API key to use

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='the API key to use')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.Detect module

class synapse.ml.cognitive.Detect.Detect(java_obj=None, concurrency=1, concurrentTimeout=None, errorCol='Detect_94af685c6a76_error', handler=None, outputCol='Detect_94af685c6a76_output', subscriptionKey=None, subscriptionKeyCol=None, subscriptionRegion=None, subscriptionRegionCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (object) – column to hold http errors

  • handler (object) – Which strategy to use when handling requests

  • outputCol (object) – The name of the output column

  • subscriptionKey (object) – the API key to use

  • subscriptionRegion (object) – the API region to use

  • text (object) – the string to translate

  • timeout (float) – number of seconds to wait before closing the connection

  • url (object) – Url of the service
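As a minimal sketch of how the parameters above fit together, the hypothetical helper below (configure_detect is not part of synapse.ml) applies the key, region, and column settings using only setters listed in this reference. The column names "text", "detected", and "detect_error" are illustrative assumptions.

```python
def configure_detect(stage, key, region,
                     text_col="text",
                     output_col="detected",
                     error_col="detect_error"):
    """Apply key, region, and column settings to a Detect-like stage."""
    stage.setSubscriptionKey(key)        # the API key to use
    stage.setSubscriptionRegion(region)  # the API region to use
    stage.setTextCol(text_col)           # input text is read per row from this column
    stage.setOutputCol(output_col)       # service responses are written here
    stage.setErrorCol(error_col)         # HTTP errors are written here
    return stage


# Usage sketch (needs synapse.ml and a Spark session, so it is not run here):
#   from synapse.ml.cognitive.Detect import Detect
#   detect = configure_detect(Detect(), key="<your-key>", region="<your-region>")
#   result = detect.transform(df)   # df is assumed to have a "text" column
```

Passing explicit output and error column names avoids relying on the generated defaults, which embed a random identifier.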

concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number of seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

static getJavaPackage()[source]

Returns package name String.

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getSubscriptionRegion()[source]
Returns

the API region to use

Return type

subscriptionRegion

getText()[source]
Returns

the string to translate

Return type

text

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setLinkedService(value)[source]
setLocation(value)[source]
setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(concurrency=1, concurrentTimeout=None, errorCol='Detect_94af685c6a76_error', handler=None, outputCol='Detect_94af685c6a76_output', subscriptionKey=None, subscriptionKeyCol=None, subscriptionRegion=None, subscriptionRegionCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionRegion(value)[source]
Parameters

subscriptionRegion – the API region to use

setSubscriptionRegionCol(value)[source]
Parameters

subscriptionRegion – the API region to use

setText(value)[source]
Parameters

text – the string to translate

setTextCol(value)[source]
Parameters

text – the string to translate

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='the API key to use')
subscriptionRegion = Param(parent='undefined', name='subscriptionRegion', doc='the API region to use')
text = Param(parent='undefined', name='text', doc='the string to translate')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.DetectAnomalies module

class synapse.ml.cognitive.DetectAnomalies.DetectAnomalies(java_obj=None, concurrency=1, concurrentTimeout=None, customInterval=None, customIntervalCol=None, errorCol='DetectAnomalies_200b200eb1de_error', granularity=None, granularityCol=None, handler=None, maxAnomalyRatio=None, maxAnomalyRatioCol=None, outputCol='DetectAnomalies_200b200eb1de_output', period=None, periodCol=None, sensitivity=None, sensitivityCol=None, series=None, seriesCol=None, subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • customInterval (object) – Custom interval is used to set a non-standard time interval. For example, if the series has 5-minute intervals, the request can be set as granularity=minutely, customInterval=5.

  • errorCol (object) – column to hold http errors

  • granularity (object) – Must be one of yearly, monthly, weekly, daily, hourly, or minutely. Granularity is used to verify whether the input series is valid.

  • handler (object) – Which strategy to use when handling requests

  • maxAnomalyRatio (object) – Optional argument, advanced model parameter, max anomaly ratio in a time series.

  • outputCol (object) – The name of the output column

  • period (object) – Optional argument, periodic value of a time series. If the value is null or not present, the API will determine the period automatically.

  • sensitivity (object) – Optional argument, advanced model parameter, between 0 and 99. The lower the value, the larger the margin value will be, which means fewer anomalies will be accepted.

  • series (object) – Time series data points. Points should be sorted by timestamp in ascending order to match the anomaly detection result. If the data is not sorted correctly or contains a duplicated timestamp, the API will not work and an error message will be returned.

  • subscriptionKey (object) – the API key to use

  • timeout (float) – number of seconds to wait before closing the connection

  • url (object) – Url of the service
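The ordering and uniqueness requirements on series can be checked before any request is made. The sketch below is a hypothetical pre-processing helper (prepare_series is not part of synapse.ml). The "timestamp" and "value" field names and the UTC timestamp format are assumptions about the service's request schema, so verify them against the service documentation.

```python
from datetime import datetime


def prepare_series(points):
    """Sort (timestamp, value) pairs ascending and reject duplicate timestamps.

    `points` is an iterable of (datetime, number) pairs. Returns a list of
    dicts with assumed field names "timestamp" and "value".
    """
    ordered = sorted(points, key=lambda p: p[0])
    # After sorting, any duplicate timestamps are adjacent.
    for earlier, later in zip(ordered, ordered[1:]):
        if earlier[0] == later[0]:
            raise ValueError("duplicate timestamp: " + earlier[0].isoformat())
    return [
        {"timestamp": ts.strftime("%Y-%m-%dT%H:%M:%SZ"), "value": float(v)}
        for ts, v in ordered
    ]


series = prepare_series([
    (datetime(2022, 1, 2), 3),
    (datetime(2022, 1, 1), 1),
])
print(series[0]["timestamp"])
```

Failing early on the client side gives a clearer error than waiting for the service to reject the series.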

concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number of seconds to wait on futures if concurrency >= 1')
customInterval = Param(parent='undefined', name='customInterval', doc='Custom interval is used to set a non-standard time interval. For example, if the series has 5-minute intervals, the request can be set as granularity=minutely, customInterval=5.')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getCustomInterval()[source]
Returns

Custom interval is used to set a non-standard time interval. For example, if the series has 5-minute intervals, the request can be set as granularity=minutely, customInterval=5.

Return type

customInterval

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getGranularity()[source]
Returns

Must be one of yearly, monthly, weekly, daily, hourly, or minutely. Granularity is used to verify whether the input series is valid.

Return type

granularity

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

static getJavaPackage()[source]

Returns package name String.

getMaxAnomalyRatio()[source]
Returns

Optional argument, advanced model parameter, max anomaly ratio in a time series.

Return type

maxAnomalyRatio

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getPeriod()[source]
Returns

Optional argument, periodic value of a time series. If the value is null or not present, the API will determine the period automatically.

Return type

period

getSensitivity()[source]
Returns

Optional argument, advanced model parameter, between 0 and 99. The lower the value, the larger the margin value will be, which means fewer anomalies will be accepted.

Return type

sensitivity

getSeries()[source]
Returns

Time series data points. Points should be sorted by timestamp in ascending order to match the anomaly detection result. If the data is not sorted correctly or contains a duplicated timestamp, the API will not work and an error message will be returned.

Return type

series

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

granularity = Param(parent='undefined', name='granularity', doc='Must be one of yearly, monthly, weekly, daily, hourly, or minutely. Granularity is used to verify whether the input series is valid.')
handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
maxAnomalyRatio = Param(parent='undefined', name='maxAnomalyRatio', doc='Optional argument, advanced model parameter, max anomaly ratio in a time series.')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
period = Param(parent='undefined', name='period', doc='Optional argument, periodic value of a time series. If the value is null or not present, the API will determine the period automatically.')
classmethod read()[source]

Returns an MLReader instance for this class.

sensitivity = Param(parent='undefined', name='sensitivity', doc='Optional argument, advanced model parameter, between 0 and 99. The lower the value, the larger the margin value will be, which means fewer anomalies will be accepted.')
series = Param(parent='undefined', name='series', doc='Time series data points. Points should be sorted by timestamp in ascending order to match the anomaly detection result. If the data is not sorted correctly or contains a duplicated timestamp, the API will not work and an error message will be returned.')
setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setCustomInterval(value)[source]
Parameters

customInterval – Custom interval is used to set a non-standard time interval. For example, if the series has 5-minute intervals, the request can be set as granularity=minutely, customInterval=5.

setCustomIntervalCol(value)[source]
Parameters

customInterval – Custom interval is used to set a non-standard time interval. For example, if the series has 5-minute intervals, the request can be set as granularity=minutely, customInterval=5.

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setGranularity(value)[source]
Parameters

granularity – Must be one of yearly, monthly, weekly, daily, hourly, or minutely. Granularity is used to verify whether the input series is valid.

setGranularityCol(value)[source]
Parameters

granularity – Must be one of yearly, monthly, weekly, daily, hourly, or minutely. Granularity is used to verify whether the input series is valid.

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setLinkedService(value)[source]
setLocation(value)[source]
setMaxAnomalyRatio(value)[source]
Parameters

maxAnomalyRatio – Optional argument, advanced model parameter, max anomaly ratio in a time series.

setMaxAnomalyRatioCol(value)[source]
Parameters

maxAnomalyRatio – Optional argument, advanced model parameter, max anomaly ratio in a time series.

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(concurrency=1, concurrentTimeout=None, customInterval=None, customIntervalCol=None, errorCol='DetectAnomalies_200b200eb1de_error', granularity=None, granularityCol=None, handler=None, maxAnomalyRatio=None, maxAnomalyRatioCol=None, outputCol='DetectAnomalies_200b200eb1de_output', period=None, periodCol=None, sensitivity=None, sensitivityCol=None, series=None, seriesCol=None, subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setPeriod(value)[source]
Parameters

period – Optional argument, periodic value of a time series. If the value is null or not present, the API will determine the period automatically.

setPeriodCol(value)[source]
Parameters

period – Optional argument, periodic value of a time series. If the value is null or not present, the API will determine the period automatically.

setSensitivity(value)[source]
Parameters

sensitivity – Optional argument, advanced model parameter, between 0 and 99. The lower the value, the larger the margin value will be, which means fewer anomalies will be accepted.

setSensitivityCol(value)[source]
Parameters

sensitivity – Optional argument, advanced model parameter, between 0 and 99. The lower the value, the larger the margin value will be, which means fewer anomalies will be accepted.

setSeries(value)[source]
Parameters

series – Time series data points. Points should be sorted by timestamp in ascending order to match the anomaly detection result. If the data is not sorted correctly or contains a duplicated timestamp, the API will not work and an error message will be returned.

setSeriesCol(value)[source]
Parameters

series – Time series data points. Points should be sorted by timestamp in ascending order to match the anomaly detection result. If the data is not sorted correctly or contains a duplicated timestamp, the API will not work and an error message will be returned.

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – the API key to use

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='the API key to use')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.DetectFace module

class synapse.ml.cognitive.DetectFace.DetectFace(java_obj=None, concurrency=1, concurrentTimeout=None, errorCol='DetectFace_76d9693a80e9_error', handler=None, imageUrl=None, imageUrlCol=None, outputCol='DetectFace_76d9693a80e9_output', returnFaceAttributes=None, returnFaceAttributesCol=None, returnFaceId=None, returnFaceIdCol=None, returnFaceLandmarks=None, returnFaceLandmarksCol=None, subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (object) – column to hold http errors

  • handler (object) – Which strategy to use when handling requests

  • imageUrl (object) – the url of the image to use

  • outputCol (object) – The name of the output column

  • returnFaceAttributes (object) – Analyze and return one or more specified face attributes. Supported face attributes include: age, gender, headPose, smile, facialHair, glasses, emotion, hair, makeup, occlusion, accessories, blur, exposure, and noise. Face attribute analysis has additional computational and time cost.

  • returnFaceId (object) – Whether to return faceIds of the detected faces. The default value is true.

  • returnFaceLandmarks (object) – Return face landmarks of the detected faces or not. The default value is false.

  • subscriptionKey (object) – the API key to use

  • timeout (float) – number of seconds to wait before closing the connection

  • url (object) – Url of the service
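Because returnFaceAttributes accepts only the attribute names listed above, a request can be validated locally first. The sketch below is a hypothetical helper (check_face_attributes is not part of synapse.ml). It checks names only and makes no assumption about how the value is ultimately passed to setReturnFaceAttributes.

```python
# Attribute names copied from the returnFaceAttributes description above.
SUPPORTED_FACE_ATTRIBUTES = (
    "age", "gender", "headPose", "smile", "facialHair", "glasses",
    "emotion", "hair", "makeup", "occlusion", "accessories",
    "blur", "exposure", "noise",
)


def check_face_attributes(requested):
    """Return `requested` without duplicates, or raise on unsupported names."""
    unknown = [name for name in requested if name not in SUPPORTED_FACE_ATTRIBUTES]
    if unknown:
        raise ValueError("unsupported face attributes: " + ", ".join(unknown))
    # dict.fromkeys drops duplicates while keeping first-seen order.
    return list(dict.fromkeys(requested))


print(check_face_attributes(["age", "smile", "age"]))
```

Since attribute analysis adds computational and time cost, requesting only the attributes you need keeps calls cheaper.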

concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number of seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

getImageUrl()[source]
Returns

the url of the image to use

Return type

imageUrl

static getJavaPackage()[source]

Returns package name String.

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getReturnFaceAttributes()[source]
Returns

Analyze and return one or more specified face attributes. Supported face attributes include: age, gender, headPose, smile, facialHair, glasses, emotion, hair, makeup, occlusion, accessories, blur, exposure, and noise. Face attribute analysis has additional computational and time cost.

Return type

returnFaceAttributes

getReturnFaceId()[source]
Returns

Whether to return faceIds of the detected faces. The default value is true.

Return type

returnFaceId

getReturnFaceLandmarks()[source]
Returns

Return face landmarks of the detected faces or not. The default value is false.

Return type

returnFaceLandmarks

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
imageUrl = Param(parent='undefined', name='imageUrl', doc='the url of the image to use')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

returnFaceAttributes = Param(parent='undefined', name='returnFaceAttributes', doc='Analyze and return one or more specified face attributes. Supported face attributes include: age, gender, headPose, smile, facialHair, glasses, emotion, hair, makeup, occlusion, accessories, blur, exposure, and noise. Face attribute analysis has additional computational and time cost.')
returnFaceId = Param(parent='undefined', name='returnFaceId', doc='Whether to return faceIds of the detected faces. The default value is true.')
returnFaceLandmarks = Param(parent='undefined', name='returnFaceLandmarks', doc='Return face landmarks of the detected faces or not. The default value is false.')
setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setImageUrl(value)[source]
Parameters

imageUrl – the url of the image to use

setImageUrlCol(value)[source]
Parameters

imageUrl – the url of the image to use

setLinkedService(value)[source]
setLocation(value)[source]
setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(concurrency=1, concurrentTimeout=None, errorCol='DetectFace_76d9693a80e9_error', handler=None, imageUrl=None, imageUrlCol=None, outputCol='DetectFace_76d9693a80e9_output', returnFaceAttributes=None, returnFaceAttributesCol=None, returnFaceId=None, returnFaceIdCol=None, returnFaceLandmarks=None, returnFaceLandmarksCol=None, subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setReturnFaceAttributes(value)[source]
Parameters

returnFaceAttributes – Analyze and return one or more specified face attributes. Supported face attributes include: age, gender, headPose, smile, facialHair, glasses, emotion, hair, makeup, occlusion, accessories, blur, exposure, and noise. Face attribute analysis has additional computational and time cost.

setReturnFaceAttributesCol(value)[source]
Parameters

returnFaceAttributes – Analyze and return one or more specified face attributes. Supported face attributes include: age, gender, headPose, smile, facialHair, glasses, emotion, hair, makeup, occlusion, accessories, blur, exposure, and noise. Face attribute analysis has additional computational and time cost.

setReturnFaceId(value)[source]
Parameters

returnFaceId – Whether to return faceIds of the detected faces. The default value is true.

setReturnFaceIdCol(value)[source]
Parameters

returnFaceId – Whether to return faceIds of the detected faces. The default value is true.

setReturnFaceLandmarks(value)[source]
Parameters

returnFaceLandmarks – Return face landmarks of the detected faces or not. The default value is false.

setReturnFaceLandmarksCol(value)[source]
Parameters

returnFaceLandmarks – Return face landmarks of the detected faces or not. The default value is false.

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – the API key to use

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='the API key to use')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.DetectLastAnomaly module

class synapse.ml.cognitive.DetectLastAnomaly.DetectLastAnomaly(java_obj=None, concurrency=1, concurrentTimeout=None, customInterval=None, customIntervalCol=None, errorCol='DetectLastAnomaly_b8965dda5a43_error', granularity=None, granularityCol=None, handler=None, maxAnomalyRatio=None, maxAnomalyRatioCol=None, outputCol='DetectLastAnomaly_b8965dda5a43_output', period=None, periodCol=None, sensitivity=None, sensitivityCol=None, series=None, seriesCol=None, subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • customInterval (object) – Custom interval is used to set a non-standard time interval. For example, if the series has 5-minute intervals, the request can be set as granularity=minutely, customInterval=5.

  • errorCol (object) – column to hold http errors

  • granularity (object) – Must be one of yearly, monthly, weekly, daily, hourly, or minutely. Granularity is used to verify whether the input series is valid.

  • handler (object) – Which strategy to use when handling requests

  • maxAnomalyRatio (object) – Optional argument, advanced model parameter, max anomaly ratio in a time series.

  • outputCol (object) – The name of the output column

  • period (object) – Optional argument, periodic value of a time series. If the value is null or not present, the API will determine the period automatically.

  • sensitivity (object) – Optional argument, advanced model parameter, between 0 and 99. The lower the value, the larger the margin value will be, which means fewer anomalies will be accepted.

  • series (object) – Time series data points. Points should be sorted by timestamp in ascending order to match the anomaly detection result. If the data is not sorted correctly or contains a duplicated timestamp, the API will not work and an error message will be returned.

  • subscriptionKey (object) – the API key to use

  • timeout (float) – number of seconds to wait before closing the connection

  • url (object) – Url of the service
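The relationship between granularity and customInterval can be illustrated with a small calculation. The sketch below is a hypothetical helper (granularity_for is not part of synapse.ml) that turns a sampling interval in seconds into a (granularity, customInterval) pair, reproducing the 5-minute example from the parameter description. Choosing the largest unit that divides the interval is a design assumption, and monthly and yearly are left out because their length in seconds varies.

```python
# Fixed-length units only, largest first.
_UNITS = (
    ("weekly", 7 * 24 * 3600),
    ("daily", 24 * 3600),
    ("hourly", 3600),
    ("minutely", 60),
)


def granularity_for(interval_seconds):
    """Map a sampling interval to (granularity, customInterval).

    customInterval is None when the interval is exactly one unit.
    """
    if interval_seconds <= 0:
        raise ValueError("interval must be positive")
    for name, unit in _UNITS:
        if interval_seconds % unit == 0:
            multiple = interval_seconds // unit
            return name, (None if multiple == 1 else multiple)
    raise ValueError("interval must be a whole number of minutes")


print(granularity_for(5 * 60))   # a 5-minute series
```

The resulting pair could then be passed to setGranularity and, when it is not None, setCustomInterval.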

concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number of seconds to wait on futures if concurrency >= 1')
customInterval = Param(parent='undefined', name='customInterval', doc='Custom interval is used to set a non-standard time interval. For example, if the series has 5-minute intervals, the request can be set as granularity=minutely, customInterval=5.')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getCustomInterval()[source]
Returns

Custom interval is used to set a non-standard time interval. For example, if the series has 5-minute intervals, the request can be set as granularity=minutely, customInterval=5.

Return type

customInterval

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getGranularity()[source]
Returns

Must be one of yearly, monthly, weekly, daily, hourly, or minutely. Granularity is used to verify whether the input series is valid.

Return type

granularity

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

static getJavaPackage()[source]

Returns package name String.

getMaxAnomalyRatio()[source]
Returns

Optional argument, advanced model parameter, max anomaly ratio in a time series.

Return type

maxAnomalyRatio

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getPeriod()[source]
Returns

Optional argument, periodic value of a time series. If the value is null or not present, the API will determine the period automatically.

Return type

period

getSensitivity()[source]
Returns

Optional argument, advanced model parameter, between 0 and 99. The lower the value, the larger the margin value will be, which means fewer anomalies will be accepted.

Return type

sensitivity

getSeries()[source]
Returns

Time series data points. Points should be sorted by timestamp in ascending order to match the anomaly detection result. If the data is not sorted correctly or contains duplicate timestamps, the API will not work and an error message will be returned.

Return type

series

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

granularity = Param(parent='undefined', name='granularity', doc=' Can only be one of yearly, monthly, weekly, daily, hourly or minutely. Granularity is used for verify whether input series is valid.     ')
handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
maxAnomalyRatio = Param(parent='undefined', name='maxAnomalyRatio', doc=' Optional argument, advanced model parameter, max anomaly ratio in a time series.     ')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
period = Param(parent='undefined', name='period', doc=' Optional argument, periodic value of a time series. If the value is null or does not present, the API will determine the period automatically.     ')
classmethod read()[source]

Returns an MLReader instance for this class.

sensitivity = Param(parent='undefined', name='sensitivity', doc=' Optional argument, advanced model parameter, between 0-99, the lower the value is, the larger the margin value will be which means less anomalies will be accepted     ')
series = Param(parent='undefined', name='series', doc=' Time series data points. Points should be sorted by timestamp in ascending order to match the anomaly detection result. If the data is not sorted correctly or there is duplicated timestamp, the API will not work. In such case, an error message will be returned.     ')
setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setCustomInterval(value)[source]
Parameters

customInterval – Custom interval is used to set a non-standard time interval. For example, if the series has a 5-minute interval, the request can be set as granularity=minutely, customInterval=5.

setCustomIntervalCol(value)[source]
Parameters

customInterval – Custom interval is used to set a non-standard time interval. For example, if the series has a 5-minute interval, the request can be set as granularity=minutely, customInterval=5.

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setGranularity(value)[source]
Parameters

granularity – Can only be one of yearly, monthly, weekly, daily, hourly or minutely. Granularity is used to verify whether the input series is valid.

setGranularityCol(value)[source]
Parameters

granularity – Can only be one of yearly, monthly, weekly, daily, hourly or minutely. Granularity is used to verify whether the input series is valid.

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setLinkedService(value)[source]
setLocation(value)[source]
setMaxAnomalyRatio(value)[source]
Parameters

maxAnomalyRatio – Optional argument, advanced model parameter, max anomaly ratio in a time series.

setMaxAnomalyRatioCol(value)[source]
Parameters

maxAnomalyRatio – Optional argument, advanced model parameter, max anomaly ratio in a time series.

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(concurrency=1, concurrentTimeout=None, customInterval=None, customIntervalCol=None, errorCol='DetectLastAnomaly_b8965dda5a43_error', granularity=None, granularityCol=None, handler=None, maxAnomalyRatio=None, maxAnomalyRatioCol=None, outputCol='DetectLastAnomaly_b8965dda5a43_output', period=None, periodCol=None, sensitivity=None, sensitivityCol=None, series=None, seriesCol=None, subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters
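Each service parameter has a value setter (for example setGranularity) and a column setter (for example setSeriesCol). A minimal sketch of a typical configuration follows, using only setter names listed in this section. The RecordingStub class is a hypothetical stand-in so the sketch runs without Spark; the key, location, and column names are placeholders.

```python
class RecordingStub:
    """Hypothetical stand-in that records setter calls (not part of synapse.ml)."""

    def __init__(self):
        self.calls = {}

    def __getattr__(self, name):
        # Only reached for attributes that do not exist, i.e. the setters.
        def setter(value):
            self.calls[name] = value
            return self
        return setter


def configure_detector(detector, key, location):
    """Apply a typical configuration through the documented setters.

    Assumption: `detector` is an instance of the transformer documented
    above; the column names used here are made up for illustration.
    """
    detector.setSubscriptionKey(key)
    detector.setLocation(location)
    detector.setGranularity("monthly")  # one fixed value for every row
    detector.setSeriesCol("series")     # per-row value read from a column
    detector.setOutputCol("anomalies")
    detector.setErrorCol("errors")
    detector.setConcurrency(2)
    return detector


stub = configure_detector(RecordingStub(), "<your-key>", "westus2")
```

With synapse.ml installed, the real transformer would be passed in place of the stub.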

setPeriod(value)[source]
Parameters

period – Optional argument, periodic value of a time series. If the value is null or not present, the API will determine the period automatically.

setPeriodCol(value)[source]
Parameters

period – Optional argument, periodic value of a time series. If the value is null or not present, the API will determine the period automatically.

setSensitivity(value)[source]
Parameters

sensitivity – Optional argument, advanced model parameter, between 0 and 99. The lower the value, the larger the margin value will be, which means fewer anomalies will be accepted.

setSensitivityCol(value)[source]
Parameters

sensitivity – Optional argument, advanced model parameter, between 0 and 99. The lower the value, the larger the margin value will be, which means fewer anomalies will be accepted.

setSeries(value)[source]
Parameters

series – Time series data points. Points should be sorted by timestamp in ascending order to match the anomaly detection result. If the data is not sorted correctly or contains duplicate timestamps, the API will not work and an error message will be returned.

setSeriesCol(value)[source]
Parameters

series – Time series data points. Points should be sorted by timestamp in ascending order to match the anomaly detection result. If the data is not sorted correctly or contains duplicate timestamps, the API will not work and an error message will be returned.

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – the API key to use

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='the API key to use')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.DictionaryExamples module

class synapse.ml.cognitive.DictionaryExamples.DictionaryExamples(java_obj=None, concurrency=1, concurrentTimeout=None, errorCol='DictionaryExamples_1942097a4398_error', fromLanguage=None, fromLanguageCol=None, handler=None, outputCol='DictionaryExamples_1942097a4398_output', subscriptionKey=None, subscriptionKeyCol=None, subscriptionRegion=None, subscriptionRegionCol=None, textAndTranslation=None, textAndTranslationCol=None, timeout=60.0, toLanguage=None, toLanguageCol=None, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (object) – column to hold http errors

  • fromLanguage (object) – Specifies the language of the input text. The source language must be one of the supported languages included in the dictionary scope.

  • handler (object) – Which strategy to use when handling requests

  • outputCol (object) – The name of the output column

  • subscriptionKey (object) – the API key to use

  • subscriptionRegion (object) – the API region to use

  • textAndTranslation (object) – A string specifying the translated text previously returned by the Dictionary lookup operation.

  • timeout (float) – number of seconds to wait before closing the connection

  • toLanguage (object) – Specifies the language of the output text. The target language must be one of the supported languages included in the dictionary scope.

  • url (object) – Url of the service

concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
fromLanguage = Param(parent='undefined', name='fromLanguage', doc='Specifies the language of the input text. The source language must be one of the supported languages included in the dictionary scope.')
getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getFromLanguage()[source]
Returns

Specifies the language of the input text. The source language must be one of the supported languages included in the dictionary scope.

Return type

fromLanguage

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

static getJavaPackage()[source]

Returns package name String.

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getSubscriptionRegion()[source]
Returns

the API region to use

Return type

subscriptionRegion

getTextAndTranslation()[source]
Returns

A string specifying the translated text previously returned by the Dictionary lookup operation.

Return type

textAndTranslation

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getToLanguage()[source]
Returns

Specifies the language of the output text. The target language must be one of the supported languages included in the dictionary scope.

Return type

toLanguage

getUrl()[source]
Returns

Url of the service

Return type

url

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setFromLanguage(value)[source]
Parameters

fromLanguage – Specifies the language of the input text. The source language must be one of the supported languages included in the dictionary scope.

setFromLanguageCol(value)[source]
Parameters

fromLanguage – Specifies the language of the input text. The source language must be one of the supported languages included in the dictionary scope.

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setLinkedService(value)[source]
setLocation(value)[source]
setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(concurrency=1, concurrentTimeout=None, errorCol='DictionaryExamples_1942097a4398_error', fromLanguage=None, fromLanguageCol=None, handler=None, outputCol='DictionaryExamples_1942097a4398_output', subscriptionKey=None, subscriptionKeyCol=None, subscriptionRegion=None, subscriptionRegionCol=None, textAndTranslation=None, textAndTranslationCol=None, timeout=60.0, toLanguage=None, toLanguageCol=None, url=None)[source]

Set the (keyword only) parameters

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionRegion(value)[source]
Parameters

subscriptionRegion – the API region to use

setSubscriptionRegionCol(value)[source]
Parameters

subscriptionRegion – the API region to use

setTextAndTranslation(value)[source]
Parameters

textAndTranslation – A string specifying the translated text previously returned by the Dictionary lookup operation.

setTextAndTranslationCol(value)[source]
Parameters

textAndTranslation – A string specifying the translated text previously returned by the Dictionary lookup operation.

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setToLanguage(value)[source]
Parameters

toLanguage – Specifies the language of the output text. The target language must be one of the supported languages included in the dictionary scope.

setToLanguageCol(value)[source]
Parameters

toLanguage – Specifies the language of the output text. The target language must be one of the supported languages included in the dictionary scope.

setUrl(value)[source]
Parameters

url – Url of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='the API key to use')
subscriptionRegion = Param(parent='undefined', name='subscriptionRegion', doc='the API region to use')
textAndTranslation = Param(parent='undefined', name='textAndTranslation', doc=' A string specifying the translated text previously returned by the Dictionary lookup operation.')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
toLanguage = Param(parent='undefined', name='toLanguage', doc='Specifies the language of the output text. The target language must be one of the supported languages included in the dictionary scope.')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.DictionaryLookup module

class synapse.ml.cognitive.DictionaryLookup.DictionaryLookup(java_obj=None, concurrency=1, concurrentTimeout=None, errorCol='DictionaryLookup_401f9f0b371b_error', fromLanguage=None, fromLanguageCol=None, handler=None, outputCol='DictionaryLookup_401f9f0b371b_output', subscriptionKey=None, subscriptionKeyCol=None, subscriptionRegion=None, subscriptionRegionCol=None, text=None, textCol=None, timeout=60.0, toLanguage=None, toLanguageCol=None, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (object) – column to hold http errors

  • fromLanguage (object) – Specifies the language of the input text. The source language must be one of the supported languages included in the dictionary scope.

  • handler (object) – Which strategy to use when handling requests

  • outputCol (object) – The name of the output column

  • subscriptionKey (object) – the API key to use

  • subscriptionRegion (object) – the API region to use

  • text (object) – the string to translate

  • timeout (float) – number of seconds to wait before closing the connection

  • toLanguage (object) – Specifies the language of the output text. The target language must be one of the supported languages included in the dictionary scope.

  • url (object) – Url of the service
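Because the constructor and setParams take keyword arguments only, a configuration can be assembled as a plain dict and checked against the documented names before use. This is a minimal sketch: the language codes, region, and column names are placeholder assumptions, and DOCUMENTED_NAMES is copied by hand from the signature above rather than read from the library.

```python
# Keyword names copied from the DictionaryLookup signature in this section.
DOCUMENTED_NAMES = {
    "concurrency", "concurrentTimeout", "errorCol",
    "fromLanguage", "fromLanguageCol", "handler", "outputCol",
    "subscriptionKey", "subscriptionKeyCol",
    "subscriptionRegion", "subscriptionRegionCol",
    "text", "textCol", "timeout", "toLanguage", "toLanguageCol", "url",
}

# Placeholder values; replace with your own key, region, and columns.
lookup_params = dict(
    subscriptionKey="<your-key>",
    subscriptionRegion="eastus",
    fromLanguage="en",
    toLanguage="es",
    textCol="text",           # read the input text from this column
    outputCol="translations",
)

# Catch misspelled keywords locally instead of at transformer construction.
unknown = set(lookup_params) - DOCUMENTED_NAMES
if unknown:
    raise TypeError("unknown parameters: " + ", ".join(sorted(unknown)))

# With synapse.ml installed, the dict would be passed as
# DictionaryLookup(**lookup_params) or via setParams(**lookup_params).
```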

concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
fromLanguage = Param(parent='undefined', name='fromLanguage', doc='Specifies the language of the input text. The source language must be one of the supported languages included in the dictionary scope.')
getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getFromLanguage()[source]
Returns

Specifies the language of the input text. The source language must be one of the supported languages included in the dictionary scope.

Return type

fromLanguage

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

static getJavaPackage()[source]

Returns package name String.

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getSubscriptionRegion()[source]
Returns

the API region to use

Return type

subscriptionRegion

getText()[source]
Returns

the string to translate

Return type

text

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getToLanguage()[source]
Returns

Specifies the language of the output text. The target language must be one of the supported languages included in the dictionary scope.

Return type

toLanguage

getUrl()[source]
Returns

Url of the service

Return type

url

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setFromLanguage(value)[source]
Parameters

fromLanguage – Specifies the language of the input text. The source language must be one of the supported languages included in the dictionary scope.

setFromLanguageCol(value)[source]
Parameters

fromLanguage – Specifies the language of the input text. The source language must be one of the supported languages included in the dictionary scope.

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setLinkedService(value)[source]
setLocation(value)[source]
setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(concurrency=1, concurrentTimeout=None, errorCol='DictionaryLookup_401f9f0b371b_error', fromLanguage=None, fromLanguageCol=None, handler=None, outputCol='DictionaryLookup_401f9f0b371b_output', subscriptionKey=None, subscriptionKeyCol=None, subscriptionRegion=None, subscriptionRegionCol=None, text=None, textCol=None, timeout=60.0, toLanguage=None, toLanguageCol=None, url=None)[source]

Set the (keyword only) parameters

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionRegion(value)[source]
Parameters

subscriptionRegion – the API region to use

setSubscriptionRegionCol(value)[source]
Parameters

subscriptionRegion – the API region to use

setText(value)[source]
Parameters

text – the string to translate

setTextCol(value)[source]
Parameters

text – the string to translate

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setToLanguage(value)[source]
Parameters

toLanguage – Specifies the language of the output text. The target language must be one of the supported languages included in the dictionary scope.

setToLanguageCol(value)[source]
Parameters

toLanguage – Specifies the language of the output text. The target language must be one of the supported languages included in the dictionary scope.

setUrl(value)[source]
Parameters

url – Url of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='the API key to use')
subscriptionRegion = Param(parent='undefined', name='subscriptionRegion', doc='the API region to use')
text = Param(parent='undefined', name='text', doc='the string to translate')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
toLanguage = Param(parent='undefined', name='toLanguage', doc='Specifies the language of the output text. The target language must be one of the supported languages included in the dictionary scope.')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.DocumentTranslator module

class synapse.ml.cognitive.DocumentTranslator.DocumentTranslator(java_obj=None, backoffs=[100, 500, 1000], concurrency=1, concurrentTimeout=None, errorCol='DocumentTranslator_94499a55d24f_error', filterPrefix=None, filterPrefixCol=None, filterSuffix=None, filterSuffixCol=None, maxPollingRetries=1000, outputCol='DocumentTranslator_94499a55d24f_output', pollingDelay=300, serviceName=None, sourceLanguage=None, sourceLanguageCol=None, sourceStorageSource=None, sourceStorageSourceCol=None, sourceUrl=None, sourceUrlCol=None, storageType=None, storageTypeCol=None, subscriptionKey=None, subscriptionKeyCol=None, targets=None, targetsCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • backoffs (list) – array of backoffs to use in the handler

  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (object) – column to hold http errors

  • filterPrefix (object) – A case-sensitive prefix string to filter documents in the source path for translation. For example, when using an Azure storage blob URI, use the prefix to restrict subfolders for translation.

  • filterSuffix (object) – A case-sensitive suffix string to filter documents in the source path for translation. This is most often used for file extensions.

  • maxPollingRetries (int) – number of times to poll

  • outputCol (object) – The name of the output column

  • pollingDelay (int) – number of milliseconds to wait between polling

  • serviceName (object) –

  • sourceLanguage (object) – Language code. If none is specified, the language is auto-detected from the document.

  • sourceStorageSource (object) – Storage source of source input.

  • sourceUrl (object) – Location of the folder / container or single file with your documents.

  • storageType (object) – Storage type of the input documents source string. Required for single document translation only.

  • subscriptionKey (object) – the API key to use

  • targets (object) – Destination for the finished translated documents.

  • timeout (float) – number of seconds to wait before closing the connection

  • url (object) – Url of the service
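The effect of filterPrefix and filterSuffix can be previewed locally. The sketch below only imitates case-sensitive prefix and suffix matching on a list of document names. It is not the service's implementation, and the helper name select_documents and the sample names are hypothetical.

```python
def select_documents(names, prefix=None, suffix=None):
    """Keep names that match the optional case-sensitive prefix and suffix."""
    selected = []
    for name in names:
        if prefix is not None and not name.startswith(prefix):
            continue
        if suffix is not None and not name.endswith(suffix):
            continue
        selected.append(name)
    return selected


# Hypothetical document names inside a source container.
blobs = [
    "reports/q1.docx",
    "reports/q2.PDF",
    "Reports/q3.docx",
    "notes/todo.txt",
]
picked = select_documents(blobs, prefix="reports/", suffix=".docx")
```

Only "reports/q1.docx" is kept here: "Reports/q3.docx" differs from the prefix in case, and "reports/q2.PDF" does not end with the suffix.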

backoffs = Param(parent='undefined', name='backoffs', doc='array of backoffs to use in the handler')
concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
filterPrefix = Param(parent='undefined', name='filterPrefix', doc='A case-sensitive prefix string to filter documents in the source path for translation. For example, when using an Azure storage blob Uri, use the prefix to restrict sub folders for translation.')
filterSuffix = Param(parent='undefined', name='filterSuffix', doc='A case-sensitive suffix string to filter documents in the source path for translation. This is most often use for file extensions.')
getBackoffs()[source]
Returns

array of backoffs to use in the handler

Return type

backoffs

getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getFilterPrefix()[source]
Returns

A case-sensitive prefix string to filter documents in the source path for translation. For example, when using an Azure storage blob URI, use the prefix to restrict subfolders for translation.

Return type

filterPrefix

getFilterSuffix()[source]
Returns

A case-sensitive suffix string to filter documents in the source path for translation. This is most often used for file extensions.

Return type

filterSuffix

static getJavaPackage()[source]

Returns package name String.

getMaxPollingRetries()[source]
Returns

number of times to poll

Return type

maxPollingRetries

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getPollingDelay()[source]
Returns

number of milliseconds to wait between polling

Return type

pollingDelay

getServiceName()[source]
Returns

Return type

serviceName

getSourceLanguage()[source]
Returns

Language code. If none is specified, the language is auto-detected from the document.

Return type

sourceLanguage

getSourceStorageSource()[source]
Returns

Storage source of source input.

Return type

sourceStorageSource

getSourceUrl()[source]
Returns

Location of the folder / container or single file with your documents.

Return type

sourceUrl

getStorageType()[source]
Returns

Storage type of the input documents source string. Required for single document translation only.

Return type

storageType

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getTargets()[source]
Returns

Destination for the finished translated documents.

Return type

targets

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

maxPollingRetries = Param(parent='undefined', name='maxPollingRetries', doc='number of times to poll')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
pollingDelay = Param(parent='undefined', name='pollingDelay', doc='number of milliseconds to wait between polling')
classmethod read()[source]

Returns an MLReader instance for this class.

serviceName = Param(parent='undefined', name='serviceName', doc='')
setBackoffs(value)[source]
Parameters

backoffs – array of backoffs to use in the handler

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setFilterPrefix(value)[source]
Parameters

filterPrefix – A case-sensitive prefix string to filter documents in the source path for translation. For example, when using an Azure storage blob URI, use the prefix to restrict subfolders for translation.

setFilterPrefixCol(value)[source]
Parameters

filterPrefix – A case-sensitive prefix string to filter documents in the source path for translation. For example, when using an Azure storage blob URI, use the prefix to restrict subfolders for translation.

setFilterSuffix(value)[source]
Parameters

filterSuffix – A case-sensitive suffix string to filter documents in the source path for translation. This is most often used for file extensions.

setFilterSuffixCol(value)[source]
Parameters

filterSuffix – A case-sensitive suffix string to filter documents in the source path for translation. This is most often used for file extensions.

setLinkedService(value)[source]
setMaxPollingRetries(value)[source]
Parameters

maxPollingRetries – number of times to poll

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(backoffs=[100, 500, 1000], concurrency=1, concurrentTimeout=None, errorCol='DocumentTranslator_94499a55d24f_error', filterPrefix=None, filterPrefixCol=None, filterSuffix=None, filterSuffixCol=None, maxPollingRetries=1000, outputCol='DocumentTranslator_94499a55d24f_output', pollingDelay=300, serviceName=None, sourceLanguage=None, sourceLanguageCol=None, sourceStorageSource=None, sourceStorageSourceCol=None, sourceUrl=None, sourceUrlCol=None, storageType=None, storageTypeCol=None, subscriptionKey=None, subscriptionKeyCol=None, targets=None, targetsCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters
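The defaults maxPollingRetries=1000 and pollingDelay=300 together bound how long a translation job is polled. A rough sketch of the arithmetic follows, assuming one pollingDelay pause per poll and ignoring the latency of each request. Both helper names are hypothetical.

```python
import math


def max_polling_wait_seconds(max_polling_retries=1000, polling_delay_ms=300):
    """Upper bound on time spent waiting between polls, in seconds."""
    return max_polling_retries * polling_delay_ms / 1000.0


def retries_for_wait(target_seconds, polling_delay_ms=300):
    """Smallest retry count whose polling budget covers target_seconds."""
    return math.ceil(target_seconds * 1000 / polling_delay_ms)


default_wait = max_polling_wait_seconds()  # 1000 polls * 0.3 s each
ten_minute_retries = retries_for_wait(600)
```

Under these assumptions the defaults allow about 300 seconds of polling, and covering a ten-minute job at the default delay would need maxPollingRetries set to 2000.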

setPollingDelay(value)[source]
Parameters

pollingDelay – number of milliseconds to wait between polling

setServiceName(value)[source]
Parameters

serviceName

setSourceLanguage(value)[source]
Parameters

sourceLanguage – Language code. If none is specified, the language is auto-detected from the document.

setSourceLanguageCol(value)[source]
Parameters

sourceLanguage – Language code. If none is specified, the language is auto-detected from the document.

setSourceStorageSource(value)[source]
Parameters

sourceStorageSource – Storage source of source input.

setSourceStorageSourceCol(value)[source]
Parameters

sourceStorageSource – Storage source of source input.

setSourceUrl(value)[source]
Parameters

sourceUrl – Location of the folder / container or single file with your documents.

setSourceUrlCol(value)[source]
Parameters

sourceUrl – Location of the folder / container or single file with your documents.

setStorageType(value)[source]
Parameters

storageType – Storage type of the input documents source string. Required for single document translation only.

setStorageTypeCol(value)[source]
Parameters

storageType – Storage type of the input documents source string. Required for single document translation only.

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – the API key to use

setTargets(value)[source]
Parameters

targets – Destination for the finished translated documents.

setTargetsCol(value)[source]
Parameters

targets – Destination for the finished translated documents.

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

sourceLanguage = Param(parent='undefined', name='sourceLanguage', doc='Language code. If none is specified, we will perform auto detect on the document.')
sourceStorageSource = Param(parent='undefined', name='sourceStorageSource', doc='Storage source of source input.')
sourceUrl = Param(parent='undefined', name='sourceUrl', doc='Location of the folder / container or single file with your documents.')
storageType = Param(parent='undefined', name='storageType', doc='Storage type of the input documents source string. Required for single document translation only.')
subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='the API key to use')
targets = Param(parent='undefined', name='targets', doc='Destination for the finished translated documents.')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.EntityDetector module

class synapse.ml.cognitive.EntityDetector.EntityDetector(java_obj=None, concurrency=1, concurrentTimeout=None, errorCol='EntityDetector_9555940e457c_error', handler=None, language=None, languageCol=None, modelVersion=None, modelVersionCol=None, outputCol='EntityDetector_9555940e457c_output', showStats=None, showStatsCol=None, stringIndexType=None, stringIndexTypeCol=None, subscriptionKey=None, subscriptionKeyCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (object) – column to hold http errors

  • handler (object) – Which strategy to use when handling requests

  • language (object) – the language code of the text (optional for some services)

  • modelVersion (object) – This value indicates which model will be used for scoring. If a model-version is not specified, the API should default to the latest, non-preview version.

  • outputCol (object) – The name of the output column

  • showStats (object) – if set to true, response will contain input and document level statistics.

  • stringIndexType (object) – Specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets

  • subscriptionKey (object) – the API key to use

  • text (object) – the text in the request body

  • timeout (float) – number of seconds to wait before closing the connection

  • url (object) – Url of the service
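
A minimal configuration sketch, assuming SynapseML is installed and a Spark session is active. It uses only setters listed for this class. The column names ("text", "entities", "errors") and the function name make_entity_detector are hypothetical placeholders. It also assumes that setTextCol takes the name of an input column, that setLocation takes an Azure region name, and that each setter returns the stage so the calls can be chained.

```python
def make_entity_detector(subscription_key, location):
    """Build a configured EntityDetector stage.

    The import is deferred so this module still loads where SynapseML
    is not installed.
    """
    from synapse.ml.cognitive.EntityDetector import EntityDetector

    return (
        EntityDetector()
        .setSubscriptionKey(subscription_key)  # API key, one value for all rows
        .setLocation(location)                 # assumed: Azure region such as "eastus"
        .setTextCol("text")                    # hypothetical input column name
        .setOutputCol("entities")              # hypothetical output column name
        .setErrorCol("errors")                 # hypothetical error column name
    )


# Usage (needs a live Spark session, a DataFrame `df` and a valid key):
# detector = make_entity_detector("<your-key>", "eastus")
# result = detector.transform(df)
```

Keeping the output and error columns explicit avoids depending on the generated default names such as 'EntityDetector_9555940e457c_output', which differ between instances.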

concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

static getJavaPackage()[source]

Returns package name String.

getLanguage()[source]
Returns

the language code of the text (optional for some services)

Return type

language

getModelVersion()[source]
Returns

This value indicates which model will be used for scoring. If a model-version is not specified, the API should default to the latest, non-preview version.

Return type

modelVersion

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getShowStats()[source]
Returns

if set to true, response will contain input and document level statistics.

Return type

showStats

getStringIndexType()[source]
Returns

Specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets

Return type

stringIndexType

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getText()[source]
Returns

the text in the request body

Return type

text

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
language = Param(parent='undefined', name='language', doc='the language code of the text (optional for some services)')
modelVersion = Param(parent='undefined', name='modelVersion', doc='This value indicates which model will be used for scoring. If a model-version is not specified, the API should default to the latest, non-preview version.')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setLanguage(value)[source]
Parameters

language – the language code of the text (optional for some services)

setLanguageCol(value)[source]
Parameters

language – the language code of the text (optional for some services)

setLinkedService(value)[source]
setLocation(value)[source]
setModelVersion(value)[source]
Parameters

modelVersion – This value indicates which model will be used for scoring. If a model-version is not specified, the API should default to the latest, non-preview version.

setModelVersionCol(value)[source]
Parameters

modelVersion – This value indicates which model will be used for scoring. If a model-version is not specified, the API should default to the latest, non-preview version.

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(concurrency=1, concurrentTimeout=None, errorCol='EntityDetector_9555940e457c_error', handler=None, language=None, languageCol=None, modelVersion=None, modelVersionCol=None, outputCol='EntityDetector_9555940e457c_output', showStats=None, showStatsCol=None, stringIndexType=None, stringIndexTypeCol=None, subscriptionKey=None, subscriptionKeyCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setShowStats(value)[source]
Parameters

showStats – if set to true, response will contain input and document level statistics.

setShowStatsCol(value)[source]
Parameters

showStats – if set to true, response will contain input and document level statistics.

setStringIndexType(value)[source]
Parameters

stringIndexType – Specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets

setStringIndexTypeCol(value)[source]
Parameters

stringIndexType – Specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – the API key to use

setText(value)[source]
Parameters

text – the text in the request body

setTextCol(value)[source]
Parameters

text – the text in the request body

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

showStats = Param(parent='undefined', name='showStats', doc='if set to true, response will contain input and document level statistics.')
stringIndexType = Param(parent='undefined', name='stringIndexType', doc='Specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets')
subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='the API key to use')
text = Param(parent='undefined', name='text', doc='the text in the request body')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.EntityDetectorV2 module

class synapse.ml.cognitive.EntityDetectorV2.EntityDetectorV2(java_obj=None, concurrency=1, concurrentTimeout=None, errorCol='EntityDetectorV2_8afcc7527fa4_error', handler=None, language=None, languageCol=None, outputCol='EntityDetectorV2_8afcc7527fa4_output', subscriptionKey=None, subscriptionKeyCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (object) – column to hold http errors

  • handler (object) – Which strategy to use when handling requests

  • language (object) – the language code of the text (optional for some services)

  • outputCol (object) – The name of the output column

  • subscriptionKey (object) – the API key to use

  • text (object) – the text in the request body

  • timeout (float) – number of seconds to wait before closing the connection

  • url (object) – Url of the service
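
The concurrency and concurrentTimeout parameters bound how many calls are in flight and how long to wait on their futures. The sketch below is an analogy in plain Python using concurrent.futures, not the library's implementation. The names run_bounded and fake_call are hypothetical, and fake_call stands in for a real service request.

```python
from concurrent.futures import ThreadPoolExecutor


def run_bounded(func, items, concurrency=1, concurrent_timeout=None):
    """Apply func to each item with at most `concurrency` calls in flight,
    waiting up to `concurrent_timeout` seconds on each future."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(func, item) for item in items]
        # result(timeout=None) blocks until the call finishes; with a number
        # it raises concurrent.futures.TimeoutError once the wait is exceeded.
        return [f.result(timeout=concurrent_timeout) for f in futures]


def fake_call(text):
    # Stand-in for a request to the service.
    return text.upper()


results = run_bounded(fake_call, ["a", "b", "c"], concurrency=2, concurrent_timeout=5.0)
```

Results come back in input order because the futures are collected in the order they were submitted.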

concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

static getJavaPackage()[source]

Returns package name String.

getLanguage()[source]
Returns

the language code of the text (optional for some services)

Return type

language

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getText()[source]
Returns

the text in the request body

Return type

text

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
language = Param(parent='undefined', name='language', doc='the language code of the text (optional for some services)')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setLanguage(value)[source]
Parameters

language – the language code of the text (optional for some services)

setLanguageCol(value)[source]
Parameters

language – the language code of the text (optional for some services)

setLinkedService(value)[source]
setLocation(value)[source]
setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(concurrency=1, concurrentTimeout=None, errorCol='EntityDetectorV2_8afcc7527fa4_error', handler=None, language=None, languageCol=None, outputCol='EntityDetectorV2_8afcc7527fa4_output', subscriptionKey=None, subscriptionKeyCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – the API key to use

setText(value)[source]
Parameters

text – the text in the request body

setTextCol(value)[source]
Parameters

text – the text in the request body

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='the API key to use')
text = Param(parent='undefined', name='text', doc='the text in the request body')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.FindSimilarFace module

class synapse.ml.cognitive.FindSimilarFace.FindSimilarFace(java_obj=None, concurrency=1, concurrentTimeout=None, errorCol='FindSimilarFace_3b132647f6dc_error', faceId=None, faceIdCol=None, faceIds=None, faceIdsCol=None, faceListId=None, faceListIdCol=None, handler=None, largeFaceListId=None, largeFaceListIdCol=None, maxNumOfCandidatesReturned=None, maxNumOfCandidatesReturnedCol=None, mode=None, modeCol=None, outputCol='FindSimilarFace_3b132647f6dc_output', subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (object) – column to hold http errors

  • faceId (object) – faceId of the query face. User needs to call FaceDetect first to get a valid faceId. Note that this faceId is not persisted and will expire 24 hours after the detection call.

  • faceIds (object) – An array of candidate faceIds. All of them are created by FaceDetect and the faceIds will expire 24 hours after the detection call. The number of faceIds is limited to 1000. Parameter faceListId, largeFaceListId and faceIds should not be provided at the same time.

  • faceListId (object) – An existing user-specified unique candidate face list, created in FaceList - Create. Face list contains a set of persistedFaceIds which are persisted and will never expire. Parameter faceListId, largeFaceListId and faceIds should not be provided at the same time.

  • handler (object) – Which strategy to use when handling requests

  • largeFaceListId (object) – An existing user-specified unique candidate large face list, created in LargeFaceList - Create. Large face list contains a set of persistedFaceIds which are persisted and will never expire. Parameter faceListId, largeFaceListId and faceIds should not be provided at the same time.

  • maxNumOfCandidatesReturned (object) – Optional parameter. The number of top similar faces returned. The valid range is [1, 1000]. It defaults to 20.

  • mode (object) – Optional parameter. Similar face searching mode. It can be ‘matchPerson’ or ‘matchFace’. It defaults to ‘matchPerson’.

  • outputCol (object) – The name of the output column

  • subscriptionKey (object) – the API key to use

  • timeout (float) – number of seconds to wait before closing the connection

  • url (object) – Url of the service
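
The constraints stated in the parameter descriptions above can be checked before a request is built. The following is a sketch in plain Python, not part of the library. The function name check_find_similar_args and its argument names are hypothetical, and the limits are taken directly from the descriptions.

```python
VALID_MODES = ("matchPerson", "matchFace")


def check_find_similar_args(face_ids=None, face_list_id=None,
                            large_face_list_id=None,
                            max_candidates=20, mode="matchPerson"):
    """Raise ValueError if the arguments violate the documented constraints."""
    # faceIds, faceListId and largeFaceListId should not be given together.
    sources = [s for s in (face_ids, face_list_id, large_face_list_id)
               if s is not None]
    if len(sources) > 1:
        raise ValueError("provide only one of faceIds, faceListId, largeFaceListId")
    # The number of candidate faceIds is limited to 1000.
    if face_ids is not None and len(face_ids) > 1000:
        raise ValueError("faceIds is limited to 1000 entries")
    # maxNumOfCandidatesReturned must lie in [1, 1000].
    if not 1 <= max_candidates <= 1000:
        raise ValueError("maxNumOfCandidatesReturned must be in [1, 1000]")
    if mode not in VALID_MODES:
        raise ValueError("mode must be 'matchPerson' or 'matchFace'")
    return True
```

The defaults in the signature mirror the documented defaults (20 candidates, 'matchPerson').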

concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
faceId = Param(parent='undefined', name='faceId', doc='faceId of the query face. User needs to call FaceDetect first to get a valid faceId. Note that this faceId is not persisted and will expire 24 hours after the detection call.')
faceIds = Param(parent='undefined', name='faceIds', doc=' An array of candidate faceIds. All of them are created by FaceDetect and the faceIds will expire 24 hours after the detection call. The number of faceIds is limited to 1000. Parameter faceListId, largeFaceListId and faceIds should not be provided at the same time.')
faceListId = Param(parent='undefined', name='faceListId', doc=' An existing user-specified unique candidate face list, created in FaceList - Create. Face list contains a set of persistedFaceIds which are persisted and will never expire. Parameter faceListId, largeFaceListId and faceIds should not be provided at the same time.')
getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getFaceId()[source]
Returns

faceId of the query face. User needs to call FaceDetect first to get a valid faceId. Note that this faceId is not persisted and will expire 24 hours after the detection call.

Return type

faceId
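
Because a faceId expires 24 hours after the detection call, a pipeline that stores faceIds may want to check their age before querying. A minimal sketch in plain Python follows. The names FACE_ID_TTL and face_id_expired are hypothetical, and the caller is assumed to have recorded the detection time itself.

```python
from datetime import datetime, timedelta, timezone

# Lifetime of a faceId as stated in the parameter description.
FACE_ID_TTL = timedelta(hours=24)


def face_id_expired(detected_at, now=None):
    """Return True if a faceId detected at `detected_at` is 24 hours old or more.

    Both datetimes are expected to be timezone-aware.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now - detected_at >= FACE_ID_TTL
```

An expired faceId has to be regenerated by running detection again, since it is not persisted.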

getFaceIds()[source]
Returns

An array of candidate faceIds. All of them are created by FaceDetect and the faceIds will expire 24 hours after the detection call. The number of faceIds is limited to 1000. Parameter faceListId, largeFaceListId and faceIds should not be provided at the same time.

Return type

faceIds

getFaceListId()[source]
Returns

An existing user-specified unique candidate face list, created in FaceList - Create. Face list contains a set of persistedFaceIds which are persisted and will never expire. Parameter faceListId, largeFaceListId and faceIds should not be provided at the same time.

Return type

faceListId

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

static getJavaPackage()[source]

Returns package name String.

getLargeFaceListId()[source]
Returns

An existing user-specified unique candidate large face list, created in LargeFaceList - Create. Large face list contains a set of persistedFaceIds which are persisted and will never expire. Parameter faceListId, largeFaceListId and faceIds should not be provided at the same time.

Return type

largeFaceListId

getMaxNumOfCandidatesReturned()[source]
Returns

Optional parameter. The number of top similar faces returned. The valid range is [1, 1000]. It defaults to 20.

Return type

maxNumOfCandidatesReturned

getMode()[source]
Returns

Optional parameter. Similar face searching mode. It can be ‘matchPerson’ or ‘matchFace’. It defaults to ‘matchPerson’.

Return type

mode

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
largeFaceListId = Param(parent='undefined', name='largeFaceListId', doc=' An existing user-specified unique candidate large face list, created in LargeFaceList - Create. Large face list contains a set of persistedFaceIds which are persisted and will never expire. Parameter faceListId, largeFaceListId and faceIds should not be provided at the same time.')
maxNumOfCandidatesReturned = Param(parent='undefined', name='maxNumOfCandidatesReturned', doc=' Optional parameter. The number of top similar faces returned. The valid range is [1, 1000].It defaults to 20.')
mode = Param(parent='undefined', name='mode', doc=" Optional parameter. Similar face searching mode. It can be 'matchPerson' or 'matchFace'. It defaults to 'matchPerson'.")
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setFaceId(value)[source]
Parameters

faceId – faceId of the query face. User needs to call FaceDetect first to get a valid faceId. Note that this faceId is not persisted and will expire 24 hours after the detection call.

setFaceIdCol(value)[source]
Parameters

faceId – faceId of the query face. User needs to call FaceDetect first to get a valid faceId. Note that this faceId is not persisted and will expire 24 hours after the detection call.

setFaceIds(value)[source]
Parameters

faceIds – An array of candidate faceIds. All of them are created by FaceDetect and the faceIds will expire 24 hours after the detection call. The number of faceIds is limited to 1000. Parameter faceListId, largeFaceListId and faceIds should not be provided at the same time.

setFaceIdsCol(value)[source]
Parameters

faceIds – An array of candidate faceIds. All of them are created by FaceDetect and the faceIds will expire 24 hours after the detection call. The number of faceIds is limited to 1000. Parameter faceListId, largeFaceListId and faceIds should not be provided at the same time.

setFaceListId(value)[source]
Parameters

faceListId – An existing user-specified unique candidate face list, created in FaceList - Create. Face list contains a set of persistedFaceIds which are persisted and will never expire. Parameter faceListId, largeFaceListId and faceIds should not be provided at the same time.

setFaceListIdCol(value)[source]
Parameters

faceListId – An existing user-specified unique candidate face list, created in FaceList - Create. Face list contains a set of persistedFaceIds which are persisted and will never expire. Parameter faceListId, largeFaceListId and faceIds should not be provided at the same time.

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setLargeFaceListId(value)[source]
Parameters

largeFaceListId – An existing user-specified unique candidate large face list, created in LargeFaceList - Create. Large face list contains a set of persistedFaceIds which are persisted and will never expire. Parameter faceListId, largeFaceListId and faceIds should not be provided at the same time.

setLargeFaceListIdCol(value)[source]
Parameters

largeFaceListId – An existing user-specified unique candidate large face list, created in LargeFaceList - Create. Large face list contains a set of persistedFaceIds which are persisted and will never expire. Parameter faceListId, largeFaceListId and faceIds should not be provided at the same time.

setLinkedService(value)[source]
setLocation(value)[source]
setMaxNumOfCandidatesReturned(value)[source]
Parameters

maxNumOfCandidatesReturned – Optional parameter. The number of top similar faces returned. The valid range is [1, 1000]. It defaults to 20.

setMaxNumOfCandidatesReturnedCol(value)[source]
Parameters

maxNumOfCandidatesReturned – Optional parameter. The number of top similar faces returned. The valid range is [1, 1000]. It defaults to 20.

setMode(value)[source]
Parameters

mode – Optional parameter. Similar face searching mode. It can be ‘matchPerson’ or ‘matchFace’. It defaults to ‘matchPerson’.

setModeCol(value)[source]
Parameters

mode – Optional parameter. Similar face searching mode. It can be ‘matchPerson’ or ‘matchFace’. It defaults to ‘matchPerson’.

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(concurrency=1, concurrentTimeout=None, errorCol='FindSimilarFace_3b132647f6dc_error', faceId=None, faceIdCol=None, faceIds=None, faceIdsCol=None, faceListId=None, faceListIdCol=None, handler=None, largeFaceListId=None, largeFaceListIdCol=None, maxNumOfCandidatesReturned=None, maxNumOfCandidatesReturnedCol=None, mode=None, modeCol=None, outputCol='FindSimilarFace_3b132647f6dc_output', subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – the API key to use

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='the API key to use')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.FormOntologyLearner module

class synapse.ml.cognitive.FormOntologyLearner.FormOntologyLearner(java_obj=None, inputCol=None, outputCol=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaEstimator

Parameters
  • inputCol (object) – The name of the input column

  • outputCol (object) – The name of the output column

getInputCol()[source]
Returns

The name of the input column

Return type

inputCol

static getJavaPackage()[source]

Returns package name String.

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

inputCol = Param(parent='undefined', name='inputCol', doc='The name of the input column')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setInputCol(value)[source]
Parameters

inputCol – The name of the input column

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(inputCol=None, outputCol=None)[source]

Set the (keyword only) parameters

synapse.ml.cognitive.FormOntologyTransformer module

class synapse.ml.cognitive.FormOntologyTransformer.FormOntologyTransformer(java_obj=None, inputCol=None, ontology=None, outputCol=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaModel

Parameters
  • inputCol (object) – The name of the input column

  • ontology (object) – The ontology to cast values to

  • outputCol (object) – The name of the output column

getInputCol()[source]
Returns

The name of the input column

Return type

inputCol

static getJavaPackage()[source]

Returns package name String.

getOntology()[source]
Returns

The ontology to cast values to

Return type

ontology

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

inputCol = Param(parent='undefined', name='inputCol', doc='The name of the input column')
ontology = Param(parent='undefined', name='ontology', doc='The ontology to cast values to')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setInputCol(value)[source]
Parameters

inputCol – The name of the input column

setOntology(value)[source]
Parameters

ontology – The ontology to cast values to

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(inputCol=None, ontology=None, outputCol=None)[source]

Set the (keyword only) parameters
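
FormOntologyLearner derives from JavaEstimator and FormOntologyTransformer from JavaModel, so the usual Spark ML fit-then-transform pattern applies. The sketch below assumes SynapseML is installed and that fitting the learner yields the model used for the transform (presumably a FormOntologyTransformer). The function name unify_form_outputs and its arguments are hypothetical.

```python
def unify_form_outputs(df, input_col, output_col):
    """Fit a FormOntologyLearner on `df` and apply the fitted model to it.

    The import is deferred so this module still loads where SynapseML
    is not installed.
    """
    from synapse.ml.cognitive.FormOntologyLearner import FormOntologyLearner

    learner = FormOntologyLearner(inputCol=input_col, outputCol=output_col)
    model = learner.fit(df)      # Estimator.fit returns the fitted model
    return model.transform(df)   # the model adds the output column


# Usage (needs a live Spark session and a DataFrame of form-analysis results):
# unified = unify_form_outputs(analyzed_df, "invoices", "extracted")
```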

synapse.ml.cognitive.GenerateThumbnails module

class synapse.ml.cognitive.GenerateThumbnails.GenerateThumbnails(java_obj=None, concurrency=1, concurrentTimeout=None, errorCol='GenerateThumbnails_516f8b91b638_error', handler=None, height=None, heightCol=None, imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, outputCol='GenerateThumbnails_516f8b91b638_output', smartCropping=None, smartCroppingCol=None, subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None, width=None, widthCol=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (object) – column to hold http errors

  • handler (object) – Which strategy to use when handling requests

  • height (object) – the desired height of the image

  • imageBytes (object) – bytestream of the image to use

  • imageUrl (object) – the url of the image to use

  • outputCol (object) – The name of the output column

  • smartCropping (object) – whether to intelligently crop the image

  • subscriptionKey (object) – the API key to use

  • timeout (float) – number of seconds to wait before closing the connection

  • url (object) – Url of the service

  • width (object) – the desired width of the image
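
Several parameters of this class come with two setters, for example setWidth and setWidthCol. The helper below is a sketch of how the pairing could be used, assuming (as the paired names suggest) that set<Name> takes one value applied to every row and set<Name>Col takes the name of a DataFrame column. The function name apply_params and the column name "url" are hypothetical.

```python
def apply_params(stage, values=None, columns=None):
    """Call set<Name>(v) for each entry in `values` and set<Name>Col(c) for
    each entry in `columns`, then return the stage."""

    def setter_name(param, suffix=""):
        # "imageUrl" -> "setImageUrl" or "setImageUrlCol"
        return "set" + param[0].upper() + param[1:] + suffix

    for param, value in (values or {}).items():
        getattr(stage, setter_name(param))(value)
    for param, column in (columns or {}).items():
        getattr(stage, setter_name(param, "Col"))(column)
    return stage


# Usage (needs SynapseML and a DataFrame with a "url" column):
# from synapse.ml.cognitive.GenerateThumbnails import GenerateThumbnails
# stage = apply_params(
#     GenerateThumbnails(),
#     values={"width": 64, "height": 64, "smartCropping": True},
#     columns={"imageUrl": "url"},
# )
```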

concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

getHeight()[source]
Returns

the desired height of the image

Return type

height

getImageBytes()[source]
Returns

bytestream of the image to use

Return type

imageBytes

getImageUrl()[source]
Returns

the url of the image to use

Return type

imageUrl

static getJavaPackage()[source]

Returns package name String.

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getSmartCropping()[source]
Returns

whether to intelligently crop the image

Return type

smartCropping

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

getWidth()[source]
Returns

the desired width of the image

Return type

width

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
height = Param(parent='undefined', name='height', doc='the desired height of the image')
imageBytes = Param(parent='undefined', name='imageBytes', doc='bytestream of the image to use')
imageUrl = Param(parent='undefined', name='imageUrl', doc='the url of the image to use')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setHeight(value)[source]
Parameters

height – the desired height of the image

setHeightCol(value)[source]
Parameters

height – the desired height of the image

setImageBytes(value)[source]
Parameters

imageBytes – bytestream of the image to use

setImageBytesCol(value)[source]
Parameters

imageBytes – bytestream of the image to use

setImageUrl(value)[source]
Parameters

imageUrl – the url of the image to use

setImageUrlCol(value)[source]
Parameters

imageUrlCol – name of the input column that supplies the url of the image to use

setLinkedService(value)[source]
setLocation(value)[source]
setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(concurrency=1, concurrentTimeout=None, errorCol='GenerateThumbnails_516f8b91b638_error', handler=None, height=None, heightCol=None, imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, outputCol='GenerateThumbnails_516f8b91b638_output', smartCropping=None, smartCroppingCol=None, subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None, width=None, widthCol=None)[source]

Set the (keyword only) parameters

setSmartCropping(value)[source]
Parameters

smartCropping – whether to intelligently crop the image

setSmartCroppingCol(value)[source]
Parameters

smartCroppingCol – name of the input column that indicates whether to intelligently crop the image

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKeyCol – name of the input column that supplies the API key to use

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

setWidth(value)[source]
Parameters

width – the desired width of the image

setWidthCol(value)[source]
Parameters

widthCol – name of the input column that supplies the desired width of the image

smartCropping = Param(parent='undefined', name='smartCropping', doc='whether to intelligently crop the image')
subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='the API key to use')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')
width = Param(parent='undefined', name='width', doc='the desired width of the image')
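
The setters above come in pairs: setX(value) for a fixed value and setXCol(value) for a column name. The following is a minimal sketch of that naming pattern, not code from the library. The helper names (setter_name, apply_params, build_thumbnailer), the import path, the column name 'url', and the reading that a Col setter names a DataFrame column supplying per-row values are all assumptions.

```python
def setter_name(param: str, from_column: bool = False) -> str:
    """Map a param name such as 'width' to 'setWidth' or 'setWidthCol'."""
    base = "set" + param[0].upper() + param[1:]
    return base + "Col" if from_column else base


def apply_params(stage, scalars=None, columns=None):
    """Call stage.setX(value) for scalars and stage.setXCol(name) for columns."""
    for param, value in (scalars or {}).items():
        getattr(stage, setter_name(param))(value)
    for param, column in (columns or {}).items():
        getattr(stage, setter_name(param, from_column=True))(column)
    return stage


def build_thumbnailer(subscription_key: str, location: str):
    """Hypothetical builder; needs Spark and SynapseML at call time."""
    # Imported lazily so the helpers above work without Spark installed.
    from synapse.ml.cognitive import GenerateThumbnails  # assumed import path

    stage = GenerateThumbnails()
    stage.setSubscriptionKey(subscription_key)
    stage.setLocation(location)
    return apply_params(
        stage,
        # Fixed values shared by every row.
        scalars={"width": 64, "height": 64, "smartCropping": True,
                 "outputCol": "thumbnail"},
        # Assumed: the image URL is read per row from a column named 'url'.
        columns={"imageUrl": "url"},
    )
```

Only build_thumbnailer touches the library; the two helpers are plain Python and can be reused for the other transformers in this package, since they follow the same setter naming.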

synapse.ml.cognitive.GetCustomModel module

class synapse.ml.cognitive.GetCustomModel.GetCustomModel(java_obj=None, concurrency=1, concurrentTimeout=None, errorCol='GetCustomModel_527240c02013_error', handler=None, includeKeys=None, includeKeysCol=None, modelId=None, modelIdCol=None, outputCol='GetCustomModel_527240c02013_output', subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (object) – column to hold http errors

  • handler (object) – Which strategy to use when handling requests

  • includeKeys (object) – Include list of extracted keys in model information.

  • modelId (object) – Model identifier.

  • outputCol (object) – The name of the output column

  • subscriptionKey (object) – the API key to use

  • timeout (float) – number of seconds to wait before closing the connection

  • url (object) – Url of the service
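
As a rough illustration of the constructor keywords listed above, the sketch below collects them into a dictionary and leaves out includeKeys when it is not set. The helper names, the output column names 'model' and 'model_error', and the use of setLocation to pick the service region are assumptions made for the example.

```python
def get_custom_model_kwargs(subscription_key, model_id,
                            include_keys=None, timeout=60.0):
    """Collect constructor keywords for GetCustomModel, skipping unset ones."""
    kwargs = {
        "subscriptionKey": subscription_key,
        "modelId": model_id,
        "timeout": timeout,           # seconds before the connection is closed
        "outputCol": "model",         # hypothetical column names
        "errorCol": "model_error",
    }
    if include_keys is not None:
        kwargs["includeKeys"] = include_keys
    return kwargs


def build_get_custom_model(location, **kwargs):
    """Hypothetical builder; needs Spark and SynapseML at call time."""
    # Lazy import keeps get_custom_model_kwargs usable without Spark.
    from synapse.ml.cognitive.GetCustomModel import GetCustomModel

    stage = GetCustomModel(**kwargs)
    stage.setLocation(location)
    return stage
```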

concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

getIncludeKeys()[source]
Returns

Include list of extracted keys in model information.

Return type

includeKeys

static getJavaPackage()[source]

Returns package name String.

getModelId()[source]
Returns

Model identifier.

Return type

modelId

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
includeKeys = Param(parent='undefined', name='includeKeys', doc='Include list of extracted keys in model information.')
modelId = Param(parent='undefined', name='modelId', doc='Model identifier.')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setIncludeKeys(value)[source]
Parameters

includeKeys – Include list of extracted keys in model information.

setIncludeKeysCol(value)[source]
Parameters

includeKeysCol – name of the input column that indicates whether to include the list of extracted keys in model information.

setLinkedService(value)[source]
setLocation(value)[source]
setModelId(value)[source]
Parameters

modelId – Model identifier.

setModelIdCol(value)[source]
Parameters

modelIdCol – name of the input column that supplies the model identifier.

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(concurrency=1, concurrentTimeout=None, errorCol='GetCustomModel_527240c02013_error', handler=None, includeKeys=None, includeKeysCol=None, modelId=None, modelIdCol=None, outputCol='GetCustomModel_527240c02013_output', subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKeyCol – name of the input column that supplies the API key to use

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='the API key to use')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.GroupFaces module

class synapse.ml.cognitive.GroupFaces.GroupFaces(java_obj=None, concurrency=1, concurrentTimeout=None, errorCol='GroupFaces_2ebfa1532290_error', faceIds=None, faceIdsCol=None, handler=None, outputCol='GroupFaces_2ebfa1532290_output', subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (object) – column to hold http errors

  • faceIds (object) – Array of candidate faceId created by Face - Detect. The maximum is 1000 faces.

  • handler (object) – Which strategy to use when handling requests

  • outputCol (object) – The name of the output column

  • subscriptionKey (object) – the API key to use

  • timeout (float) – number of seconds to wait before closing the connection

  • url (object) – Url of the service
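
The faceIds description above caps a request at 1000 faces. A minimal sketch of checking that limit before the data reaches the service could look like this. The helper names, the column names, and the choice to raise ValueError are assumptions, not library behaviour.

```python
MAX_GROUP_FACE_IDS = 1000  # limit quoted in the faceIds description


def check_group_face_ids(face_ids):
    """Return the ids as a list, or raise if there are more than 1000."""
    ids = list(face_ids)
    if len(ids) > MAX_GROUP_FACE_IDS:
        raise ValueError(
            f"GroupFaces accepts at most {MAX_GROUP_FACE_IDS} faceIds, "
            f"got {len(ids)}"
        )
    return ids


def build_group_faces(subscription_key, location, face_ids_col="faceIds"):
    """Hypothetical builder; needs Spark and SynapseML at call time."""
    # Lazy import keeps check_group_face_ids usable without Spark.
    from synapse.ml.cognitive.GroupFaces import GroupFaces

    stage = GroupFaces(
        subscriptionKey=subscription_key,
        faceIdsCol=face_ids_col,    # assumed: each row holds an array of ids
        outputCol="groups",         # hypothetical column names
        errorCol="groups_error",
    )
    stage.setLocation(location)
    return stage
```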

concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
faceIds = Param(parent='undefined', name='faceIds', doc='Array of candidate faceId created by Face - Detect. The maximum is 1000 faces.')
getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getFaceIds()[source]
Returns

Array of candidate faceId created by Face - Detect. The maximum is 1000 faces.

Return type

faceIds

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

static getJavaPackage()[source]

Returns package name String.

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setFaceIds(value)[source]
Parameters

faceIds – Array of candidate faceId created by Face - Detect. The maximum is 1000 faces.

setFaceIdsCol(value)[source]
Parameters

faceIdsCol – name of the input column that supplies the array of candidate faceId created by Face - Detect. The maximum is 1000 faces.

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setLinkedService(value)[source]
setLocation(value)[source]
setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(concurrency=1, concurrentTimeout=None, errorCol='GroupFaces_2ebfa1532290_error', faceIds=None, faceIdsCol=None, handler=None, outputCol='GroupFaces_2ebfa1532290_output', subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKeyCol – name of the input column that supplies the API key to use

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='the API key to use')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.IdentifyFaces module

class synapse.ml.cognitive.IdentifyFaces.IdentifyFaces(java_obj=None, concurrency=1, concurrentTimeout=None, confidenceThreshold=None, confidenceThresholdCol=None, errorCol='IdentifyFaces_adb21870ba92_error', faceIds=None, faceIdsCol=None, handler=None, largePersonGroupId=None, largePersonGroupIdCol=None, maxNumOfCandidatesReturned=None, maxNumOfCandidatesReturnedCol=None, outputCol='IdentifyFaces_adb21870ba92_output', personGroupId=None, personGroupIdCol=None, subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • confidenceThreshold (object) – Optional parameter. Customized identification confidence threshold, in the range of [0, 1]. Advanced users can tweak this value to override the default internal threshold for better precision on their scenario data. Note there is no guarantee of this threshold value working on other data and after algorithm updates.

  • errorCol (object) – column to hold http errors

  • faceIds (object) – Array of query face faceIds, created by Face - Detect. Each face is identified independently. The valid number of faceIds is between 1 and 10.

  • handler (object) – Which strategy to use when handling requests

  • largePersonGroupId (object) – largePersonGroupId of the target large person group, created by LargePersonGroup - Create. Parameter personGroupId and largePersonGroupId should not be provided at the same time.

  • maxNumOfCandidatesReturned (object) – Maximum number of candidates to return; the valid range is 1 to 100 (default is 10).

  • outputCol (object) – The name of the output column

  • personGroupId (object) – personGroupId of the target person group, created by PersonGroup - Create. Parameter personGroupId and largePersonGroupId should not be provided at the same time.

  • subscriptionKey (object) – the API key to use

  • timeout (float) – number of seconds to wait before closing the connection

  • url (object) – Url of the service
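
The parameter descriptions above state several constraints: 1 to 10 faceIds, a confidence threshold in [0, 1], 1 to 100 candidates, and personGroupId and largePersonGroupId not given together. The sketch below checks those constraints in plain Python before building constructor keywords. The function names and the decision to raise ValueError are assumptions for illustration; the library may validate differently or leave validation to the service.

```python
def check_query_face_ids(face_ids):
    """Return the ids as a list if their count is in the documented 1..10 range."""
    ids = list(face_ids)
    if not 1 <= len(ids) <= 10:
        raise ValueError(f"expected 1 to 10 faceIds, got {len(ids)}")
    return ids


def identify_faces_kwargs(face_ids_col, person_group_id=None,
                          large_person_group_id=None,
                          confidence_threshold=None, max_candidates=None):
    """Build IdentifyFaces constructor keywords after checking documented limits."""
    if person_group_id is not None and large_person_group_id is not None:
        raise ValueError(
            "personGroupId and largePersonGroupId should not both be provided"
        )
    if confidence_threshold is not None and not 0.0 <= confidence_threshold <= 1.0:
        raise ValueError("confidenceThreshold must be in the range [0, 1]")
    if max_candidates is not None and not 1 <= max_candidates <= 100:
        raise ValueError("maxNumOfCandidatesReturned must be between 1 and 100")

    optional = {
        "personGroupId": person_group_id,
        "largePersonGroupId": large_person_group_id,
        "confidenceThreshold": confidence_threshold,
        "maxNumOfCandidatesReturned": max_candidates,
    }
    kwargs = {"faceIdsCol": face_ids_col}
    # Unset options are left out so the defaults described above apply.
    kwargs.update({k: v for k, v in optional.items() if v is not None})
    return kwargs
```

The resulting dictionary can be passed as IdentifyFaces(subscriptionKey=..., **kwargs), since every key matches a keyword in the class signature above.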

concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
confidenceThreshold = Param(parent='undefined', name='confidenceThreshold', doc='Optional parameter.Customized identification confidence threshold, in the range of [0, 1].Advanced user can tweak this value to override defaultinternal threshold for better precision on their scenario data.Note there is no guarantee of this threshold value workingon other data and after algorithm updates.')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
faceIds = Param(parent='undefined', name='faceIds', doc='Array of query faces faceIds, created by the Face - Detect. Each of the faces are identified independently. The valid number of faceIds is between [1, 10]. ')
getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getConfidenceThreshold()[source]
Returns

Optional parameter. Customized identification confidence threshold, in the range of [0, 1]. Advanced users can tweak this value to override the default internal threshold for better precision on their scenario data. Note there is no guarantee of this threshold value working on other data and after algorithm updates.

Return type

confidenceThreshold

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getFaceIds()[source]
Returns

Array of query face faceIds, created by Face - Detect. Each face is identified independently. The valid number of faceIds is between 1 and 10.

Return type

faceIds

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

static getJavaPackage()[source]

Returns package name String.

getLargePersonGroupId()[source]
Returns

largePersonGroupId of the target large person group, created by LargePersonGroup - Create. Parameter personGroupId and largePersonGroupId should not be provided at the same time.

Return type

largePersonGroupId

getMaxNumOfCandidatesReturned()[source]
Returns

Maximum number of candidates to return; the valid range is 1 to 100 (default is 10).

Return type

maxNumOfCandidatesReturned

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getPersonGroupId()[source]
Returns

personGroupId of the target person group, created by PersonGroup - Create. Parameter personGroupId and largePersonGroupId should not be provided at the same time.

Return type

personGroupId

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
largePersonGroupId = Param(parent='undefined', name='largePersonGroupId', doc='largePersonGroupId of the target large person group, created by LargePersonGroup - Create. Parameter personGroupId and largePersonGroupId should not be provided at the same time.')
maxNumOfCandidatesReturned = Param(parent='undefined', name='maxNumOfCandidatesReturned', doc='The range of maxNumOfCandidatesReturned is between 1 and 100 (default is 10).')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
personGroupId = Param(parent='undefined', name='personGroupId', doc='personGroupId of the target person group, created by PersonGroup - Create. Parameter personGroupId and largePersonGroupId should not be provided at the same time.')
classmethod read()[source]

Returns an MLReader instance for this class.

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setConfidenceThreshold(value)[source]
Parameters

confidenceThreshold – Optional parameter. Customized identification confidence threshold, in the range of [0, 1]. Advanced users can tweak this value to override the default internal threshold for better precision on their scenario data. Note there is no guarantee of this threshold value working on other data and after algorithm updates.

setConfidenceThresholdCol(value)[source]
Parameters

confidenceThresholdCol – name of the input column that supplies the confidence threshold. Optional parameter. Customized identification confidence threshold, in the range of [0, 1]. Advanced users can tweak this value to override the default internal threshold for better precision on their scenario data. Note there is no guarantee of this threshold value working on other data and after algorithm updates.

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setFaceIds(value)[source]
Parameters

faceIds – Array of query face faceIds, created by Face - Detect. Each face is identified independently. The valid number of faceIds is between 1 and 10.

setFaceIdsCol(value)[source]
Parameters

faceIdsCol – name of the input column that supplies the array of query face faceIds, created by Face - Detect. Each face is identified independently. The valid number of faceIds is between 1 and 10.

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setLargePersonGroupId(value)[source]
Parameters

largePersonGroupId – largePersonGroupId of the target large person group, created by LargePersonGroup - Create. Parameter personGroupId and largePersonGroupId should not be provided at the same time.

setLargePersonGroupIdCol(value)[source]
Parameters

largePersonGroupIdCol – name of the input column that supplies the largePersonGroupId of the target large person group, created by LargePersonGroup - Create. Parameter personGroupId and largePersonGroupId should not be provided at the same time.

setLinkedService(value)[source]
setLocation(value)[source]
setMaxNumOfCandidatesReturned(value)[source]
Parameters

maxNumOfCandidatesReturned – Maximum number of candidates to return; the valid range is 1 to 100 (default is 10).

setMaxNumOfCandidatesReturnedCol(value)[source]
Parameters

maxNumOfCandidatesReturnedCol – name of the input column that supplies the maximum number of candidates to return; the valid range is 1 to 100 (default is 10).

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(concurrency=1, concurrentTimeout=None, confidenceThreshold=None, confidenceThresholdCol=None, errorCol='IdentifyFaces_adb21870ba92_error', faceIds=None, faceIdsCol=None, handler=None, largePersonGroupId=None, largePersonGroupIdCol=None, maxNumOfCandidatesReturned=None, maxNumOfCandidatesReturnedCol=None, outputCol='IdentifyFaces_adb21870ba92_output', personGroupId=None, personGroupIdCol=None, subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setPersonGroupId(value)[source]
Parameters

personGroupId – personGroupId of the target person group, created by PersonGroup - Create. Parameter personGroupId and largePersonGroupId should not be provided at the same time.

setPersonGroupIdCol(value)[source]
Parameters

personGroupIdCol – name of the input column that supplies the personGroupId of the target person group, created by PersonGroup - Create. Parameter personGroupId and largePersonGroupId should not be provided at the same time.

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKeyCol – name of the input column that supplies the API key to use

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='the API key to use')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.KeyPhraseExtractor module

class synapse.ml.cognitive.KeyPhraseExtractor.KeyPhraseExtractor(java_obj=None, concurrency=1, concurrentTimeout=None, errorCol='KeyPhraseExtractor_4d1f2700218e_error', handler=None, language=None, languageCol=None, modelVersion=None, modelVersionCol=None, outputCol='KeyPhraseExtractor_4d1f2700218e_output', showStats=None, showStatsCol=None, subscriptionKey=None, subscriptionKeyCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (object) – column to hold http errors

  • handler (object) – Which strategy to use when handling requests

  • language (object) – the language code of the text (optional for some services)

  • modelVersion (object) – This value indicates which model will be used for scoring. If a model-version is not specified, the API should default to the latest, non-preview version.

  • outputCol (object) – The name of the output column

  • showStats (object) – if set to true, response will contain input and document level statistics.

  • subscriptionKey (object) – the API key to use

  • text (object) – the text in the request body

  • timeout (float) – number of seconds to wait before closing the connection

  • url (object) – Url of the service
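
To illustrate how the parameters above might be combined, here is a minimal sketch that builds constructor keywords for KeyPhraseExtractor. It assumes that language and languageCol are alternatives (one fixed language code for all rows, or a column with a code per row); that reading, the helper name, and the ValueError are assumptions rather than documented behaviour. Leaving modelVersion unset lets the service choose its default, as described above.

```python
def key_phrase_kwargs(text_col, language=None, language_col=None,
                      model_version=None, show_stats=None):
    """Build KeyPhraseExtractor constructor keywords, skipping unset options."""
    if language is not None and language_col is not None:
        # Assumed: the fixed value and the column form are alternatives.
        raise ValueError("pass either language or language_col, not both")
    candidates = {
        "textCol": text_col,
        "language": language,
        "languageCol": language_col,
        "modelVersion": model_version,
        "showStats": show_stats,
    }
    return {k: v for k, v in candidates.items() if v is not None}
```

For example, key_phrase_kwargs("text", language="en") yields keywords for a frame whose 'text' column is all English, and the result can be passed as KeyPhraseExtractor(subscriptionKey=..., **kwargs).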

concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

static getJavaPackage()[source]

Returns package name String.

getLanguage()[source]
Returns

the language code of the text (optional for some services)

Return type

language

getModelVersion()[source]
Returns

This value indicates which model will be used for scoring. If a model-version is not specified, the API should default to the latest, non-preview version.

Return type

modelVersion

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getShowStats()[source]
Returns

if set to true, response will contain input and document level statistics.

Return type

showStats

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getText()[source]
Returns

the text in the request body

Return type

text

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
language = Param(parent='undefined', name='language', doc='the language code of the text (optional for some services)')
modelVersion = Param(parent='undefined', name='modelVersion', doc='This value indicates which model will be used for scoring. If a model-version is not specified, the API should default to the latest, non-preview version.')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setLanguage(value)[source]
Parameters

language – the language code of the text (optional for some services)

setLanguageCol(value)[source]
Parameters

languageCol – name of the input column that supplies the language code of the text (optional for some services)

setLinkedService(value)[source]
setLocation(value)[source]
setModelVersion(value)[source]
Parameters

modelVersion – This value indicates which model will be used for scoring. If a model-version is not specified, the API should default to the latest, non-preview version.

setModelVersionCol(value)[source]
Parameters

modelVersionCol – name of the input column that supplies the model version. This value indicates which model will be used for scoring. If a model-version is not specified, the API should default to the latest, non-preview version.

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(concurrency=1, concurrentTimeout=None, errorCol='KeyPhraseExtractor_4d1f2700218e_error', handler=None, language=None, languageCol=None, modelVersion=None, modelVersionCol=None, outputCol='KeyPhraseExtractor_4d1f2700218e_output', showStats=None, showStatsCol=None, subscriptionKey=None, subscriptionKeyCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setShowStats(value)[source]
Parameters

showStats – if set to true, response will contain input and document level statistics.

setShowStatsCol(value)[source]
Parameters

showStatsCol – name of the input column that supplies the showStats flag; if set to true, response will contain input and document level statistics.

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKeyCol – name of the input column that supplies the API key to use

setText(value)[source]
Parameters

text – the text in the request body

setTextCol(value)[source]
Parameters

textCol – name of the input column that supplies the text in the request body

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

showStats = Param(parent='undefined', name='showStats', doc='if set to true, response will contain input and document level statistics.')
subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='the API key to use')
text = Param(parent='undefined', name='text', doc='the text in the request body')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.KeyPhraseExtractorV2 module

class synapse.ml.cognitive.KeyPhraseExtractorV2.KeyPhraseExtractorV2(java_obj=None, concurrency=1, concurrentTimeout=None, errorCol='KeyPhraseExtractorV2_4105fe2b3b07_error', handler=None, language=None, languageCol=None, outputCol='KeyPhraseExtractorV2_4105fe2b3b07_output', subscriptionKey=None, subscriptionKeyCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (object) – column to hold http errors

  • handler (object) – Which strategy to use when handling requests

  • language (object) – the language code of the text (optional for some services)

  • outputCol (object) – The name of the output column

  • subscriptionKey (object) – the API key to use

  • text (object) – the text in the request body

  • timeout (float) – number of seconds to wait before closing the connection

  • url (object) – Url of the service
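
Comparing the two signatures, KeyPhraseExtractorV2 accepts the same keywords as KeyPhraseExtractor except modelVersion, modelVersionCol, showStats and showStatsCol. A small sketch of adapting a keyword dictionary written for KeyPhraseExtractor follows; the helper name and the idea of reporting the dropped keys are illustrative assumptions.

```python
# Keywords present in the KeyPhraseExtractor signature but absent from
# the KeyPhraseExtractorV2 signature shown above.
NOT_IN_V2 = {"modelVersion", "modelVersionCol", "showStats", "showStatsCol"}


def to_v2_kwargs(kwargs):
    """Split kwargs into those V2 accepts and the names that were dropped."""
    kept = {k: v for k, v in kwargs.items() if k not in NOT_IN_V2}
    dropped = sorted(k for k in kwargs if k in NOT_IN_V2)
    return kept, dropped
```

Returning the dropped names makes it easy to log or reject settings that would otherwise be silently ignored when switching classes.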

concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

static getJavaPackage()[source]

Returns package name String.

getLanguage()[source]
Returns

the language code of the text (optional for some services)

Return type

language

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getText()[source]
Returns

the text in the request body

Return type

text

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
language = Param(parent='undefined', name='language', doc='the language code of the text (optional for some services)')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setLanguage(value)[source]
Parameters

language – the language code of the text (optional for some services)

setLanguageCol(value)[source]
Parameters

languageCol – name of the input column that supplies the language code of the text (optional for some services)

setLinkedService(value)[source]
setLocation(value)[source]
setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(concurrency=1, concurrentTimeout=None, errorCol='KeyPhraseExtractorV2_4105fe2b3b07_error', handler=None, language=None, languageCol=None, outputCol='KeyPhraseExtractorV2_4105fe2b3b07_output', subscriptionKey=None, subscriptionKeyCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKeyCol – name of the column that holds the API key to use

setText(value)[source]
Parameters

text – the text in the request body

setTextCol(value)[source]
Parameters

textCol – name of the column that holds the text in the request body

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='the API key to use')
text = Param(parent='undefined', name='text', doc='the text in the request body')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.LanguageDetector module

class synapse.ml.cognitive.LanguageDetector.LanguageDetector(java_obj=None, concurrency=1, concurrentTimeout=None, errorCol='LanguageDetector_9a89e32d8e2b_error', handler=None, language=None, languageCol=None, modelVersion=None, modelVersionCol=None, outputCol='LanguageDetector_9a89e32d8e2b_output', showStats=None, showStatsCol=None, subscriptionKey=None, subscriptionKeyCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (object) – column to hold http errors

  • handler (object) – Which strategy to use when handling requests

  • language (object) – the language code of the text (optional for some services)

  • modelVersion (object) – This value indicates which model will be used for scoring. If a model-version is not specified, the API should default to the latest, non-preview version.

  • outputCol (object) – The name of the output column

  • showStats (object) – if set to true, response will contain input and document level statistics.

  • subscriptionKey (object) – the API key to use

  • text (object) – the text in the request body

  • timeout (float) – number of seconds to wait before closing the connection

  • url (object) – Url of the service
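
The parameters above map onto the setter methods documented further down. A minimal sketch of wiring them together follows, assuming a running Spark session with SynapseML installed. The function name build_language_detector, the detector_cls argument, the column names "text", "language" and "language_error", and the values 4 and 30.0 are illustrative choices, not part of the library.

```python
def build_language_detector(key, region, detector_cls=None):
    """Return a LanguageDetector configured through its documented setters.

    detector_cls is a hypothetical hook for substituting a stand-in class;
    by default the real class is imported from the module path shown above.
    """
    if detector_cls is None:
        # Imported lazily: constructing the real stage needs an active Spark JVM.
        from synapse.ml.cognitive.LanguageDetector import LanguageDetector
        detector_cls = LanguageDetector

    return (
        detector_cls()
        .setSubscriptionKey(key)        # the API key to use
        .setLocation(region)            # service region, e.g. "eastus" (assumed value)
        .setTextCol("text")             # read the request text from this column
        .setOutputCol("language")       # name of the output column
        .setErrorCol("language_error")  # column to hold HTTP errors
        .setConcurrency(4)              # max number of concurrent calls
        .setTimeout(30.0)               # seconds to wait before closing the connection
    )
```

The returned stage is a transformer, so applying it to a DataFrame that has a "text" column would typically be done with its transform method.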

concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

static getJavaPackage()[source]

Returns package name String.

getLanguage()[source]
Returns

the language code of the text (optional for some services)

Return type

language

getModelVersion()[source]
Returns

This value indicates which model will be used for scoring. If a model-version is not specified, the API should default to the latest, non-preview version.

Return type

modelVersion

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getShowStats()[source]
Returns

if set to true, response will contain input and document level statistics.

Return type

showStats

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getText()[source]
Returns

the text in the request body

Return type

text

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
language = Param(parent='undefined', name='language', doc='the language code of the text (optional for some services)')
modelVersion = Param(parent='undefined', name='modelVersion', doc='This value indicates which model will be used for scoring. If a model-version is not specified, the API should default to the latest, non-preview version.')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setLanguage(value)[source]
Parameters

language – the language code of the text (optional for some services)

setLanguageCol(value)[source]
Parameters

languageCol – name of the column that holds the language code of the text (optional for some services)

setLinkedService(value)[source]
setLocation(value)[source]
setModelVersion(value)[source]
Parameters

modelVersion – This value indicates which model will be used for scoring. If a model-version is not specified, the API should default to the latest, non-preview version.

setModelVersionCol(value)[source]
Parameters

modelVersionCol – name of the column that holds the model version. This value indicates which model will be used for scoring. If a model-version is not specified, the API should default to the latest, non-preview version.

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(concurrency=1, concurrentTimeout=None, errorCol='LanguageDetector_9a89e32d8e2b_error', handler=None, language=None, languageCol=None, modelVersion=None, modelVersionCol=None, outputCol='LanguageDetector_9a89e32d8e2b_output', showStats=None, showStatsCol=None, subscriptionKey=None, subscriptionKeyCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setShowStats(value)[source]
Parameters

showStats – if set to true, response will contain input and document level statistics.

setShowStatsCol(value)[source]
Parameters

showStatsCol – name of the column that holds the showStats flag. If set to true, the response will contain input and document level statistics.

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKeyCol – name of the column that holds the API key to use

setText(value)[source]
Parameters

text – the text in the request body

setTextCol(value)[source]
Parameters

textCol – name of the column that holds the text in the request body

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

showStats = Param(parent='undefined', name='showStats', doc='if set to true, response will contain input and document level statistics.')
subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='the API key to use')
text = Param(parent='undefined', name='text', doc='the text in the request body')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.LanguageDetectorV2 module

class synapse.ml.cognitive.LanguageDetectorV2.LanguageDetectorV2(java_obj=None, concurrency=1, concurrentTimeout=None, errorCol='LanguageDetectorV2_b13686d08326_error', handler=None, language=None, languageCol=None, outputCol='LanguageDetectorV2_b13686d08326_output', subscriptionKey=None, subscriptionKeyCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (object) – column to hold http errors

  • handler (object) – Which strategy to use when handling requests

  • language (object) – the language code of the text (optional for some services)

  • outputCol (object) – The name of the output column

  • subscriptionKey (object) – the API key to use

  • text (object) – the text in the request body

  • timeout (float) – number of seconds to wait before closing the connection

  • url (object) – Url of the service
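
The interplay of concurrency, concurrentTimeout and the output and error columns can be pictured with a small standard-library analogy. This is only a sketch of the general pattern (a bounded pool of in-flight calls, with a cap on how long each future is awaited); it is not SynapseML's implementation, and call_all, send and the "timed out" message are hypothetical names.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def call_all(requests, send, concurrency=1, concurrent_timeout=None):
    """Send every request with at most `concurrency` calls in flight.

    Returns one (output, error) pair per request, loosely mirroring an
    output column and an error column.
    """
    results = []
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(send, request) for request in requests]
        for future in futures:
            try:
                # Wait at most concurrent_timeout seconds (None waits forever).
                results.append((future.result(timeout=concurrent_timeout), None))
            except FutureTimeout:
                results.append((None, "timed out"))
            except Exception as exc:
                # A failed call becomes an error entry rather than an exception.
                results.append((None, str(exc)))
    return results
```

One design point the analogy makes visible: failures are recorded per request instead of aborting the batch, which is the role an error column plays next to an output column.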

concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

static getJavaPackage()[source]

Returns package name String.

getLanguage()[source]
Returns

the language code of the text (optional for some services)

Return type

language

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getText()[source]
Returns

the text in the request body

Return type

text

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
language = Param(parent='undefined', name='language', doc='the language code of the text (optional for some services)')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setLanguage(value)[source]
Parameters

language – the language code of the text (optional for some services)

setLanguageCol(value)[source]
Parameters

languageCol – name of the column that holds the language code of the text (optional for some services)

setLinkedService(value)[source]
setLocation(value)[source]
setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(concurrency=1, concurrentTimeout=None, errorCol='LanguageDetectorV2_b13686d08326_error', handler=None, language=None, languageCol=None, outputCol='LanguageDetectorV2_b13686d08326_output', subscriptionKey=None, subscriptionKeyCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKeyCol – name of the column that holds the API key to use

setText(value)[source]
Parameters

text – the text in the request body

setTextCol(value)[source]
Parameters

textCol – name of the column that holds the text in the request body

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='the API key to use')
text = Param(parent='undefined', name='text', doc='the text in the request body')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.ListCustomModels module

class synapse.ml.cognitive.ListCustomModels.ListCustomModels(java_obj=None, concurrency=1, concurrentTimeout=None, errorCol='ListCustomModels_e7006bf90e98_error', handler=None, op=None, opCol=None, outputCol='ListCustomModels_e7006bf90e98_output', subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (object) – column to hold http errors

  • handler (object) – Which strategy to use when handling requests

  • op (object) – Specify whether to return summary or full list of models.

  • outputCol (object) – The name of the output column

  • subscriptionKey (object) – the API key to use

  • timeout (float) – number of seconds to wait before closing the connection

  • url (object) – Url of the service

concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

static getJavaPackage()[source]

Returns package name String.

getOp()[source]
Returns

Specify whether to return summary or full list of models.

Return type

op

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
op = Param(parent='undefined', name='op', doc='Specify whether to return summary or full list of models.')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setLinkedService(value)[source]
setLocation(value)[source]
setOp(value)[source]
Parameters

op – Specify whether to return summary or full list of models.

setOpCol(value)[source]
Parameters

opCol – name of the column that specifies whether to return summary or full list of models.

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(concurrency=1, concurrentTimeout=None, errorCol='ListCustomModels_e7006bf90e98_error', handler=None, op=None, opCol=None, outputCol='ListCustomModels_e7006bf90e98_output', subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKeyCol – name of the column that holds the API key to use

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='the API key to use')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.NER module

class synapse.ml.cognitive.NER.NER(java_obj=None, concurrency=1, concurrentTimeout=None, errorCol='NER_8f7553b38eb5_error', handler=None, language=None, languageCol=None, modelVersion=None, modelVersionCol=None, outputCol='NER_8f7553b38eb5_output', showStats=None, showStatsCol=None, stringIndexType=None, stringIndexTypeCol=None, subscriptionKey=None, subscriptionKeyCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (object) – column to hold http errors

  • handler (object) – Which strategy to use when handling requests

  • language (object) – the language code of the text (optional for some services)

  • modelVersion (object) – This value indicates which model will be used for scoring. If a model-version is not specified, the API should default to the latest, non-preview version.

  • outputCol (object) – The name of the output column

  • showStats (object) – if set to true, response will contain input and document level statistics.

  • stringIndexType (object) – Specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets

  • subscriptionKey (object) – the API key to use

  • text (object) – the text in the request body

  • timeout (float) – number of seconds to wait before closing the connection

  • url (object) – Url of the service
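
The stringIndexType parameter matters because the same substring sits at a different offset depending on the unit used to count. The sketch below compares two common units, Unicode code points and UTF-16 code units, using only the standard library. The helper name offsets is illustrative, and counting by text elements (graphemes), the documented default, would need a third-party library and is not shown.

```python
def offsets(text, needle):
    """Return the start offset of `needle` in `text` under two counting schemes."""
    # Python strings are indexed by Unicode code point.
    code_point = text.index(needle)
    # Each UTF-16 code unit is two bytes; characters outside the Basic
    # Multilingual Plane (such as most emoji) take two code units.
    utf16_code_unit = len(text[:code_point].encode("utf-16-le")) // 2
    return {"code_point": code_point, "utf16_code_unit": utf16_code_unit}


# "a", an emoji (U+1F600), then "b": the emoji shifts the UTF-16 offset by one.
print(offsets("a\U0001F600b", "b"))  # {'code_point': 2, 'utf16_code_unit': 3}
```

A client that slices the original text with returned entity offsets should therefore count in the same unit that was requested from the service.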

concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

static getJavaPackage()[source]

Returns package name String.

getLanguage()[source]
Returns

the language code of the text (optional for some services)

Return type

language

getModelVersion()[source]
Returns

This value indicates which model will be used for scoring. If a model-version is not specified, the API should default to the latest, non-preview version.

Return type

modelVersion

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getShowStats()[source]
Returns

if set to true, response will contain input and document level statistics.

Return type

showStats

getStringIndexType()[source]
Returns

Specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets

Return type

stringIndexType

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getText()[source]
Returns

the text in the request body

Return type

text

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
language = Param(parent='undefined', name='language', doc='the language code of the text (optional for some services)')
modelVersion = Param(parent='undefined', name='modelVersion', doc='This value indicates which model will be used for scoring. If a model-version is not specified, the API should default to the latest, non-preview version.')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setLanguage(value)[source]
Parameters

language – the language code of the text (optional for some services)

setLanguageCol(value)[source]
Parameters

languageCol – name of the column that holds the language code of the text (optional for some services)

setLinkedService(value)[source]
setLocation(value)[source]
setModelVersion(value)[source]
Parameters

modelVersion – This value indicates which model will be used for scoring. If a model-version is not specified, the API should default to the latest, non-preview version.

setModelVersionCol(value)[source]
Parameters

modelVersionCol – name of the column that holds the model version. This value indicates which model will be used for scoring. If a model-version is not specified, the API should default to the latest, non-preview version.

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(concurrency=1, concurrentTimeout=None, errorCol='NER_8f7553b38eb5_error', handler=None, language=None, languageCol=None, modelVersion=None, modelVersionCol=None, outputCol='NER_8f7553b38eb5_output', showStats=None, showStatsCol=None, stringIndexType=None, stringIndexTypeCol=None, subscriptionKey=None, subscriptionKeyCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setShowStats(value)[source]
Parameters

showStats – if set to true, response will contain input and document level statistics.

setShowStatsCol(value)[source]
Parameters

showStatsCol – name of the column that holds the showStats flag. If set to true, the response will contain input and document level statistics.

setStringIndexType(value)[source]
Parameters

stringIndexType – Specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets

setStringIndexTypeCol(value)[source]
Parameters

stringIndexTypeCol – name of the column that holds the string index type, which specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKeyCol – name of the column that holds the API key to use

setText(value)[source]
Parameters

text – the text in the request body

setTextCol(value)[source]
Parameters

textCol – name of the column that holds the text in the request body

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

showStats = Param(parent='undefined', name='showStats', doc='if set to true, response will contain input and document level statistics.')
stringIndexType = Param(parent='undefined', name='stringIndexType', doc='Specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets')
subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='the API key to use')
text = Param(parent='undefined', name='text', doc='the text in the request body')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.NERV2 module

class synapse.ml.cognitive.NERV2.NERV2(java_obj=None, concurrency=1, concurrentTimeout=None, errorCol='NERV2_737e951eaa22_error', handler=None, language=None, languageCol=None, outputCol='NERV2_737e951eaa22_output', subscriptionKey=None, subscriptionKeyCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (object) – column to hold http errors

  • handler (object) – Which strategy to use when handling requests

  • language (object) – the language code of the text (optional for some services)

  • outputCol (object) – The name of the output column

  • subscriptionKey (object) – the API key to use

  • text (object) – the text in the request body

  • timeout (float) – number of seconds to wait before closing the connection

  • url (object) – Url of the service
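
Several parameters in this package come with a pair of setters, for example setLanguage and setLanguageCol, or setText and setTextCol. The plain setter supplies one value for every row, while the Col variant names a DataFrame column to read the value from per row. The helper below is a small sketch of choosing between the two; set_service_param is a hypothetical convenience function, not part of SynapseML, and it relies only on the setter naming pattern visible in this reference.

```python
def set_service_param(stage, name, value=None, col=None):
    """Call stage.set<Name>(value) or stage.set<Name>Col(col), never both."""
    if (value is None) == (col is None):
        raise ValueError("pass exactly one of value or col")
    # "language" -> "setLanguage"; "stringIndexType" -> "setStringIndexType"
    setter = "set" + name[0].upper() + name[1:]
    if col is not None:
        # Per-row value: the argument is a column name.
        return getattr(stage, setter + "Col")(col)
    # Constant value shared by all rows.
    return getattr(stage, setter)(value)
```

For instance, set_service_param(ner, "language", value="en") would use one language for the whole DataFrame, whereas set_service_param(ner, "language", col="lang") would read it from a column named "lang" (both names here are illustrative).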

concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

static getJavaPackage()[source]

Returns package name String.

getLanguage()[source]
Returns

the language code of the text (optional for some services)

Return type

language

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getText()[source]
Returns

the text in the request body

Return type

text

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
language = Param(parent='undefined', name='language', doc='the language code of the text (optional for some services)')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setLanguage(value)[source]
Parameters

language – the language code of the text (optional for some services)

setLanguageCol(value)[source]
Parameters

languageCol – name of the column that holds the language code of the text (optional for some services)

setLinkedService(value)[source]
setLocation(value)[source]
setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(concurrency=1, concurrentTimeout=None, errorCol='NERV2_737e951eaa22_error', handler=None, language=None, languageCol=None, outputCol='NERV2_737e951eaa22_output', subscriptionKey=None, subscriptionKeyCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKeyCol – name of the column that holds the API key to use

setText(value)[source]
Parameters

text – the text in the request body

setTextCol(value)[source]
Parameters

textCol – name of the column that holds the text in the request body

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='the API key to use')
text = Param(parent='undefined', name='text', doc='the text in the request body')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.OCR module

class synapse.ml.cognitive.OCR.OCR(java_obj=None, concurrency=1, concurrentTimeout=None, detectOrientation=None, detectOrientationCol=None, errorCol='OCR_a56adf21dbf1_error', handler=None, imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, language=None, languageCol=None, outputCol='OCR_a56adf21dbf1_output', subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • detectOrientation (object) – whether to detect image orientation prior to processing

  • errorCol (object) – column to hold http errors

  • handler (object) – Which strategy to use when handling requests

  • imageBytes (object) – bytestream of the image to use

  • imageUrl (object) – the url of the image to use

  • language (object) – the language to use

  • outputCol (object) – The name of the output column

  • subscriptionKey (object) – the API key to use

  • timeout (float) – number of seconds to wait before closing the connection

  • url (object) – Url of the service

concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
detectOrientation = Param(parent='undefined', name='detectOrientation', doc='whether to detect image orientation prior to processing')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getDetectOrientation()[source]
Returns

whether to detect image orientation prior to processing

Return type

detectOrientation

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

getImageBytes()[source]
Returns

bytestream of the image to use

Return type

imageBytes

getImageUrl()[source]
Returns

the url of the image to use

Return type

imageUrl

static getJavaPackage()[source]

Returns package name String.

getLanguage()[source]
Returns

the language to use

Return type

language

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
imageBytes = Param(parent='undefined', name='imageBytes', doc='bytestream of the image to use')
imageUrl = Param(parent='undefined', name='imageUrl', doc='the url of the image to use')
language = Param(parent='undefined', name='language', doc='the language to use')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setDetectOrientation(value)[source]
Parameters

detectOrientation – whether to detect image orientation prior to processing

setDetectOrientationCol(value)[source]
Parameters

detectOrientation – whether to detect image orientation prior to processing

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setImageBytes(value)[source]
Parameters

imageBytes – bytestream of the image to use

setImageBytesCol(value)[source]
Parameters

imageBytes – bytestream of the image to use

setImageUrl(value)[source]
Parameters

imageUrl – the url of the image to use

setImageUrlCol(value)[source]
Parameters

imageUrl – the url of the image to use

setLanguage(value)[source]
Parameters

language – the language to use

setLanguageCol(value)[source]
Parameters

language – the language to use

setLinkedService(value)[source]
setLocation(value)[source]
setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(concurrency=1, concurrentTimeout=None, detectOrientation=None, detectOrientationCol=None, errorCol='OCR_a56adf21dbf1_error', handler=None, imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, language=None, languageCol=None, outputCol='OCR_a56adf21dbf1_output', subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – the API key to use

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='the API key to use')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')
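
The concurrency and concurrentTimeout parameters documented above describe a bounded number of in-flight calls and a cap on how long to wait for each one. A minimal sketch of that idea in plain Python, using only the standard library. It is not the library's implementation, and call_all and send are hypothetical names introduced for illustration:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Optional


def call_all(requests: Iterable[str],
             send: Callable[[str], str],
             concurrency: int = 1,
             concurrent_timeout: Optional[float] = None) -> List[str]:
    """Run send() over requests with at most `concurrency` calls in flight."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(send, request) for request in requests]
        # result(timeout=None) blocks until the call finishes; a float
        # raises a timeout error if the call takes longer than that.
        return [future.result(timeout=concurrent_timeout) for future in futures]


# str.upper stands in for a real service call.
responses = call_all(["a", "b", "c"], str.upper, concurrency=2, concurrent_timeout=5.0)
```

Results come back in request order because the futures are collected in the order they were submitted.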

synapse.ml.cognitive.PII module

class synapse.ml.cognitive.PII.PII(java_obj=None, concurrency=1, concurrentTimeout=None, domain=None, domainCol=None, errorCol='PII_860b945b98ad_error', handler=None, language=None, languageCol=None, modelVersion=None, modelVersionCol=None, outputCol='PII_860b945b98ad_output', piiCategories=None, piiCategoriesCol=None, showStats=None, showStatsCol=None, stringIndexType=None, stringIndexTypeCol=None, subscriptionKey=None, subscriptionKeyCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • domain (object) – if specified, will set the PII domain to include only a subset of the entity categories. Possible values include: ‘PHI’, ‘none’.

  • errorCol (object) – column to hold http errors

  • handler (object) – Which strategy to use when handling requests

  • language (object) – the language code of the text (optional for some services)

  • modelVersion (object) – This value indicates which model will be used for scoring. If a model-version is not specified, the API should default to the latest, non-preview version.

  • outputCol (object) – The name of the output column

  • piiCategories (object) – describes the PII categories to return

  • showStats (object) – if set to true, response will contain input and document level statistics.

  • stringIndexType (object) – Specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets

  • subscriptionKey (object) – the API key to use

  • text (object) – the text in the request body

  • timeout (float) – number of seconds to wait before closing the connection

  • url (object) – Url of the service
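
A hedged usage sketch built only from setters documented on this page. The import path follows the class path shown above. The column names, the language code, the region-style value passed to setLocation, and the make_pii helper are assumptions. Running it for real requires a Spark session with SynapseML installed and a valid subscription key.

```python
def make_pii(subscription_key: str, location: str):
    """Return a configured PII transformer (hypothetical helper)."""
    # Deferred import: SynapseML is only needed when the helper is called.
    from synapse.ml.cognitive.PII import PII

    return (
        PII()
        .setSubscriptionKey(subscription_key)
        .setLocation(location)       # assumed to be a region name such as "eastus"
        .setTextCol("text")          # assumed input column holding the text
        .setLanguage("en")           # a single value rather than a column
        .setOutputCol("pii")         # assumed output column name
        .setErrorCol("pii_error")    # assumed error column name
        .setConcurrency(2)
    )
```

Given a DataFrame df with a text column, make_pii(key, "eastus").transform(df) would be expected to add the pii and pii_error columns.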

concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
domain = Param(parent='undefined', name='domain', doc="if specified, will set the PII domain to include only a subset of the entity categories. Possible values include: 'PHI', 'none'.")
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getDomain()[source]
Returns

if specified, will set the PII domain to include only a subset of the entity categories. Possible values include: ‘PHI’, ‘none’.

Return type

domain

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

static getJavaPackage()[source]

Returns package name String.

getLanguage()[source]
Returns

the language code of the text (optional for some services)

Return type

language

getModelVersion()[source]
Returns

This value indicates which model will be used for scoring. If a model-version is not specified, the API should default to the latest, non-preview version.

Return type

modelVersion

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getPiiCategories()[source]
Returns

describes the PII categories to return

Return type

piiCategories

getShowStats()[source]
Returns

if set to true, response will contain input and document level statistics.

Return type

showStats

getStringIndexType()[source]
Returns

Specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets

Return type

stringIndexType

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getText()[source]
Returns

the text in the request body

Return type

text

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
language = Param(parent='undefined', name='language', doc='the language code of the text (optional for some services)')
modelVersion = Param(parent='undefined', name='modelVersion', doc='This value indicates which model will be used for scoring. If a model-version is not specified, the API should default to the latest, non-preview version.')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
piiCategories = Param(parent='undefined', name='piiCategories', doc='describes the PII categories to return')
classmethod read()[source]

Returns an MLReader instance for this class.

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setDomain(value)[source]
Parameters

domain – if specified, will set the PII domain to include only a subset of the entity categories. Possible values include: ‘PHI’, ‘none’.

setDomainCol(value)[source]
Parameters

domain – if specified, will set the PII domain to include only a subset of the entity categories. Possible values include: ‘PHI’, ‘none’.

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setLanguage(value)[source]
Parameters

language – the language code of the text (optional for some services)

setLanguageCol(value)[source]
Parameters

language – the language code of the text (optional for some services)

setLinkedService(value)[source]
setLocation(value)[source]
setModelVersion(value)[source]
Parameters

modelVersion – This value indicates which model will be used for scoring. If a model-version is not specified, the API should default to the latest, non-preview version.

setModelVersionCol(value)[source]
Parameters

modelVersion – This value indicates which model will be used for scoring. If a model-version is not specified, the API should default to the latest, non-preview version.

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(concurrency=1, concurrentTimeout=None, domain=None, domainCol=None, errorCol='PII_860b945b98ad_error', handler=None, language=None, languageCol=None, modelVersion=None, modelVersionCol=None, outputCol='PII_860b945b98ad_output', piiCategories=None, piiCategoriesCol=None, showStats=None, showStatsCol=None, stringIndexType=None, stringIndexTypeCol=None, subscriptionKey=None, subscriptionKeyCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setPiiCategories(value)[source]
Parameters

piiCategories – describes the PII categories to return

setPiiCategoriesCol(value)[source]
Parameters

piiCategories – describes the PII categories to return

setShowStats(value)[source]
Parameters

showStats – if set to true, response will contain input and document level statistics.

setShowStatsCol(value)[source]
Parameters

showStats – if set to true, response will contain input and document level statistics.

setStringIndexType(value)[source]
Parameters

stringIndexType – Specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets

setStringIndexTypeCol(value)[source]
Parameters

stringIndexType – Specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – the API key to use

setText(value)[source]
Parameters

text – the text in the request body

setTextCol(value)[source]
Parameters

text – the text in the request body

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

showStats = Param(parent='undefined', name='showStats', doc='if set to true, response will contain input and document level statistics.')
stringIndexType = Param(parent='undefined', name='stringIndexType', doc='Specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets')
subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='the API key to use')
text = Param(parent='undefined', name='text', doc='the text in the request body')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')
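
The stringIndexType parameter matters because the same substring sits at a different offset depending on the unit being counted. A small sketch in plain Python (no service call) comparing Unicode code points with UTF-16 code units. The sample text and the utf16_offset helper are illustrative assumptions, and grapheme-based counting, which needs a third-party library, is not shown:

```python
def utf16_offset(text: str, codepoint_offset: int) -> int:
    """Convert an offset counted in code points to UTF-16 code units."""
    prefix = text[:codepoint_offset]
    # Each UTF-16 code unit is two bytes in the little-endian encoding.
    return len(prefix.encode("utf-16-le")) // 2


sample = "😀 call 555-0100"

# Python strings index by code point, so this is the code-point offset.
codepoint_start = sample.index("555")

# The emoji needs two UTF-16 code units, which shifts the offset by one.
utf16_start = utf16_offset(sample, codepoint_start)
```

A consumer that slices text in a UTF-16 based language (for example Java or JavaScript) would need the second offset, while Python slicing would need the first.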

synapse.ml.cognitive.ReadImage module

class synapse.ml.cognitive.ReadImage.ReadImage(java_obj=None, backoffs=[100, 500, 1000], concurrency=1, concurrentTimeout=None, errorCol='ReadImage_ee0a3d11c90f_error', imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, language=None, languageCol=None, maxPollingRetries=1000, outputCol='ReadImage_ee0a3d11c90f_output', pollingDelay=300, subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • backoffs (list) – array of backoffs to use in the handler

  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (object) – column to hold http errors

  • imageBytes (object) – bytestream of the image to use

  • imageUrl (object) – the url of the image to use

  • language (object) – The BCP-47 language code of the text in the document. Currently, only English (en), Dutch (nl), French (fr), German (de), Italian (it), Portuguese (pt), and Spanish (es) are supported. Read supports auto language identification and multilanguage documents, so only provide a language code if you would like to force the document to be processed as that specific language.

  • maxPollingRetries (int) – number of times to poll

  • outputCol (object) – The name of the output column

  • pollingDelay (int) – number of milliseconds to wait between polling

  • subscriptionKey (object) – the API key to use

  • timeout (float) – number of seconds to wait before closing the connection

  • url (object) – Url of the service

backoffs = Param(parent='undefined', name='backoffs', doc='array of backoffs to use in the handler')
concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getBackoffs()[source]
Returns

array of backoffs to use in the handler

Return type

backoffs

getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getImageBytes()[source]
Returns

bytestream of the image to use

Return type

imageBytes

getImageUrl()[source]
Returns

the url of the image to use

Return type

imageUrl

static getJavaPackage()[source]

Returns package name String.

getLanguage()[source]
Returns

The BCP-47 language code of the text in the document. Currently, only English (en), Dutch (nl), French (fr), German (de), Italian (it), Portuguese (pt), and Spanish (es) are supported. Read supports auto language identification and multilanguage documents, so only provide a language code if you would like to force the document to be processed as that specific language.

Return type

language

getMaxPollingRetries()[source]
Returns

number of times to poll

Return type

maxPollingRetries

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getPollingDelay()[source]
Returns

number of milliseconds to wait between polling

Return type

pollingDelay

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

imageBytes = Param(parent='undefined', name='imageBytes', doc='bytestream of the image to use')
imageUrl = Param(parent='undefined', name='imageUrl', doc='the url of the image to use')
language = Param(parent='undefined', name='language', doc='IThe BCP-47 language code of the text in the document. Currently, only English (en), Dutch (nl), French (fr), German (de), Italian (it), Portuguese (pt), and Spanish (es) are supported. Read supports auto language identification and multilanguage documents, so only provide a language code if you would like to force the documented to be processed as that specific language.')
maxPollingRetries = Param(parent='undefined', name='maxPollingRetries', doc='number of times to poll')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
pollingDelay = Param(parent='undefined', name='pollingDelay', doc='number of milliseconds to wait between polling')
classmethod read()[source]

Returns an MLReader instance for this class.

setBackoffs(value)[source]
Parameters

backoffs – array of backoffs to use in the handler

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setImageBytes(value)[source]
Parameters

imageBytes – bytestream of the image to use

setImageBytesCol(value)[source]
Parameters

imageBytes – bytestream of the image to use

setImageUrl(value)[source]
Parameters

imageUrl – the url of the image to use

setImageUrlCol(value)[source]
Parameters

imageUrl – the url of the image to use

setLanguage(value)[source]
Parameters

language – The BCP-47 language code of the text in the document. Currently, only English (en), Dutch (nl), French (fr), German (de), Italian (it), Portuguese (pt), and Spanish (es) are supported. Read supports auto language identification and multilanguage documents, so only provide a language code if you would like to force the document to be processed as that specific language.

setLanguageCol(value)[source]
Parameters

language – The BCP-47 language code of the text in the document. Currently, only English (en), Dutch (nl), French (fr), German (de), Italian (it), Portuguese (pt), and Spanish (es) are supported. Read supports auto language identification and multilanguage documents, so only provide a language code if you would like to force the document to be processed as that specific language.

setLinkedService(value)[source]
setLocation(value)[source]
setMaxPollingRetries(value)[source]
Parameters

maxPollingRetries – number of times to poll

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(backoffs=[100, 500, 1000], concurrency=1, concurrentTimeout=None, errorCol='ReadImage_ee0a3d11c90f_error', imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, language=None, languageCol=None, maxPollingRetries=1000, outputCol='ReadImage_ee0a3d11c90f_output', pollingDelay=300, subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setPollingDelay(value)[source]
Parameters

pollingDelay – number of milliseconds to wait between polling

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – the API key to use

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='the API key to use')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')
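
The pollingDelay and maxPollingRetries parameters together bound how long a transformer waits for an asynchronous result. A sketch of a generic polling loop under those two settings, assuming the delay is applied between attempts. It is not the library's implementation, and poll, check_status, and worst_case_wait_seconds are hypothetical names:

```python
import time
from typing import Callable, Optional


def poll(check_status: Callable[[], Optional[str]],
         polling_delay_ms: int = 300,
         max_polling_retries: int = 1000,
         sleep: Callable[[float], None] = time.sleep) -> Optional[str]:
    """Call check_status until it returns a result or the retries run out."""
    for _ in range(max_polling_retries):
        result = check_status()
        if result is not None:
            return result
        sleep(polling_delay_ms / 1000.0)  # the delay is given in milliseconds
    return None


def worst_case_wait_seconds(polling_delay_ms: int, max_polling_retries: int) -> float:
    """Upper bound on time spent sleeping between polls, in seconds."""
    return polling_delay_ms * max_polling_retries / 1000.0
```

With the defaults shown in the class signature (pollingDelay=300, maxPollingRetries=1000), this bound works out to 300 seconds of sleeping, not counting the time each request itself takes.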

synapse.ml.cognitive.RecognizeDomainSpecificContent module

class synapse.ml.cognitive.RecognizeDomainSpecificContent.RecognizeDomainSpecificContent(java_obj=None, concurrency=1, concurrentTimeout=None, errorCol='RecognizeDomainSpecificContent_ff840cfd4398_error', handler=None, imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, model=None, modelCol=None, outputCol='RecognizeDomainSpecificContent_ff840cfd4398_output', subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (object) – column to hold http errors

  • handler (object) – Which strategy to use when handling requests

  • imageBytes (object) – bytestream of the image to use

  • imageUrl (object) – the url of the image to use

  • model (object) – the domain specific model: celebrities, landmarks

  • outputCol (object) – The name of the output column

  • subscriptionKey (object) – the API key to use

  • timeout (float) – number of seconds to wait before closing the connection

  • url (object) – Url of the service

concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

getImageBytes()[source]
Returns

bytestream of the image to use

Return type

imageBytes

getImageUrl()[source]
Returns

the url of the image to use

Return type

imageUrl

static getJavaPackage()[source]

Returns package name String.

getModel()[source]
Returns

the domain specific model: celebrities, landmarks

Return type

model

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
imageBytes = Param(parent='undefined', name='imageBytes', doc='bytestream of the image to use')
imageUrl = Param(parent='undefined', name='imageUrl', doc='the url of the image to use')
model = Param(parent='undefined', name='model', doc='the domain specific model: celebrities, landmarks')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setImageBytes(value)[source]
Parameters

imageBytes – bytestream of the image to use

setImageBytesCol(value)[source]
Parameters

imageBytes – bytestream of the image to use

setImageUrl(value)[source]
Parameters

imageUrl – the url of the image to use

setImageUrlCol(value)[source]
Parameters

imageUrl – the url of the image to use

setLinkedService(value)[source]
setLocation(value)[source]
setModel(value)[source]
Parameters

model – the domain specific model: celebrities, landmarks

setModelCol(value)[source]
Parameters

model – the domain specific model: celebrities, landmarks

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(concurrency=1, concurrentTimeout=None, errorCol='RecognizeDomainSpecificContent_ff840cfd4398_error', handler=None, imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, model=None, modelCol=None, outputCol='RecognizeDomainSpecificContent_ff840cfd4398_output', subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – the API key to use

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='the API key to use')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.RecognizeText module

class synapse.ml.cognitive.RecognizeText.RecognizeText(java_obj=None, backoffs=[100, 500, 1000], concurrency=1, concurrentTimeout=None, errorCol='RecognizeText_930f597860ef_error', imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, maxPollingRetries=1000, mode=None, modeCol=None, outputCol='RecognizeText_930f597860ef_output', pollingDelay=300, subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • backoffs (list) – array of backoffs to use in the handler

  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (object) – column to hold http errors

  • imageBytes (object) – bytestream of the image to use

  • imageUrl (object) – the url of the image to use

  • maxPollingRetries (int) – number of times to poll

  • mode (object) – If this parameter is set to ‘Printed’, printed text recognition is performed. If ‘Handwritten’ is specified, handwriting recognition is performed

  • outputCol (object) – The name of the output column

  • pollingDelay (int) – number of milliseconds to wait between polling

  • subscriptionKey (object) – the API key to use

  • timeout (float) – number of seconds to wait before closing the connection

  • url (object) – Url of the service

backoffs = Param(parent='undefined', name='backoffs', doc='array of backoffs to use in the handler')
concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getBackoffs()[source]
Returns

array of backoffs to use in the handler

Return type

backoffs

getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getImageBytes()[source]
Returns

bytestream of the image to use

Return type

imageBytes

getImageUrl()[source]
Returns

the url of the image to use

Return type

imageUrl

static getJavaPackage()[source]

Returns package name String.

getMaxPollingRetries()[source]
Returns

number of times to poll

Return type

maxPollingRetries

getMode()[source]
Returns

If this parameter is set to ‘Printed’, printed text recognition is performed. If ‘Handwritten’ is specified, handwriting recognition is performed

Return type

mode

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getPollingDelay()[source]
Returns

number of milliseconds to wait between polling

Return type

pollingDelay

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

imageBytes = Param(parent='undefined', name='imageBytes', doc='bytestream of the image to use')
imageUrl = Param(parent='undefined', name='imageUrl', doc='the url of the image to use')
maxPollingRetries = Param(parent='undefined', name='maxPollingRetries', doc='number of times to poll')
mode = Param(parent='undefined', name='mode', doc="If this parameter is set to 'Printed', printed text recognition is performed. If 'Handwritten' is specified, handwriting recognition is performed")
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
pollingDelay = Param(parent='undefined', name='pollingDelay', doc='number of milliseconds to wait between polling')
classmethod read()[source]

Returns an MLReader instance for this class.

setBackoffs(value)[source]
Parameters

backoffs – array of backoffs to use in the handler

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setImageBytes(value)[source]
Parameters

imageBytes – bytestream of the image to use

setImageBytesCol(value)[source]
Parameters

imageBytes – bytestream of the image to use

setImageUrl(value)[source]
Parameters

imageUrl – the url of the image to use

setImageUrlCol(value)[source]
Parameters

imageUrl – the url of the image to use

setLinkedService(value)[source]
setLocation(value)[source]
setMaxPollingRetries(value)[source]
Parameters

maxPollingRetries – number of times to poll

setMode(value)[source]
Parameters

mode – If this parameter is set to ‘Printed’, printed text recognition is performed. If ‘Handwritten’ is specified, handwriting recognition is performed

setModeCol(value)[source]
Parameters

mode – If this parameter is set to ‘Printed’, printed text recognition is performed. If ‘Handwritten’ is specified, handwriting recognition is performed

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(backoffs=[100, 500, 1000], concurrency=1, concurrentTimeout=None, errorCol='RecognizeText_930f597860ef_error', imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, maxPollingRetries=1000, mode=None, modeCol=None, outputCol='RecognizeText_930f597860ef_output', pollingDelay=300, subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setPollingDelay(value)[source]
Parameters

pollingDelay – number of milliseconds to wait between polling

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKeyCol – name of the column holding the API key to use
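Many parameters on this page come in pairs, such as setSubscriptionKey and setSubscriptionKeyCol. As we understand it, the plain setter takes one constant for every row, while the Col setter takes the name of a column that supplies the value per row. The sketch below models that with rows as plain dicts; `resolve_param` is a hypothetical helper, and rejecting both at once is a choice made here for clarity rather than documented behavior.

```python
from typing import Any, Mapping, Optional


def resolve_param(
    row: Mapping[str, Any],
    value: Any = None,
    col: Optional[str] = None,
) -> Any:
    """Return the constant if given, otherwise the row's value from `col`."""
    if value is not None and col is not None:
        raise ValueError("set either a constant value or a column name, not both")
    if col is not None:
        return row[col]  # per-row value, as with a ...Col setter
    return value  # same value for every row, as with a plain setter
```

For example, `resolve_param(row, col="key")` reads the key from each row, which is useful when rows belong to different subscriptions.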

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='the API key to use')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.SimpleDetectAnomalies module

class synapse.ml.cognitive.SimpleDetectAnomalies.SimpleDetectAnomalies(java_obj=None, concurrency=1, concurrentTimeout=None, customInterval=None, customIntervalCol=None, errorCol='SimpleDetectAnomalies_ea50daf23243_error', granularity=None, granularityCol=None, groupbyCol=None, handler=None, maxAnomalyRatio=None, maxAnomalyRatioCol=None, outputCol='SimpleDetectAnomalies_ea50daf23243_output', period=None, periodCol=None, sensitivity=None, sensitivityCol=None, series=None, seriesCol=None, subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, timestampCol='timestamp', url=None, valueCol='value')[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • customInterval (object) – Custom interval is used to set a non-standard time interval. For example, if the series has one point every 5 minutes, the request can be set as granularity=minutely, customInterval=5.

  • errorCol (object) – column to hold http errors

  • granularity (object) – Can only be one of yearly, monthly, weekly, daily, hourly or minutely. Granularity is used to verify whether the input series is valid.

  • groupbyCol (object) – column that groups the series

  • handler (object) – Which strategy to use when handling requests

  • maxAnomalyRatio (object) – Optional argument, advanced model parameter, max anomaly ratio in a time series.

  • outputCol (object) – The name of the output column

  • period (object) – Optional argument, periodic value of a time series. If the value is null or not present, the API will determine the period automatically.

  • sensitivity (object) – Optional argument, advanced model parameter, between 0 and 99. The lower the value, the larger the margin value will be, which means fewer anomalies will be accepted.

  • series (object) – Time series data points. Points should be sorted by timestamp in ascending order to match the anomaly detection result. If the data is not sorted correctly or there are duplicate timestamps, the API will not work and an error message will be returned.

  • subscriptionKey (object) – the API key to use

  • timeout (float) – number of seconds to wait before closing the connection

  • timestampCol (object) – column representing the time of the series

  • url (object) – Url of the service

  • valueCol (object) – column representing the value of the series
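The granularity and customInterval parameters in the list above work together to describe the spacing between points. The sketch below picks a pair for a given spacing in seconds. `interval_params` is a hypothetical helper, not part of synapse.ml; only the granularity names come from this page, and the sketch covers just the fixed-length units (weekly, daily, hourly, minutely) because months and years vary in length.

```python
from typing import Dict, Union

# Fixed-length units from the granularity list, largest first.
_UNIT_SECONDS = [
    ("weekly", 7 * 86400),
    ("daily", 86400),
    ("hourly", 3600),
    ("minutely", 60),
]


def interval_params(step_seconds: int) -> Dict[str, Union[str, int]]:
    """Return granularity (and customInterval when needed) for a spacing."""
    if step_seconds <= 0:
        raise ValueError("step_seconds must be positive")
    for name, unit in _UNIT_SECONDS:
        if step_seconds % unit == 0:
            multiple = step_seconds // unit
            params: Dict[str, Union[str, int]] = {"granularity": name}
            if multiple != 1:
                # Non-standard spacing: a multiple of the base unit.
                params["customInterval"] = multiple
            return params
    raise ValueError("spacing is not a whole number of minutes")
```

A 5-minute series (300 seconds) maps to granularity=minutely with customInterval=5, matching the example in the customInterval description.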

concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number of seconds to wait on futures if concurrency >= 1')
customInterval = Param(parent='undefined', name='customInterval', doc=' Custom interval is used to set a non-standard time interval. For example, if the series has one point every 5 minutes, the request can be set as granularity=minutely, customInterval=5.     ')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getCustomInterval()[source]
Returns

Custom interval is used to set a non-standard time interval. For example, if the series has one point every 5 minutes, the request can be set as granularity=minutely, customInterval=5.

Return type

customInterval

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getGranularity()[source]
Returns

Can only be one of yearly, monthly, weekly, daily, hourly or minutely. Granularity is used to verify whether the input series is valid.

Return type

granularity

getGroupbyCol()[source]
Returns

column that groups the series

Return type

groupbyCol

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

static getJavaPackage()[source]

Returns package name String.

getMaxAnomalyRatio()[source]
Returns

Optional argument, advanced model parameter, max anomaly ratio in a time series.

Return type

maxAnomalyRatio

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getPeriod()[source]
Returns

Optional argument, periodic value of a time series. If the value is null or not present, the API will determine the period automatically.

Return type

period

getSensitivity()[source]
Returns

Optional argument, advanced model parameter, between 0 and 99. The lower the value, the larger the margin value will be, which means fewer anomalies will be accepted.

Return type

sensitivity

getSeries()[source]
Returns

Time series data points. Points should be sorted by timestamp in ascending order to match the anomaly detection result. If the data is not sorted correctly or there are duplicate timestamps, the API will not work and an error message will be returned.

Return type

series

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getTimestampCol()[source]
Returns

column representing the time of the series

Return type

timestampCol

getUrl()[source]
Returns

Url of the service

Return type

url

getValueCol()[source]
Returns

column representing the value of the series

Return type

valueCol

granularity = Param(parent='undefined', name='granularity', doc=' Can only be one of yearly, monthly, weekly, daily, hourly or minutely. Granularity is used to verify whether the input series is valid.     ')
groupbyCol = Param(parent='undefined', name='groupbyCol', doc='column that groups the series')
handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
maxAnomalyRatio = Param(parent='undefined', name='maxAnomalyRatio', doc=' Optional argument, advanced model parameter, max anomaly ratio in a time series.     ')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
period = Param(parent='undefined', name='period', doc=' Optional argument, periodic value of a time series. If the value is null or not present, the API will determine the period automatically.     ')
classmethod read()[source]

Returns an MLReader instance for this class.

sensitivity = Param(parent='undefined', name='sensitivity', doc=' Optional argument, advanced model parameter, between 0 and 99. The lower the value, the larger the margin value will be, which means fewer anomalies will be accepted.     ')
series = Param(parent='undefined', name='series', doc=' Time series data points. Points should be sorted by timestamp in ascending order to match the anomaly detection result. If the data is not sorted correctly or there are duplicate timestamps, the API will not work and an error message will be returned.     ')
setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setCustomInterval(value)[source]
Parameters

customInterval – Custom interval is used to set a non-standard time interval. For example, if the series has one point every 5 minutes, the request can be set as granularity=minutely, customInterval=5.

setCustomIntervalCol(value)[source]
Parameters

customIntervalCol – name of the column holding the custom interval. Custom interval is used to set a non-standard time interval. For example, if the series has one point every 5 minutes, the request can be set as granularity=minutely, customInterval=5.

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setGranularity(value)[source]
Parameters

granularity – Can only be one of yearly, monthly, weekly, daily, hourly or minutely. Granularity is used to verify whether the input series is valid.

setGranularityCol(value)[source]
Parameters

granularityCol – name of the column holding the granularity. Can only be one of yearly, monthly, weekly, daily, hourly or minutely. Granularity is used to verify whether the input series is valid.

setGroupbyCol(value)[source]
Parameters

groupbyCol – column that groups the series

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setLinkedService(value)[source]
setLocation(value)[source]
setMaxAnomalyRatio(value)[source]
Parameters

maxAnomalyRatio – Optional argument, advanced model parameter, max anomaly ratio in a time series.

setMaxAnomalyRatioCol(value)[source]
Parameters

maxAnomalyRatioCol – name of the column holding the max anomaly ratio. Optional argument, advanced model parameter, max anomaly ratio in a time series.

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(concurrency=1, concurrentTimeout=None, customInterval=None, customIntervalCol=None, errorCol='SimpleDetectAnomalies_ea50daf23243_error', granularity=None, granularityCol=None, groupbyCol=None, handler=None, maxAnomalyRatio=None, maxAnomalyRatioCol=None, outputCol='SimpleDetectAnomalies_ea50daf23243_output', period=None, periodCol=None, sensitivity=None, sensitivityCol=None, series=None, seriesCol=None, subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, timestampCol='timestamp', url=None, valueCol='value')[source]

Set the (keyword only) parameters

setPeriod(value)[source]
Parameters

period – Optional argument, periodic value of a time series. If the value is null or not present, the API will determine the period automatically.

setPeriodCol(value)[source]
Parameters

periodCol – name of the column holding the period. Optional argument, periodic value of a time series. If the value is null or not present, the API will determine the period automatically.

setSensitivity(value)[source]
Parameters

sensitivity – Optional argument, advanced model parameter, between 0 and 99. The lower the value, the larger the margin value will be, which means fewer anomalies will be accepted.

setSensitivityCol(value)[source]
Parameters

sensitivityCol – name of the column holding the sensitivity. Optional argument, advanced model parameter, between 0 and 99. The lower the value, the larger the margin value will be, which means fewer anomalies will be accepted.

setSeries(value)[source]
Parameters

series – Time series data points. Points should be sorted by timestamp in ascending order to match the anomaly detection result. If the data is not sorted correctly or there are duplicate timestamps, the API will not work and an error message will be returned.

setSeriesCol(value)[source]
Parameters

seriesCol – name of the column holding the time series data points. Points should be sorted by timestamp in ascending order to match the anomaly detection result. If the data is not sorted correctly or there are duplicate timestamps, the API will not work and an error message will be returned.
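Because the service rejects unsorted series and duplicate timestamps, it can help to check the points locally before sending them. The sketch below is one way to do that. `check_series` and the (timestamp, value) tuple shape are assumptions made for illustration, and the string comparison only works when all timestamps share one ISO-8601 format and time zone.

```python
from typing import Iterable, List, Tuple

# (ISO-8601 timestamp, value); this shape is an assumption for the sketch.
Point = Tuple[str, float]


def check_series(points: Iterable[Point]) -> List[str]:
    """Return a list of problems; an empty list means the order looks fine."""
    problems: List[str] = []
    previous = None
    for index, (timestamp, _value) in enumerate(points):
        if previous is not None:
            if timestamp == previous:
                problems.append(f"duplicate timestamp at index {index}: {timestamp}")
            elif timestamp < previous:
                problems.append(f"out-of-order timestamp at index {index}: {timestamp}")
        previous = timestamp
    return problems
```

Sorting by timestamp and dropping duplicates before calling the transformer would clear both kinds of problem reported here.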

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKeyCol – name of the column holding the API key to use

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setTimestampCol(value)[source]
Parameters

timestampCol – column representing the time of the series

setUrl(value)[source]
Parameters

url – Url of the service

setValueCol(value)[source]
Parameters

valueCol – column representing the value of the series

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='the API key to use')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
timestampCol = Param(parent='undefined', name='timestampCol', doc='column representing the time of the series')
url = Param(parent='undefined', name='url', doc='Url of the service')
valueCol = Param(parent='undefined', name='valueCol', doc='column representing the value of the series')

synapse.ml.cognitive.SpeechToText module

class synapse.ml.cognitive.SpeechToText.SpeechToText(java_obj=None, audioData=None, audioDataCol=None, concurrency=1, concurrentTimeout=None, errorCol='SpeechToText_8b2fd21f0b36_error', format=None, formatCol=None, handler=None, language=None, languageCol=None, outputCol='SpeechToText_8b2fd21f0b36_output', profanity=None, profanityCol=None, subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • audioData (object) – The data sent to the service must be a .wav file

  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (object) – column to hold http errors

  • format (object) – Specifies the result format. Accepted values are simple and detailed. Default is simple.

  • handler (object) – Which strategy to use when handling requests

  • language (object) – Identifies the spoken language that is being recognized.

  • outputCol (object) – The name of the output column

  • profanity (object) – Specifies how to handle profanity in recognition results. Accepted values are masked, which replaces profanity with asterisks; removed, which removes all profanity from the result; or raw, which includes the profanity in the result. The default setting is masked.

  • subscriptionKey (object) – the API key to use

  • timeout (float) – number of seconds to wait before closing the connection

  • url (object) – Url of the service
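The format and profanity parameters above each accept a small fixed set of values, so they can be checked before a job is submitted. The sketch below validates them locally and returns keyword arguments. `speech_to_text_kwargs` is a hypothetical helper, and the language code "en-US" in the usage note is only an example, since this page does not list accepted language values.

```python
from typing import Dict, Optional

# Accepted values as listed in the parameter descriptions above.
FORMATS = ("simple", "detailed")
PROFANITY_MODES = ("masked", "removed", "raw")


def speech_to_text_kwargs(
    language: str,
    format: Optional[str] = None,
    profanity: Optional[str] = None,
) -> Dict[str, str]:
    """Validate the options and return keyword arguments for the transformer."""
    kwargs = {"language": language}
    if format is not None:
        if format not in FORMATS:
            raise ValueError(f"format must be one of {FORMATS}, got {format!r}")
        kwargs["format"] = format
    if profanity is not None:
        if profanity not in PROFANITY_MODES:
            raise ValueError(f"profanity must be one of {PROFANITY_MODES}, got {profanity!r}")
        kwargs["profanity"] = profanity
    return kwargs
```

The returned dict, for example from `speech_to_text_kwargs("en-US", format="detailed")`, uses the same keyword names as the class signature above, so it could be unpacked into the constructor or setParams.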

audioData = Param(parent='undefined', name='audioData', doc=' The data sent to the service must be a .wav file     ')
concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number of seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
format = Param(parent='undefined', name='format', doc=' Specifies the result format. Accepted values are simple and detailed. Default is simple.     ')
getAudioData()[source]
Returns

The data sent to the service must be a .wav file

Return type

audioData

getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getFormat()[source]
Returns

Specifies the result format. Accepted values are simple and detailed. Default is simple.

Return type

format

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

static getJavaPackage()[source]

Returns package name String.

getLanguage()[source]
Returns

Identifies the spoken language that is being recognized.

Return type

language

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getProfanity()[source]
Returns

Specifies how to handle profanity in recognition results. Accepted values are masked, which replaces profanity with asterisks; removed, which removes all profanity from the result; or raw, which includes the profanity in the result. The default setting is masked.

Return type

profanity

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
language = Param(parent='undefined', name='language', doc=' Identifies the spoken language that is being recognized.     ')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
profanity = Param(parent='undefined', name='profanity', doc=' Specifies how to handle profanity in recognition results. Accepted values are masked, which replaces profanity with asterisks; removed, which removes all profanity from the result; or raw, which includes the profanity in the result. The default setting is masked.     ')
classmethod read()[source]

Returns an MLReader instance for this class.

setAudioData(value)[source]
Parameters

audioData – The data sent to the service must be a .wav file

setAudioDataCol(value)[source]
Parameters

audioDataCol – name of the column holding the audio data. The data sent to the service must be a .wav file

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setFormat(value)[source]
Parameters

format – Specifies the result format. Accepted values are simple and detailed. Default is simple.

setFormatCol(value)[source]
Parameters

formatCol – name of the column holding the result format. Accepted values are simple and detailed. Default is simple.

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setLanguage(value)[source]
Parameters

language – Identifies the spoken language that is being recognized.

setLanguageCol(value)[source]
Parameters

languageCol – name of the column holding the spoken language that is being recognized.

setLinkedService(value)[source]
setLocation(value)[source]
setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(audioData=None, audioDataCol=None, concurrency=1, concurrentTimeout=None, errorCol='SpeechToText_8b2fd21f0b36_error', format=None, formatCol=None, handler=None, language=None, languageCol=None, outputCol='SpeechToText_8b2fd21f0b36_output', profanity=None, profanityCol=None, subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setProfanity(value)[source]
Parameters

profanity – Specifies how to handle profanity in recognition results. Accepted values are masked, which replaces profanity with asterisks; removed, which removes all profanity from the result; or raw, which includes the profanity in the result. The default setting is masked.

setProfanityCol(value)[source]
Parameters

profanityCol – name of the column holding the profanity setting. Accepted values are masked, which replaces profanity with asterisks; removed, which removes all profanity from the result; or raw, which includes the profanity in the result. The default setting is masked.

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKeyCol – name of the column holding the API key to use

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='the API key to use')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.SpeechToTextSDK module

class synapse.ml.cognitive.SpeechToTextSDK.SpeechToTextSDK(java_obj=None, audioDataCol=None, endpointId=None, extraFfmpegArgs=[], fileType=None, fileTypeCol=None, format=None, formatCol=None, language=None, languageCol=None, outputCol=None, participantsJson=None, participantsJsonCol=None, profanity=None, profanityCol=None, recordAudioData=False, recordedFileNameCol=None, streamIntermediateResults=True, subscriptionKey=None, subscriptionKeyCol=None, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • audioDataCol (object) – Column holding audio data, must be either ByteArrays or Strings representing file URIs

  • endpointId (object) – endpoint for custom speech models

  • extraFfmpegArgs (list) – extra arguments for ffmpeg output decoding

  • fileType (object) – The file type of the sound files, supported types: wav, ogg, mp3

  • format (object) – Specifies the result format. Accepted values are simple and detailed. Default is simple.

  • language (object) – Identifies the spoken language that is being recognized.

  • outputCol (object) – The name of the output column

  • participantsJson (object) – a json representation of a list of conversation participants (email, language, user)

  • profanity (object) – Specifies how to handle profanity in recognition results. Accepted values are masked, which replaces profanity with asterisks; removed, which removes all profanity from the result; or raw, which includes the profanity in the result. The default setting is masked.

  • recordAudioData (bool) – Whether to record audio data to a file location, for use only with m3u8 streams

  • recordedFileNameCol (object) – Column holding file names to write audio data to if ‘recordAudioData’ is set to true

  • streamIntermediateResults (bool) – Whether to return intermediate results immediately or group them in a sequence

  • subscriptionKey (object) – the API key to use

  • url (object) – Url of the service
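Since fileType accepts only wav, ogg, and mp3, one option is to derive it from each audio URI before setting the parameter or filling a fileType column. The sketch below does this with the standard library. `infer_file_type` is a hypothetical helper, not part of synapse.ml, and it assumes the file extension reflects the actual encoding.

```python
import os
from urllib.parse import urlparse

# Supported types as listed in the fileType description above.
SUPPORTED_FILE_TYPES = ("wav", "ogg", "mp3")


def infer_file_type(uri: str) -> str:
    """Return the lowercase extension of `uri` if it is a supported type."""
    path = urlparse(uri).path  # drops the scheme, host, and query string
    extension = os.path.splitext(path)[1].lstrip(".").lower()
    if extension not in SUPPORTED_FILE_TYPES:
        raise ValueError(
            f"unsupported file type {extension!r}; expected one of {SUPPORTED_FILE_TYPES}"
        )
    return extension
```

The result could be passed to setFileType for a uniform dataset, or written to a column and referenced through setFileTypeCol when the rows mix formats.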

audioDataCol = Param(parent='undefined', name='audioDataCol', doc='Column holding audio data, must be either ByteArrays or Strings representing file URIs')
endpointId = Param(parent='undefined', name='endpointId', doc='endpoint for custom speech models')
extraFfmpegArgs = Param(parent='undefined', name='extraFfmpegArgs', doc='extra arguments for ffmpeg output decoding')
fileType = Param(parent='undefined', name='fileType', doc='The file type of the sound files, supported types: wav, ogg, mp3')
format = Param(parent='undefined', name='format', doc=' Specifies the result format. Accepted values are simple and detailed. Default is simple.     ')
getAudioDataCol()[source]
Returns

Column holding audio data, must be either ByteArrays or Strings representing file URIs

Return type

audioDataCol

getEndpointId()[source]
Returns

endpoint for custom speech models

Return type

endpointId

getExtraFfmpegArgs()[source]
Returns

extra arguments for ffmpeg output decoding

Return type

extraFfmpegArgs

getFileType()[source]
Returns

The file type of the sound files, supported types: wav, ogg, mp3

Return type

fileType

getFormat()[source]
Returns

Specifies the result format. Accepted values are simple and detailed. Default is simple.

Return type

format

static getJavaPackage()[source]

Returns package name String.

getLanguage()[source]
Returns

Identifies the spoken language that is being recognized.

Return type

language

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getParticipantsJson()[source]
Returns

a json representation of a list of conversation participants (email, language, user)

Return type

participantsJson

getProfanity()[source]
Returns

Specifies how to handle profanity in recognition results. Accepted values are masked, which replaces profanity with asterisks; removed, which removes all profanity from the result; or raw, which includes the profanity in the result. The default setting is masked.

Return type

profanity

getRecordAudioData()[source]
Returns

Whether to record audio data to a file location, for use only with m3u8 streams

Return type

recordAudioData

getRecordedFileNameCol()[source]
Returns

Column holding file names to write audio data to if ‘recordAudioData’ is set to true

Return type

recordedFileNameCol

getStreamIntermediateResults()[source]
Returns

Whether to return intermediate results immediately or group them in a sequence

Return type

streamIntermediateResults

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getUrl()[source]
Returns

Url of the service

Return type

url

language = Param(parent='undefined', name='language', doc=' Identifies the spoken language that is being recognized.     ')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
participantsJson = Param(parent='undefined', name='participantsJson', doc='a json representation of a list of conversation participants (email, language, user)')
profanity = Param(parent='undefined', name='profanity', doc=' Specifies how to handle profanity in recognition results. Accepted values are masked, which replaces profanity with asterisks; removed, which removes all profanity from the result; or raw, which includes the profanity in the result. The default setting is masked.     ')
classmethod read()[source]

Returns an MLReader instance for this class.

recordAudioData = Param(parent='undefined', name='recordAudioData', doc='Whether to record audio data to a file location, for use only with m3u8 streams')
recordedFileNameCol = Param(parent='undefined', name='recordedFileNameCol', doc="Column holding file names to write audio data to if 'recordAudioData' is set to true")
setAudioDataCol(value)[source]
Parameters

audioDataCol – Column holding audio data, must be either ByteArrays or Strings representing file URIs

setEndpointId(value)[source]
Parameters

endpointId – endpoint for custom speech models

setExtraFfmpegArgs(value)[source]
Parameters

extraFfmpegArgs – extra arguments for ffmpeg output decoding

setFileType(value)[source]
Parameters

fileType – The file type of the sound files, supported types: wav, ogg, mp3

setFileTypeCol(value)[source]
Parameters

fileTypeCol – name of the column holding the file type of the sound files, supported types: wav, ogg, mp3

setFormat(value)[source]
Parameters

format – Specifies the result format. Accepted values are simple and detailed. Default is simple.

setFormatCol(value)[source]
Parameters

formatCol – name of the column holding the result format. Accepted values are simple and detailed. Default is simple.

setLanguage(value)[source]
Parameters

language – Identifies the spoken language that is being recognized.

setLanguageCol(value)[source]
Parameters

languageCol – name of the column holding the spoken language that is being recognized.

setLinkedService(value)[source]
setLocation(value)[source]
setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(audioDataCol=None, endpointId=None, extraFfmpegArgs=[], fileType=None, fileTypeCol=None, format=None, formatCol=None, language=None, languageCol=None, outputCol=None, participantsJson=None, participantsJsonCol=None, profanity=None, profanityCol=None, recordAudioData=False, recordedFileNameCol=None, streamIntermediateResults=True, subscriptionKey=None, subscriptionKeyCol=None, url=None)[source]

Set the (keyword only) parameters

setParticipantsJson(value)[source]
Parameters

participantsJson – a json representation of a list of conversation participants (email, language, user)

setParticipantsJsonCol(value)[source]
Parameters

participantsJsonCol – name of the column holding a json representation of a list of conversation participants (email, language, user)

setProfanity(value)[source]
Parameters

profanity – Specifies how to handle profanity in recognition results. Accepted values are masked, which replaces profanity with asterisks; removed, which removes all profanity from the result; or raw, which includes the profanity in the result. The default setting is masked.

setProfanityCol(value)[source]
Parameters

profanityCol – name of the column holding the profanity setting. Accepted values are masked, which replaces profanity with asterisks; removed, which removes all profanity from the result; or raw, which includes the profanity in the result. The default setting is masked.

setRecordAudioData(value)[source]
Parameters

recordAudioData – Whether to record audio data to a file location, for use only with m3u8 streams

setRecordedFileNameCol(value)[source]
Parameters

recordedFileNameCol – Column holding file names to write audio data to if ‘recordAudioData’ is set to true

setStreamIntermediateResults(value)[source]
Parameters

streamIntermediateResults – Whether to return intermediate results immediately or group them in a sequence

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKeyCol – name of the column holding the API key to use

setUrl(value)[source]
Parameters

url – Url of the service

streamIntermediateResults = Param(parent='undefined', name='streamIntermediateResults', doc='Whether to return intermediate results immediately or group them in a sequence')
subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='the API key to use')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.TagImage module

class synapse.ml.cognitive.TagImage.TagImage(java_obj=None, concurrency=1, concurrentTimeout=None, errorCol='TagImage_6dc9b2588e33_error', handler=None, imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, language=None, languageCol=None, outputCol='TagImage_6dc9b2588e33_output', subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (object) – column to hold http errors

  • handler (object) – Which strategy to use when handling requests

  • imageBytes (object) – bytestream of the image to use

  • imageUrl (object) – the url of the image to use

  • language (object) – The desired language for output generation.

  • outputCol (object) – The name of the output column

  • subscriptionKey (object) – the API key to use

  • timeout (float) – number of seconds to wait before closing the connection

  • url (object) – Url of the service
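The parameter list above offers two ways to supply an image: imageUrl and imageBytes. The sketch below chooses between them from the type of the input. `image_input_kwargs` is a hypothetical helper, and treating the two as alternatives is an assumption, since this page does not say what happens when both are set.

```python
from typing import Dict, Union


def image_input_kwargs(
    source: Union[str, bytes, bytearray],
) -> Dict[str, Union[str, bytes]]:
    """Map an image source to the matching keyword argument."""
    if isinstance(source, str):
        return {"imageUrl": source}  # a string is treated as a URL
    if isinstance(source, (bytes, bytearray)):
        return {"imageBytes": bytes(source)}  # raw image content
    raise TypeError(f"expected str or bytes, got {type(source).__name__}")
```

The returned dict uses the same keyword names as the class signature above. For per-row images, the column-based setters setImageUrlCol and setImageBytesCol are the counterparts.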

concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number of seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

getImageBytes()[source]
Returns

bytestream of the image to use

Return type

imageBytes

getImageUrl()[source]
Returns

the url of the image to use

Return type

imageUrl

static getJavaPackage()[source]

Returns package name String.

getLanguage()[source]
Returns

The desired language for output generation.

Return type

language

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

URL of the service

Return type

url

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
imageBytes = Param(parent='undefined', name='imageBytes', doc='bytestream of the image to use')
imageUrl = Param(parent='undefined', name='imageUrl', doc='the url of the image to use')
language = Param(parent='undefined', name='language', doc='The desired language for output generation.')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setErrorCol(value)[source]
Parameters

errorCol – column to hold HTTP errors

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setImageBytes(value)[source]
Parameters

imageBytes – bytestream of the image to use

setImageBytesCol(value)[source]
Parameters

imageBytes – name of the input column that holds the bytestream of the image to use

setImageUrl(value)[source]
Parameters

imageUrl – the URL of the image to use

setImageUrlCol(value)[source]
Parameters

imageUrl – name of the input column that holds the URL of the image to use

setLanguage(value)[source]
Parameters

language – The desired language for output generation.

setLanguageCol(value)[source]
Parameters

language – name of the input column that holds the desired language for output generation.

setLinkedService(value)[source]
setLocation(value)[source]
setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(concurrency=1, concurrentTimeout=None, errorCol='TagImage_6dc9b2588e33_error', handler=None, imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, language=None, languageCol=None, outputCol='TagImage_6dc9b2588e33_output', subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – name of the input column that holds the API key to use

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – URL of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='the API key to use')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.TextSentiment module

class synapse.ml.cognitive.TextSentiment.TextSentiment(java_obj=None, concurrency=1, concurrentTimeout=None, errorCol='TextSentiment_71e3ce683c70_error', handler=None, language=None, languageCol=None, modelVersion=None, modelVersionCol=None, opinionMining=None, opinionMiningCol=None, outputCol='TextSentiment_71e3ce683c70_output', showStats=None, showStatsCol=None, stringIndexType=None, stringIndexTypeCol=None, subscriptionKey=None, subscriptionKeyCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (object) – column to hold HTTP errors

  • handler (object) – Which strategy to use when handling requests

  • language (object) – the language code of the text (optional for some services)

  • modelVersion (object) – This value indicates which model will be used for scoring. If a model-version is not specified, the API should default to the latest, non-preview version.

  • opinionMining (object) – if set to true, response will contain not only sentiment prediction but also opinion mining (aspect-based sentiment analysis) results.

  • outputCol (object) – The name of the output column

  • showStats (object) – if set to true, response will contain input and document level statistics.

  • stringIndexType (object) – Specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets

  • subscriptionKey (object) – the API key to use

  • text (object) – the text in the request body

  • timeout (float) – number of seconds to wait before closing the connection

  • url (object) – URL of the service
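
Why stringIndexType matters can be seen by measuring the same position in two units. The sketch below is only an illustration in plain Python: the offsets function is a hypothetical helper, it compares Unicode code points with UTF-16 code units, and it does not compute grapheme (text element) offsets, which would need a third-party library.

```python
def offsets(text, needle):
    """Return the start offset of `needle` in `text`, measured two ways."""
    # Python strings are indexed by Unicode code point.
    code_points = text.index(needle)
    # Each UTF-16 code unit is 2 bytes; characters outside the Basic
    # Multilingual Plane (such as many emoji) take two code units.
    utf16_units = len(text[:code_points].encode("utf-16-le")) // 2
    return {"code_points": code_points, "utf16_code_units": utf16_units}


# "\U0001F600" is a single emoji code point that needs two UTF-16 code units.
sample = "I \U0001F600 pizza"
result = offsets(sample, "pizza")
```

Here the word "pizza" starts at offset 4 when counting code points and at offset 5 when counting UTF-16 code units, so a consumer that slices the text must use the same unit that the service was asked to report.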

concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getErrorCol()[source]
Returns

column to hold HTTP errors

Return type

errorCol

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

static getJavaPackage()[source]

Returns package name String.

getLanguage()[source]
Returns

the language code of the text (optional for some services)

Return type

language

getModelVersion()[source]
Returns

This value indicates which model will be used for scoring. If a model-version is not specified, the API should default to the latest, non-preview version.

Return type

modelVersion

getOpinionMining()[source]
Returns

if set to true, response will contain not only sentiment prediction but also opinion mining (aspect-based sentiment analysis) results.

Return type

opinionMining

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getShowStats()[source]
Returns

if set to true, response will contain input and document level statistics.

Return type

showStats

getStringIndexType()[source]
Returns

Specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets

Return type

stringIndexType

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getText()[source]
Returns

the text in the request body

Return type

text

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

URL of the service

Return type

url

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
language = Param(parent='undefined', name='language', doc='the language code of the text (optional for some services)')
modelVersion = Param(parent='undefined', name='modelVersion', doc='This value indicates which model will be used for scoring. If a model-version is not specified, the API should default to the latest, non-preview version.')
opinionMining = Param(parent='undefined', name='opinionMining', doc='if set to true, response will contain not only sentiment prediction but also opinion mining (aspect-based sentiment analysis) results.')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setErrorCol(value)[source]
Parameters

errorCol – column to hold HTTP errors

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setLanguage(value)[source]
Parameters

language – the language code of the text (optional for some services)

setLanguageCol(value)[source]
Parameters

language – name of the input column that holds the language code of the text (optional for some services)

setLinkedService(value)[source]
setLocation(value)[source]
setModelVersion(value)[source]
Parameters

modelVersion – This value indicates which model will be used for scoring. If a model-version is not specified, the API should default to the latest, non-preview version.

setModelVersionCol(value)[source]
Parameters

modelVersion – name of the input column that holds the model version. This value indicates which model will be used for scoring. If a model-version is not specified, the API should default to the latest, non-preview version.

setOpinionMining(value)[source]
Parameters

opinionMining – if set to true, response will contain not only sentiment prediction but also opinion mining (aspect-based sentiment analysis) results.

setOpinionMiningCol(value)[source]
Parameters

opinionMining – name of the input column that holds the opinionMining flag. If set to true, response will contain not only sentiment prediction but also opinion mining (aspect-based sentiment analysis) results.

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(concurrency=1, concurrentTimeout=None, errorCol='TextSentiment_71e3ce683c70_error', handler=None, language=None, languageCol=None, modelVersion=None, modelVersionCol=None, opinionMining=None, opinionMiningCol=None, outputCol='TextSentiment_71e3ce683c70_output', showStats=None, showStatsCol=None, stringIndexType=None, stringIndexTypeCol=None, subscriptionKey=None, subscriptionKeyCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setShowStats(value)[source]
Parameters

showStats – if set to true, response will contain input and document level statistics.

setShowStatsCol(value)[source]
Parameters

showStats – name of the input column that holds the showStats flag. If set to true, response will contain input and document level statistics.

setStringIndexType(value)[source]
Parameters

stringIndexType – Specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets

setStringIndexTypeCol(value)[source]
Parameters

stringIndexType – name of the input column that holds the string index type, which specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – name of the input column that holds the API key to use

setText(value)[source]
Parameters

text – the text in the request body

setTextCol(value)[source]
Parameters

text – name of the input column that holds the text in the request body

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – URL of the service

showStats = Param(parent='undefined', name='showStats', doc='if set to true, response will contain input and document level statistics.')
stringIndexType = Param(parent='undefined', name='stringIndexType', doc='Specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets')
subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='the API key to use')
text = Param(parent='undefined', name='text', doc='the text in the request body')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.TextSentimentV2 module

class synapse.ml.cognitive.TextSentimentV2.TextSentimentV2(java_obj=None, concurrency=1, concurrentTimeout=None, errorCol='TextSentimentV2_1a77b3dc7ee7_error', handler=None, language=None, languageCol=None, outputCol='TextSentimentV2_1a77b3dc7ee7_output', subscriptionKey=None, subscriptionKeyCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (object) – column to hold HTTP errors

  • handler (object) – Which strategy to use when handling requests

  • language (object) – the language code of the text (optional for some services)

  • outputCol (object) – The name of the output column

  • subscriptionKey (object) – the API key to use

  • text (object) – the text in the request body

  • timeout (float) – number of seconds to wait before closing the connection

  • url (object) – URL of the service
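
The default errorCol and outputCol values in the signature above carry a hexadecimal suffix (1a77b3dc7ee7) that looks like a per-instance identifier. The following sketch shows how such names could be derived. It is an assumption about the naming scheme, not the library's code, and random_uid and default_columns are hypothetical helpers.

```python
import uuid


def random_uid(prefix):
    """Build an identifier like 'TextSentimentV2_1a77b3dc7ee7'.

    Assumption: the suffix is 12 hex characters taken from a random UUID.
    """
    return f"{prefix}_{uuid.uuid4().hex[-12:]}"


def default_columns(class_name):
    """Derive default column names from a freshly generated identifier."""
    uid = random_uid(class_name)
    return {"errorCol": f"{uid}_error", "outputCol": f"{uid}_output"}


cols = default_columns("TextSentimentV2")
```

If the names are generated this way, they differ between instances, so code that reads the output is more robust when it sets outputCol and errorCol explicitly rather than relying on the defaults printed in this reference.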

concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getErrorCol()[source]
Returns

column to hold HTTP errors

Return type

errorCol

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

static getJavaPackage()[source]

Returns package name String.

getLanguage()[source]
Returns

the language code of the text (optional for some services)

Return type

language

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getText()[source]
Returns

the text in the request body

Return type

text

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

URL of the service

Return type

url

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
language = Param(parent='undefined', name='language', doc='the language code of the text (optional for some services)')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setErrorCol(value)[source]
Parameters

errorCol – column to hold HTTP errors

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setLanguage(value)[source]
Parameters

language – the language code of the text (optional for some services)

setLanguageCol(value)[source]
Parameters

language – name of the input column that holds the language code of the text (optional for some services)

setLinkedService(value)[source]
setLocation(value)[source]
setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(concurrency=1, concurrentTimeout=None, errorCol='TextSentimentV2_1a77b3dc7ee7_error', handler=None, language=None, languageCol=None, outputCol='TextSentimentV2_1a77b3dc7ee7_output', subscriptionKey=None, subscriptionKeyCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – name of the input column that holds the API key to use

setText(value)[source]
Parameters

text – the text in the request body

setTextCol(value)[source]
Parameters

text – name of the input column that holds the text in the request body

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – URL of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='the API key to use')
text = Param(parent='undefined', name='text', doc='the text in the request body')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.Translate module

class synapse.ml.cognitive.Translate.Translate(java_obj=None, allowFallback=None, allowFallbackCol=None, category=None, categoryCol=None, concurrency=1, concurrentTimeout=None, errorCol='Translate_e4ba15333d36_error', fromLanguage=None, fromLanguageCol=None, fromScript=None, fromScriptCol=None, handler=None, includeAlignment=None, includeAlignmentCol=None, includeSentenceLength=None, includeSentenceLengthCol=None, outputCol='Translate_e4ba15333d36_output', profanityAction=None, profanityActionCol=None, profanityMarker=None, profanityMarkerCol=None, subscriptionKey=None, subscriptionKeyCol=None, subscriptionRegion=None, subscriptionRegionCol=None, suggestedFrom=None, suggestedFromCol=None, text=None, textCol=None, textType=None, textTypeCol=None, timeout=60.0, toLanguage=None, toLanguageCol=None, toScript=None, toScriptCol=None, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • allowFallback (object) – Specifies that the service is allowed to fall back to a general system when a custom system does not exist.

  • category (object) – A string specifying the category (domain) of the translation. This parameter is used to get translations from a customized system built with Custom Translator. Add the Category ID from your Custom Translator project details to this parameter to use your deployed customized system. Default value is: general.

  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (object) – column to hold HTTP errors

  • fromLanguage (object) – Specifies the language of the input text. Find which languages are available to translate from by looking up supported languages using the translation scope. If the from parameter is not specified, automatic language detection is applied to determine the source language. You must use the from parameter rather than autodetection when using the dynamic dictionary feature.

  • fromScript (object) – Specifies the script of the input text.

  • handler (object) – Which strategy to use when handling requests

  • includeAlignment (object) – Specifies whether to include alignment projection from source text to translated text.

  • includeSentenceLength (object) – Specifies whether to include sentence boundaries for the input text and the translated text.

  • outputCol (object) – The name of the output column

  • profanityAction (object) – Specifies how profanities should be treated in translations. Possible values are: NoAction (default), Marked or Deleted.

  • profanityMarker (object) – Specifies how profanities should be marked in translations. Possible values are: Asterisk (default) or Tag.

  • subscriptionKey (object) – the API key to use

  • subscriptionRegion (object) – the API region to use

  • suggestedFrom (object) – Specifies a fallback language if the language of the input text can’t be identified. Language autodetection is applied when the from parameter is omitted. If detection fails, the suggestedFrom language will be assumed.

  • text (object) – the string to translate

  • textType (object) – Defines whether the text being translated is plain text or HTML text. Any HTML needs to be a well-formed, complete element. Possible values are: plain (default) or html.

  • timeout (float) – number of seconds to wait before closing the connection

  • toLanguage (object) – Specifies the language of the output text. The target language must be one of the supported languages included in the translation scope. For example, use to=de to translate to German. It’s possible to translate to multiple languages simultaneously by repeating the parameter in the query string. For example, use to=de&to=it to translate to German and Italian.

  • toScript (object) – Specifies the script of the translated text.

  • url (object) – URL of the service
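
The fromLanguage and toLanguage descriptions refer to `from` and `to` query-string parameters, with `to` repeated once per target language. A minimal sketch of building such a query string with the standard library follows. The translate_query helper is hypothetical and covers only these two parameters.

```python
from urllib.parse import urlencode


def translate_query(to_languages, from_language=None):
    """Build a query string with one `to` entry per target language."""
    params = []
    if from_language is not None:
        # Omitting `from` leaves source-language detection to the service.
        params.append(("from", from_language))
    # Repeating the key is how several target languages are requested.
    params.extend(("to", lang) for lang in to_languages)
    return urlencode(params)


german_and_italian = translate_query(["de", "it"])
english_to_german = translate_query(["de"], from_language="en")
```

The first call yields to=de&to=it, matching the example in the toLanguage description.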

allowFallback = Param(parent='undefined', name='allowFallback', doc='Specifies that the service is allowed to fall back to a general system when a custom system does not exist. ')
category = Param(parent='undefined', name='category', doc='A string specifying the category (domain) of the translation. This parameter is used to get translations from a customized system built with Custom Translator. Add the Category ID from your Custom Translator project details to this parameter to use your deployed customized system. Default value is: general.')
concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
fromLanguage = Param(parent='undefined', name='fromLanguage', doc='Specifies the language of the input text. Find which languages are available to translate from by looking up supported languages using the translation scope. If the from parameter is not specified, automatic language detection is applied to determine the source language. You must use the from parameter rather than autodetection when using the dynamic dictionary feature.')
fromScript = Param(parent='undefined', name='fromScript', doc='Specifies the script of the input text.')
getAllowFallback()[source]
Returns

Specifies that the service is allowed to fall back to a general system when a custom system does not exist.

Return type

allowFallback

getCategory()[source]
Returns

A string specifying the category (domain) of the translation. This parameter is used to get translations from a customized system built with Custom Translator. Add the Category ID from your Custom Translator project details to this parameter to use your deployed customized system. Default value is: general.

Return type

category

getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getErrorCol()[source]
Returns

column to hold HTTP errors

Return type

errorCol

getFromLanguage()[source]
Returns

Specifies the language of the input text. Find which languages are available to translate from by looking up supported languages using the translation scope. If the from parameter is not specified, automatic language detection is applied to determine the source language. You must use the from parameter rather than autodetection when using the dynamic dictionary feature.

Return type

fromLanguage

getFromScript()[source]
Returns

Specifies the script of the input text.

Return type

fromScript

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

getIncludeAlignment()[source]
Returns

Specifies whether to include alignment projection from source text to translated text.

Return type

includeAlignment

getIncludeSentenceLength()[source]
Returns

Specifies whether to include sentence boundaries for the input text and the translated text.

Return type

includeSentenceLength

static getJavaPackage()[source]

Returns package name String.

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getProfanityAction()[source]
Returns

Specifies how profanities should be treated in translations. Possible values are: NoAction (default), Marked or Deleted.

Return type

profanityAction

getProfanityMarker()[source]
Returns

Specifies how profanities should be marked in translations. Possible values are: Asterisk (default) or Tag.

Return type

profanityMarker

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getSubscriptionRegion()[source]
Returns

the API region to use

Return type

subscriptionRegion

getSuggestedFrom()[source]
Returns

Specifies a fallback language if the language of the input text can’t be identified. Language autodetection is applied when the from parameter is omitted. If detection fails, the suggestedFrom language will be assumed.

Return type

suggestedFrom

getText()[source]
Returns

the string to translate

Return type

text

getTextType()[source]
Returns

Defines whether the text being translated is plain text or HTML text. Any HTML needs to be a well-formed, complete element. Possible values are: plain (default) or html.

Return type

textType

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getToLanguage()[source]
Returns

Specifies the language of the output text. The target language must be one of the supported languages included in the translation scope. For example, use to=de to translate to German. It’s possible to translate to multiple languages simultaneously by repeating the parameter in the query string. For example, use to=de&to=it to translate to German and Italian.

Return type

toLanguage

getToScript()[source]
Returns

Specifies the script of the translated text.

Return type

toScript

getUrl()[source]
Returns

URL of the service

Return type

url

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
includeAlignment = Param(parent='undefined', name='includeAlignment', doc='Specifies whether to include alignment projection from source text to translated text.')
includeSentenceLength = Param(parent='undefined', name='includeSentenceLength', doc='Specifies whether to include sentence boundaries for the input text and the translated text. ')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
profanityAction = Param(parent='undefined', name='profanityAction', doc='Specifies how profanities should be treated in translations. Possible values are: NoAction (default), Marked or Deleted. ')
profanityMarker = Param(parent='undefined', name='profanityMarker', doc='Specifies how profanities should be marked in translations. Possible values are: Asterisk (default) or Tag.')
classmethod read()[source]

Returns an MLReader instance for this class.

setAllowFallback(value)[source]
Parameters

allowFallback – Specifies that the service is allowed to fall back to a general system when a custom system does not exist.

setAllowFallbackCol(value)[source]
Parameters

allowFallback – name of the input column that holds the allowFallback flag, which specifies that the service is allowed to fall back to a general system when a custom system does not exist.

setCategory(value)[source]
Parameters

category – A string specifying the category (domain) of the translation. This parameter is used to get translations from a customized system built with Custom Translator. Add the Category ID from your Custom Translator project details to this parameter to use your deployed customized system. Default value is: general.

setCategoryCol(value)[source]
Parameters

category – name of the input column that holds the category (domain) of the translation. This parameter is used to get translations from a customized system built with Custom Translator. Add the Category ID from your Custom Translator project details to this parameter to use your deployed customized system. Default value is: general.

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setErrorCol(value)[source]
Parameters

errorCol – column to hold HTTP errors

setFromLanguage(value)[source]
Parameters

fromLanguage – Specifies the language of the input text. Find which languages are available to translate from by looking up supported languages using the translation scope. If the from parameter is not specified, automatic language detection is applied to determine the source language. You must use the from parameter rather than autodetection when using the dynamic dictionary feature.

setFromLanguageCol(value)[source]
Parameters

fromLanguage – name of the input column that holds the language of the input text. Find which languages are available to translate from by looking up supported languages using the translation scope. If the from parameter is not specified, automatic language detection is applied to determine the source language. You must use the from parameter rather than autodetection when using the dynamic dictionary feature.

setFromScript(value)[source]
Parameters

fromScript – Specifies the script of the input text.

setFromScriptCol(value)[source]
Parameters

fromScript – name of the input column that holds the script of the input text.

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setIncludeAlignment(value)[source]
Parameters

includeAlignment – Specifies whether to include alignment projection from source text to translated text.

setIncludeAlignmentCol(value)[source]
Parameters

includeAlignment – name of the input column that holds the includeAlignment flag, which specifies whether to include alignment projection from source text to translated text.

setIncludeSentenceLength(value)[source]
Parameters

includeSentenceLength – Specifies whether to include sentence boundaries for the input text and the translated text.

setIncludeSentenceLengthCol(value)[source]
Parameters

includeSentenceLength – name of the input column that holds the includeSentenceLength flag, which specifies whether to include sentence boundaries for the input text and the translated text.

setLinkedService(value)[source]
setLocation(value)[source]
setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(allowFallback=None, allowFallbackCol=None, category=None, categoryCol=None, concurrency=1, concurrentTimeout=None, errorCol='Translate_e4ba15333d36_error', fromLanguage=None, fromLanguageCol=None, fromScript=None, fromScriptCol=None, handler=None, includeAlignment=None, includeAlignmentCol=None, includeSentenceLength=None, includeSentenceLengthCol=None, outputCol='Translate_e4ba15333d36_output', profanityAction=None, profanityActionCol=None, profanityMarker=None, profanityMarkerCol=None, subscriptionKey=None, subscriptionKeyCol=None, subscriptionRegion=None, subscriptionRegionCol=None, suggestedFrom=None, suggestedFromCol=None, text=None, textCol=None, textType=None, textTypeCol=None, timeout=60.0, toLanguage=None, toLanguageCol=None, toScript=None, toScriptCol=None, url=None)[source]

Set the (keyword only) parameters

setProfanityAction(value)[source]
Parameters

profanityAction – Specifies how profanities should be treated in translations. Possible values are: NoAction (default), Marked or Deleted.

setProfanityActionCol(value)[source]
Parameters

profanityAction – name of the input column that holds the profanity action, which specifies how profanities should be treated in translations. Possible values are: NoAction (default), Marked or Deleted.

setProfanityMarker(value)[source]
Parameters

profanityMarker – Specifies how profanities should be marked in translations. Possible values are: Asterisk (default) or Tag.

setProfanityMarkerCol(value)[source]
Parameters

profanityMarker – name of the input column that holds the profanity marker, which specifies how profanities should be marked in translations. Possible values are: Asterisk (default) or Tag.
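
Because profanityAction and profanityMarker accept only the values listed above, it can help to check them before building a request. The sketch below is a hypothetical helper, not part of the library. It assumes the values are case-sensitive and uses the documented defaults.

```python
# Allowed values as listed in the parameter descriptions.
PROFANITY_ACTIONS = ("NoAction", "Marked", "Deleted")
PROFANITY_MARKERS = ("Asterisk", "Tag")


def profanity_options(action="NoAction", marker="Asterisk"):
    """Validate the two profanity settings and return them as a dict."""
    if action not in PROFANITY_ACTIONS:
        raise ValueError(f"profanityAction must be one of {PROFANITY_ACTIONS}")
    if marker not in PROFANITY_MARKERS:
        raise ValueError(f"profanityMarker must be one of {PROFANITY_MARKERS}")
    return {"profanityAction": action, "profanityMarker": marker}


defaults = profanity_options()
tagged = profanity_options(action="Marked", marker="Tag")
```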

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – name of the input column that holds the API key to use

setSubscriptionRegion(value)[source]
Parameters

subscriptionRegion – the API region to use

setSubscriptionRegionCol(value)[source]
Parameters

subscriptionRegionCol – name of the input column that provides subscriptionRegion (the API region to use) per row

setSuggestedFrom(value)[source]
Parameters

suggestedFrom – Specifies a fallback language if the language of the input text can’t be identified. Language autodetection is applied when the from parameter is omitted. If detection fails, the suggestedFrom language will be assumed.
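
The fallback order described for suggestedFrom can be sketched as follows. Detection actually runs inside the Translator service; the function and the `detect` callback below are hypothetical stand-ins used only to show the order of precedence.

```python
from typing import Callable, Optional


def resolve_source_language(
    text: str,
    from_language: Optional[str],
    suggested_from: Optional[str],
    detect: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Pick the source language using the documented fallback order."""
    if from_language is not None:
        # An explicit source language means autodetection is not applied.
        return from_language
    detected = detect(text)
    if detected is not None:
        return detected
    # Detection failed: assume suggestedFrom (which may itself be None).
    return suggested_from


def never_detects(_text: str) -> Optional[str]:
    """Hypothetical detector that always fails, to exercise the fallback."""
    return None


print(resolve_source_language("???", None, "fr", never_detects))  # prints "fr"
```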

setSuggestedFromCol(value)[source]
Parameters

suggestedFromCol – name of the input column that provides suggestedFrom per row. Specifies a fallback language if the language of the input text can’t be identified. Language autodetection is applied when the from parameter is omitted. If detection fails, the suggestedFrom language will be assumed.

setText(value)[source]
Parameters

text – the string to translate

setTextCol(value)[source]
Parameters

textCol – name of the input column that provides text (the string to translate) per row

setTextType(value)[source]
Parameters

textType – Defines whether the text being translated is plain text or HTML text. Any HTML needs to be a well-formed, complete element. Possible values are: plain (default) or html.

setTextTypeCol(value)[source]
Parameters

textTypeCol – name of the input column that provides textType per row. Defines whether the text being translated is plain text or HTML text. Any HTML needs to be a well-formed, complete element. Possible values are: plain (default) or html.

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setToLanguage(value)[source]
Parameters

toLanguage – Specifies the language of the output text. The target language must be one of the supported languages included in the translation scope. For example, use to=de to translate to German. It’s possible to translate to multiple languages simultaneously by repeating the parameter in the query string. For example, use to=de&to=it to translate to German and Italian.
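
As a rough illustration of "repeating the parameter in the query string", the standard library can produce the `to=de&to=it` form directly. The helper name is hypothetical, and the `api-version=3.0` parameter is an assumption about the underlying Translator REST API; the transformer normally builds this query for you.

```python
from urllib.parse import urlencode


def build_translate_query(to_languages, api_version="3.0"):
    """Build a query string with one `to` entry per target language."""
    params = {"api-version": api_version, "to": list(to_languages)}
    # doseq=True expands the list into repeated `to=` parameters.
    return urlencode(params, doseq=True)


print(build_translate_query(["de", "it"]))  # api-version=3.0&to=de&to=it
```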

setToLanguageCol(value)[source]
Parameters

toLanguageCol – name of the input column that provides toLanguage per row. Specifies the language of the output text. The target language must be one of the supported languages included in the translation scope. For example, use to=de to translate to German. It’s possible to translate to multiple languages simultaneously by repeating the parameter in the query string. For example, use to=de&to=it to translate to German and Italian.

setToScript(value)[source]
Parameters

toScript – Specifies the script of the translated text.

setToScriptCol(value)[source]
Parameters

toScriptCol – name of the input column that provides toScript per row. Specifies the script of the translated text.

setUrl(value)[source]
Parameters

url – URL of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='the API key to use')
subscriptionRegion = Param(parent='undefined', name='subscriptionRegion', doc='the API region to use')
suggestedFrom = Param(parent='undefined', name='suggestedFrom', doc="Specifies a fallback language if the language of the input text can't be identified. Language autodetection is applied when the from parameter is omitted. If detection fails, the suggestedFrom language will be assumed.")
text = Param(parent='undefined', name='text', doc='the string to translate')
textType = Param(parent='undefined', name='textType', doc='Defines whether the text being translated is plain text or HTML text. Any HTML needs to be a well-formed, complete element. Possible values are: plain (default) or html.')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
toLanguage = Param(parent='undefined', name='toLanguage', doc="Specifies the language of the output text. The target language must be one of the supported languages included in the translation scope. For example, use to=de to translate to German. It's possible to translate to multiple languages simultaneously by repeating the parameter in the query string. For example, use to=de&to=it to translate to German and Italian.")
toScript = Param(parent='undefined', name='toScript', doc='Specifies the script of the translated text.')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.Transliterate module

class synapse.ml.cognitive.Transliterate.Transliterate(java_obj=None, concurrency=1, concurrentTimeout=None, errorCol='Transliterate_af2462fbb766_error', fromScript=None, fromScriptCol=None, handler=None, language=None, languageCol=None, outputCol='Transliterate_af2462fbb766_output', subscriptionKey=None, subscriptionKeyCol=None, subscriptionRegion=None, subscriptionRegionCol=None, text=None, textCol=None, timeout=60.0, toScript=None, toScriptCol=None, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (object) – column to hold HTTP errors

  • fromScript (object) – Specifies the script of the input text.

  • handler (object) – Which strategy to use when handling requests

  • language (object) – Language tag identifying the language of the input text. If a code is not specified, automatic language detection will be applied.

  • outputCol (object) – The name of the output column

  • subscriptionKey (object) – the API key to use

  • subscriptionRegion (object) – the API region to use

  • text (object) – the string to translate

  • timeout (float) – number of seconds to wait before closing the connection

  • toScript (object) – Specifies the script of the translated text.

  • url (object) – URL of the service
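
To show how the parameters above relate to a service call, here is a minimal sketch that assembles, but does not send, a transliteration request. The endpoint URL, the `api-version` value, the header names and the JSON body shape are assumptions based on the public Translator REST API; the function name is hypothetical and this is not the transformer's own implementation.

```python
import json
from urllib.parse import urlencode

# Assumed public endpoint; in practice the `url` parameter overrides it.
DEFAULT_URL = "https://api.cognitive.microsofttranslator.com/transliterate"


def build_transliterate_request(text, language, from_script, to_script,
                                subscription_key, subscription_region,
                                url=DEFAULT_URL):
    """Return (url, headers, body) for one transliteration call."""
    query = urlencode({
        "api-version": "3.0",        # assumed API version
        "language": language,
        "fromScript": from_script,
        "toScript": to_script,
    })
    headers = {
        "Ocp-Apim-Subscription-Key": subscription_key,
        "Ocp-Apim-Subscription-Region": subscription_region,
        "Content-Type": "application/json",
    }
    # The service accepts a JSON array of objects with a "Text" field.
    body = json.dumps([{"Text": text}], ensure_ascii=False)
    return f"{url}?{query}", headers, body


request_url, request_headers, request_body = build_transliterate_request(
    "こんにちは", "ja", "Jpan", "Latn", "<your-key>", "<your-region>"
)
print(request_url)
```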

concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
fromScript = Param(parent='undefined', name='fromScript', doc='Specifies the script of the input text.')
getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getErrorCol()[source]
Returns

column to hold HTTP errors

Return type

errorCol

getFromScript()[source]
Returns

Specifies the script of the input text.

Return type

fromScript

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

static getJavaPackage()[source]

Returns package name String.

getLanguage()[source]
Returns

Language tag identifying the language of the input text. If a code is not specified, automatic language detection will be applied.

Return type

language

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getSubscriptionRegion()[source]
Returns

the API region to use

Return type

subscriptionRegion

getText()[source]
Returns

the string to translate

Return type

text

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getToScript()[source]
Returns

Specifies the script of the translated text.

Return type

toScript

getUrl()[source]
Returns

URL of the service

Return type

url

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
language = Param(parent='undefined', name='language', doc='Language tag identifying the language of the input text. If a code is not specified, automatic language detection will be applied.')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1
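
The relationship between concurrency and concurrentTimeout can be pictured with a small thread-pool analogy. This is not SynapseML's implementation (which runs on the JVM); the function name and the `send` callback are hypothetical, and the sketch only shows the idea of a bounded number of in-flight calls with a per-future wait limit.

```python
from concurrent.futures import ThreadPoolExecutor


def call_all(requests, send, concurrency=1, concurrent_timeout=None):
    """Send requests with at most `concurrency` calls in flight.

    Each result is awaited for at most `concurrent_timeout` seconds;
    None means wait indefinitely.
    """
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(send, request) for request in requests]
        # result() raises concurrent.futures.TimeoutError if the wait expires.
        return [future.result(timeout=concurrent_timeout) for future in futures]


# Stand-in for a service call: doubles its input.
print(call_all([1, 2, 3], lambda x: x * 2, concurrency=2, concurrent_timeout=5.0))
```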

setErrorCol(value)[source]
Parameters

errorCol – column to hold HTTP errors

setFromScript(value)[source]
Parameters

fromScript – Specifies the script of the input text.

setFromScriptCol(value)[source]
Parameters

fromScriptCol – name of the input column that provides fromScript per row. Specifies the script of the input text.

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setLanguage(value)[source]
Parameters

language – Language tag identifying the language of the input text. If a code is not specified, automatic language detection will be applied.

setLanguageCol(value)[source]
Parameters

languageCol – name of the input column that provides language per row. Language tag identifying the language of the input text. If a code is not specified, automatic language detection will be applied.

setLinkedService(value)[source]
setLocation(value)[source]
setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(concurrency=1, concurrentTimeout=None, errorCol='Transliterate_af2462fbb766_error', fromScript=None, fromScriptCol=None, handler=None, language=None, languageCol=None, outputCol='Transliterate_af2462fbb766_output', subscriptionKey=None, subscriptionKeyCol=None, subscriptionRegion=None, subscriptionRegionCol=None, text=None, textCol=None, timeout=60.0, toScript=None, toScriptCol=None, url=None)[source]

Set the (keyword-only) parameters.

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKeyCol – name of the input column that provides subscriptionKey (the API key to use) per row

setSubscriptionRegion(value)[source]
Parameters

subscriptionRegion – the API region to use

setSubscriptionRegionCol(value)[source]
Parameters

subscriptionRegionCol – name of the input column that provides subscriptionRegion (the API region to use) per row

setText(value)[source]
Parameters

text – the string to translate

setTextCol(value)[source]
Parameters

textCol – name of the input column that provides text (the string to translate) per row

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setToScript(value)[source]
Parameters

toScript – Specifies the script of the translated text.

setToScriptCol(value)[source]
Parameters

toScriptCol – name of the input column that provides toScript per row. Specifies the script of the translated text.

setUrl(value)[source]
Parameters

url – URL of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='the API key to use')
subscriptionRegion = Param(parent='undefined', name='subscriptionRegion', doc='the API region to use')
text = Param(parent='undefined', name='text', doc='the string to translate')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
toScript = Param(parent='undefined', name='toScript', doc='Specifies the script of the translated text.')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.VerifyFaces module

class synapse.ml.cognitive.VerifyFaces.VerifyFaces(java_obj=None, concurrency=1, concurrentTimeout=None, errorCol='VerifyFaces_db2641d91c47_error', faceId=None, faceIdCol=None, faceId1=None, faceId1Col=None, faceId2=None, faceId2Col=None, handler=None, largePersonGroupId=None, largePersonGroupIdCol=None, outputCol='VerifyFaces_db2641d91c47_output', personGroupId=None, personGroupIdCol=None, personId=None, personIdCol=None, subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (object) – column to hold HTTP errors

  • faceId (object) – faceId of the face, comes from Face - Detect.

  • faceId1 (object) – faceId of one face, comes from Face - Detect.

  • faceId2 (object) – faceId of another face, comes from Face - Detect.

  • handler (object) – Which strategy to use when handling requests

  • largePersonGroupId (object) – Using existing largePersonGroupId and personId for fast adding a specified person. largePersonGroupId is created in LargePersonGroup - Create. Parameter personGroupId and largePersonGroupId should not be provided at the same time.

  • outputCol (object) – The name of the output column

  • personGroupId (object) – Using existing personGroupId and personId for fast loading a specified person. personGroupId is created in PersonGroup - Create. Parameter personGroupId and largePersonGroupId should not be provided at the same time.

  • personId (object) – Specify a certain person in a person group or a large person group. personId is created in PersonGroup Person - Create or LargePersonGroup Person - Create.

  • subscriptionKey (object) – the API key to use

  • timeout (float) – number of seconds to wait before closing the connection

  • url (object) – URL of the service
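
The parameters above describe two verification modes: comparing two faces (faceId1 and faceId2), or comparing one face against a known person (faceId and personId, plus one of the group IDs). A minimal sketch of how a request body might be assembled under the stated constraint follows; the function name is hypothetical and the JSON field names are assumed to mirror the parameter names.

```python
def build_verify_body(face_id=None, face_id1=None, face_id2=None,
                      person_id=None, person_group_id=None,
                      large_person_group_id=None):
    """Assemble a verification request body from the documented parameters."""
    if person_group_id is not None and large_person_group_id is not None:
        # Documented constraint: the two group IDs are mutually exclusive.
        raise ValueError(
            "personGroupId and largePersonGroupId should not be provided "
            "at the same time"
        )

    if face_id1 is not None and face_id2 is not None:
        # Face-to-face verification.
        return {"faceId1": face_id1, "faceId2": face_id2}

    if face_id is not None and person_id is not None:
        # Face-to-person verification.
        body = {"faceId": face_id, "personId": person_id}
        if person_group_id is not None:
            body["personGroupId"] = person_group_id
        elif large_person_group_id is not None:
            body["largePersonGroupId"] = large_person_group_id
        return body

    raise ValueError("provide faceId1 and faceId2, or faceId and personId")


print(build_verify_body(face_id1="face-a", face_id2="face-b"))
```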

concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
faceId = Param(parent='undefined', name='faceId', doc='faceId of the face, comes from Face - Detect.')
faceId1 = Param(parent='undefined', name='faceId1', doc='faceId of one face, comes from Face - Detect.')
faceId2 = Param(parent='undefined', name='faceId2', doc='faceId of another face, comes from Face - Detect.')
getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getErrorCol()[source]
Returns

column to hold HTTP errors

Return type

errorCol

getFaceId()[source]
Returns

faceId of the face, comes from Face - Detect.

Return type

faceId

getFaceId1()[source]
Returns

faceId of one face, comes from Face - Detect.

Return type

faceId1

getFaceId2()[source]
Returns

faceId of another face, comes from Face - Detect.

Return type

faceId2

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

static getJavaPackage()[source]

Returns package name String.

getLargePersonGroupId()[source]
Returns

Using existing largePersonGroupId and personId for fast adding a specified person. largePersonGroupId is created in LargePersonGroup - Create. Parameter personGroupId and largePersonGroupId should not be provided at the same time.

Return type

largePersonGroupId

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getPersonGroupId()[source]
Returns

Using existing personGroupId and personId for fast loading a specified person. personGroupId is created in PersonGroup - Create. Parameter personGroupId and largePersonGroupId should not be provided at the same time.

Return type

personGroupId

getPersonId()[source]
Returns

Specify a certain person in a person group or a large person group. personId is created in PersonGroup Person - Create or LargePersonGroup Person - Create.

Return type

personId

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

URL of the service

Return type

url

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
largePersonGroupId = Param(parent='undefined', name='largePersonGroupId', doc='Using existing largePersonGroupId and personId for fast adding a specified person. largePersonGroupId is created in LargePersonGroup - Create. Parameter personGroupId and largePersonGroupId should not be provided at the same time.')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
personGroupId = Param(parent='undefined', name='personGroupId', doc='Using existing personGroupId and personId for fast loading a specified person. personGroupId is created in PersonGroup - Create. Parameter personGroupId and largePersonGroupId should not be provided at the same time.')
personId = Param(parent='undefined', name='personId', doc='Specify a certain person in a person group or a large person group. personId is created in PersonGroup Person - Create or LargePersonGroup Person - Create.')
classmethod read()[source]

Returns an MLReader instance for this class.

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setErrorCol(value)[source]
Parameters

errorCol – column to hold HTTP errors

setFaceId(value)[source]
Parameters

faceId – faceId of the face, comes from Face - Detect.

setFaceId1(value)[source]
Parameters

faceId1 – faceId of one face, comes from Face - Detect.

setFaceId1Col(value)[source]
Parameters

faceId1Col – name of the input column that provides faceId1 per row. faceId of one face, comes from Face - Detect.

setFaceId2(value)[source]
Parameters

faceId2 – faceId of another face, comes from Face - Detect.

setFaceId2Col(value)[source]
Parameters

faceId2Col – name of the input column that provides faceId2 per row. faceId of another face, comes from Face - Detect.

setFaceIdCol(value)[source]
Parameters

faceIdCol – name of the input column that provides faceId per row. faceId of the face, comes from Face - Detect.

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setLargePersonGroupId(value)[source]
Parameters

largePersonGroupId – Using existing largePersonGroupId and personId for fast adding a specified person. largePersonGroupId is created in LargePersonGroup - Create. Parameter personGroupId and largePersonGroupId should not be provided at the same time.

setLargePersonGroupIdCol(value)[source]
Parameters

largePersonGroupIdCol – name of the input column that provides largePersonGroupId per row. Using existing largePersonGroupId and personId for fast adding a specified person. largePersonGroupId is created in LargePersonGroup - Create. Parameter personGroupId and largePersonGroupId should not be provided at the same time.

setLinkedService(value)[source]
setLocation(value)[source]
setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(concurrency=1, concurrentTimeout=None, errorCol='VerifyFaces_db2641d91c47_error', faceId=None, faceIdCol=None, faceId1=None, faceId1Col=None, faceId2=None, faceId2Col=None, handler=None, largePersonGroupId=None, largePersonGroupIdCol=None, outputCol='VerifyFaces_db2641d91c47_output', personGroupId=None, personGroupIdCol=None, personId=None, personIdCol=None, subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Set the (keyword-only) parameters.

setPersonGroupId(value)[source]
Parameters

personGroupId – Using existing personGroupId and personId for fast loading a specified person. personGroupId is created in PersonGroup - Create. Parameter personGroupId and largePersonGroupId should not be provided at the same time.

setPersonGroupIdCol(value)[source]
Parameters

personGroupIdCol – name of the input column that provides personGroupId per row. Using existing personGroupId and personId for fast loading a specified person. personGroupId is created in PersonGroup - Create. Parameter personGroupId and largePersonGroupId should not be provided at the same time.

setPersonId(value)[source]
Parameters

personId – Specify a certain person in a person group or a large person group. personId is created in PersonGroup Person - Create or LargePersonGroup Person - Create.

setPersonIdCol(value)[source]
Parameters

personIdCol – name of the input column that provides personId per row. Specify a certain person in a person group or a large person group. personId is created in PersonGroup Person - Create or LargePersonGroup Person - Create.

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKeyCol – name of the input column that provides subscriptionKey (the API key to use) per row

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – URL of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='the API key to use')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

Module contents

SynapseML is an ecosystem of tools that extends the distributed computing framework Apache Spark in several new directions. SynapseML adds many deep learning and data science tools to the Spark ecosystem, including seamless integration of Spark Machine Learning pipelines with Microsoft Cognitive Toolkit (CNTK), LightGBM and OpenCV. These tools enable powerful and highly scalable predictive and analytical models for a variety of data sources.

SynapseML also brings new networking capabilities to the Spark ecosystem. With the HTTP on Spark project, users can embed any web service into their SparkML models. In this vein, SynapseML provides easy-to-use SparkML transformers for a wide variety of Microsoft Cognitive Services. For production-grade deployment, the Spark Serving project enables high-throughput, sub-millisecond latency web services, backed by your Spark cluster.

SynapseML requires Scala 2.12, Spark 3.0+, and Python 3.6+.
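
A quick way to confirm the Python side of these requirements before installing is a version check like the sketch below. The helper name is hypothetical, and it covers only the Python 3.6+ requirement; the Scala and Spark versions have to be checked against your cluster separately.

```python
import sys

MIN_PYTHON = (3, 6)  # minimum Python version stated above


def python_is_supported(version_info=None):
    """Return True if the given (or running) Python meets the minimum."""
    version = tuple(version_info or sys.version_info)[:2]
    return version >= MIN_PYTHON


if python_is_supported():
    print("Python version OK:", sys.version.split()[0])
else:
    print("Python 3.6 or newer is required")
```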