synapse.ml.services.form package

Submodules

synapse.ml.services.form.AnalyzeBusinessCards module

class synapse.ml.services.form.AnalyzeBusinessCards.AnalyzeBusinessCards(java_obj=None, AADToken=None, AADTokenCol=None, CustomAuthHeader=None, CustomAuthHeaderCol=None, backoffs=[100, 500, 1000], concurrency=1, concurrentTimeout=None, errorCol='AnalyzeBusinessCards_1cd0503c1e21_error', imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, includeTextDetails=None, includeTextDetailsCol=None, initialPollingDelay=300, locale=None, localeCol=None, maxPollingRetries=1000, outputCol='AnalyzeBusinessCards_1cd0503c1e21_output', pages=None, pagesCol=None, pollingDelay=300, subscriptionKey=None, subscriptionKeyCol=None, suppressMaxRetriesException=False, timeout=60.0, url=None)[source]

Bases: ComplexParamsMixin, JavaMLReadable, JavaMLWritable, JavaTransformer

Parameters:
  • AADToken (object) – AAD Token used for authentication

  • CustomAuthHeader (object) – A Custom Value for Authorization Header

  • backoffs (list) – array of backoffs to use in the handler

  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (str) – column to hold http errors

  • imageBytes (object) – bytestream of the image to use

  • imageUrl (object) – the url of the image to use

  • includeTextDetails (object) – Include text lines and element references in the result.

  • initialPollingDelay (int) – number of milliseconds to wait before first poll for result

  • locale (object) – Locale of the receipt. Supported locales: en-AU, en-CA, en-GB, en-IN, en-US.

  • maxPollingRetries (int) – number of times to poll

  • outputCol (str) – The name of the output column

  • pages (object) – The page selection, used only for multi-page PDF and TIFF documents. Accepted inputs include single pages (e.g. ‘1, 2’ -> pages 1 and 2 will be processed), finite ranges (e.g. ‘2-5’ -> pages 2 to 5 will be processed), and open-ended ranges (e.g. ‘5-’ -> all pages from page 5 onward will be processed; ‘-10’ -> pages 1 to 10 will be processed). These forms can be mixed, and ranges may overlap (e.g. ‘-5, 1, 3, 5-10’ -> pages 1 to 10 will be processed). The service accepts the request if it can process at least one page of the document (e.g. ‘5-100’ on a 5-page document is valid, and page 5 will be processed). If no page range is provided, the entire document will be processed.

  • pollingDelay (int) – number of milliseconds to wait between polling

  • subscriptionKey (object) – the API key to use

  • suppressMaxRetriesException (bool) – set to true to suppress the maximum-retries exception and report the failure in the error column instead

  • timeout (float) – number of seconds to wait before closing the connection

  • url (str) – Url of the service
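The three polling parameters (initialPollingDelay, pollingDelay, maxPollingRetries) together bound how long a single row can wait for an asynchronous result. A minimal sketch of that arithmetic, assuming one initialPollingDelay wait followed by at most maxPollingRetries waits of pollingDelay each, with network round-trip time ignored; the helper name is hypothetical and not part of SynapseML:

```python
def worst_case_polling_ms(
    initial_polling_delay: int = 300,
    polling_delay: int = 300,
    max_polling_retries: int = 1000,
) -> int:
    """Approximate upper bound (ms) on time spent sleeping while polling one row.

    Assumption: one initialPollingDelay wait, then at most maxPollingRetries
    waits of pollingDelay each; request round-trip time is not included.
    """
    return initial_polling_delay + max_polling_retries * polling_delay


# Defaults from the constructor signature: 300 + 1000 * 300 ms
print(worst_case_polling_ms() / 1000, "seconds")  # 300.3 seconds
```

Under this assumption the defaults allow roughly five minutes of polling per row before the max-retries condition is reached; lowering maxPollingRetries is the most direct way to shorten that bound.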

AADToken = Param(parent='undefined', name='AADToken', doc='ServiceParam: AAD Token used for authentication')
CustomAuthHeader = Param(parent='undefined', name='CustomAuthHeader', doc='ServiceParam: A Custom Value for Authorization Header')
backoffs = Param(parent='undefined', name='backoffs', doc='array of backoffs to use in the handler')
concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number of seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getAADToken()[source]
Returns:

AAD Token used for authentication

Return type:

AADToken

getBackoffs()[source]
Returns:

array of backoffs to use in the handler

Return type:

backoffs
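The backoffs parameter is described only as an "array of backoffs to use in the handler". A minimal sketch of how such a list is commonly applied, assuming each entry is a delay in milliseconds before one more retry of a failed request; the set of retryable status codes and the function name are hypothetical, and SynapseML's own handler may decide differently which responses to retry:

```python
import time
from typing import Callable, FrozenSet, Sequence


def call_with_backoffs(
    send: Callable[[], int],
    backoffs_ms: Sequence[int] = (100, 500, 1000),
    retryable: FrozenSet[int] = frozenset({429, 500, 502, 503, 504}),
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Send once, then retry after each backoff while the status is retryable.

    Returns the last HTTP status code observed.
    """
    status = send()
    for delay_ms in backoffs_ms:
        if status not in retryable:
            break
        sleep(delay_ms / 1000.0)  # backoffs are assumed to be milliseconds
        status = send()
    return status
```

With the default [100, 500, 1000], a request can be attempted at most four times (one initial call plus three retries) before the last status is returned.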

getConcurrency()[source]
Returns:

max number of concurrent calls

Return type:

concurrency

getConcurrentTimeout()[source]
Returns:

max number of seconds to wait on futures if concurrency >= 1

Return type:

concurrentTimeout

getCustomAuthHeader()[source]
Returns:

A Custom Value for Authorization Header

Return type:

CustomAuthHeader

getErrorCol()[source]
Returns:

column to hold http errors

Return type:

errorCol

getImageBytes()[source]
Returns:

bytestream of the image to use

Return type:

imageBytes

getImageUrl()[source]
Returns:

the url of the image to use

Return type:

imageUrl

getIncludeTextDetails()[source]
Returns:

Include text lines and element references in the result.

Return type:

includeTextDetails

getInitialPollingDelay()[source]
Returns:

number of milliseconds to wait before first poll for result

Return type:

initialPollingDelay

static getJavaPackage()[source]

Returns package name String.

getLocale()[source]
Returns:

Locale of the receipt. Supported locales: en-AU, en-CA, en-GB, en-IN, en-US.

Return type:

locale

getMaxPollingRetries()[source]
Returns:

number of times to poll

Return type:

maxPollingRetries

getOutputCol()[source]
Returns:

The name of the output column

Return type:

outputCol

getPages()[source]
Returns:

The page selection, used only for multi-page PDF and TIFF documents. Accepted inputs include single pages (e.g. ‘1, 2’ -> pages 1 and 2 will be processed), finite ranges (e.g. ‘2-5’ -> pages 2 to 5 will be processed), and open-ended ranges (e.g. ‘5-’ -> all pages from page 5 onward will be processed; ‘-10’ -> pages 1 to 10 will be processed). These forms can be mixed, and ranges may overlap (e.g. ‘-5, 1, 3, 5-10’ -> pages 1 to 10 will be processed). The service accepts the request if it can process at least one page of the document (e.g. ‘5-100’ on a 5-page document is valid, and page 5 will be processed). If no page range is provided, the entire document will be processed.

Return type:

pages
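To make the page-selection rules above concrete, here is a local sketch that expands a selection string into the pages it covers. It assumes 1-based page numbers and mirrors only the documented examples; `expand_pages` is a hypothetical helper, not a SynapseML function, and the real parsing happens in the service:

```python
from typing import List, Optional


def expand_pages(spec: Optional[str], page_count: int) -> List[int]:
    """Return the sorted 1-based pages selected by `spec` for a `page_count`-page document."""
    if not spec:
        # No selection: the whole document is processed.
        return list(range(1, page_count + 1))
    selected = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # 'a-b' (finite), 'a-' (open end) or '-b' (open start)
            start_s, end_s = part.split("-", 1)
            start = int(start_s) if start_s.strip() else 1
            end = int(end_s) if end_s.strip() else page_count
        else:
            start = end = int(part)
        # Pages beyond the document are ignored rather than rejected.
        selected.update(range(max(start, 1), min(end, page_count) + 1))
    if not selected:
        raise ValueError("selection does not cover any page of the document")
    return sorted(selected)


print(expand_pages("-5, 1, 3, 5-10", 20))
print(expand_pages("5-100", 5))
```

The second call illustrates the "at least one page" rule: the range overshoots the document, but page 5 still falls inside it, so the selection is accepted.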

getPollingDelay()[source]
Returns:

number of milliseconds to wait between polling

Return type:

pollingDelay

getSubscriptionKey()[source]
Returns:

the API key to use

Return type:

subscriptionKey

getSuppressMaxRetriesException()[source]
Returns:

set to true to suppress the maximum-retries exception and report the failure in the error column instead

Return type:

suppressMaxRetriesException
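The effect of suppressMaxRetriesException can be pictured as a choice between failing the job and recording the failure on the row. A minimal sketch, assuming a poll callback that returns None while the operation is still running; the exception class, the error message, and the output/error dictionary are hypothetical stand-ins for the output and error columns, and polling delays are omitted for brevity:

```python
from typing import Callable, Dict, Optional


class MaxRetriesExceeded(Exception):
    """Hypothetical stand-in for the exception raised when polling gives up."""


def poll_row(
    poll: Callable[[], Optional[dict]],
    max_polling_retries: int,
    suppress_max_retries_exception: bool = False,
) -> Dict[str, Optional[object]]:
    """Poll until a result arrives; return {'output': ..., 'error': ...} for one row."""
    for _ in range(max_polling_retries):
        result = poll()  # None means "still running"
        if result is not None:
            return {"output": result, "error": None}
    if suppress_max_retries_exception:
        # Record the failure on the row instead of failing the whole job.
        return {"output": None, "error": "max polling retries exceeded"}
    raise MaxRetriesExceeded(f"no result after {max_polling_retries} polls")
```

Suppressing the exception trades fail-fast behavior for partial results: other rows keep their outputs, and the slow rows can be filtered out later by inspecting the error column.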

getTimeout()[source]
Returns:

number of seconds to wait before closing the connection

Return type:

timeout

getUrl()[source]
Returns:

Url of the service

Return type:

url

imageBytes = Param(parent='undefined', name='imageBytes', doc='ServiceParam: bytestream of the image to use')
imageUrl = Param(parent='undefined', name='imageUrl', doc='ServiceParam: the url of the image to use')
includeTextDetails = Param(parent='undefined', name='includeTextDetails', doc='ServiceParam: Include text lines and element references in the result.')
initialPollingDelay = Param(parent='undefined', name='initialPollingDelay', doc='number of milliseconds to wait before first poll for result')
locale = Param(parent='undefined', name='locale', doc='ServiceParam: Locale of the receipt. Supported locales: en-AU, en-CA, en-GB, en-IN, en-US.')
maxPollingRetries = Param(parent='undefined', name='maxPollingRetries', doc='number of times to poll')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
pages = Param(parent='undefined', name='pages', doc="ServiceParam: The page selection, used only for multi-page PDF and TIFF documents. Accepted inputs include single pages (e.g. '1, 2' -> pages 1 and 2 will be processed), finite ranges (e.g. '2-5' -> pages 2 to 5 will be processed), and open-ended ranges (e.g. '5-' -> all pages from page 5 onward will be processed; '-10' -> pages 1 to 10 will be processed). These forms can be mixed, and ranges may overlap (e.g. '-5, 1, 3, 5-10' -> pages 1 to 10 will be processed). The service accepts the request if it can process at least one page of the document (e.g. '5-100' on a 5-page document is valid, and page 5 will be processed). If no page range is provided, the entire document will be processed.")
pollingDelay = Param(parent='undefined', name='pollingDelay', doc='number of milliseconds to wait between polling')
classmethod read()[source]

Returns an MLReader instance for this class.

setAADToken(value)[source]
Parameters:

AADToken – AAD Token used for authentication

setAADTokenCol(value)[source]
Parameters:

AADToken – name of the column containing the AAD Token used for authentication
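Each service parameter has two setters: setX(value) supplies one constant for every row, while setXCol(value) names a column whose per-row values are used. A minimal sketch of that distinction, assuming (as this sketch does) that whichever setter was called most recently wins; `ServiceParamSketch` is a hypothetical model, not a SynapseML class:

```python
from typing import Any, Dict, Optional


class ServiceParamSketch:
    """Hypothetical model of a service parameter holding a constant or a column name."""

    def __init__(self) -> None:
        self._value: Any = None
        self._col: Optional[str] = None

    def set_value(self, value: Any) -> "ServiceParamSketch":
        # Like setX(value): a constant replaces any column binding.
        self._value, self._col = value, None
        return self

    def set_col(self, col: str) -> "ServiceParamSketch":
        # Like setXCol(value): a column binding replaces any constant.
        self._value, self._col = None, col
        return self

    def resolve(self, row: Dict[str, Any]) -> Any:
        """Value this parameter takes for one input row."""
        return row[self._col] if self._col is not None else self._value


rows = [{"key": "k1"}, {"key": "k2"}]
param = ServiceParamSketch().set_value("shared")
print([param.resolve(r) for r in rows])
param.set_col("key")
print([param.resolve(r) for r in rows])
```

The column form is useful when rows need different credentials, image URLs, or locales within a single DataFrame.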

setBackoffs(value)[source]
Parameters:

backoffs – array of backoffs to use in the handler

setConcurrency(value)[source]
Parameters:

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters:

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setCustomAuthHeader(value)[source]
Parameters:

CustomAuthHeader – A Custom Value for Authorization Header

setCustomAuthHeaderCol(value)[source]
Parameters:

CustomAuthHeader – name of the column containing a custom value for the Authorization header

setCustomServiceName(value)[source]
setDefaultInternalEndpoint(value)[source]
setEndpoint(value)[source]
setErrorCol(value)[source]
Parameters:

errorCol – column to hold http errors

setImageBytes(value)[source]
Parameters:

imageBytes – bytestream of the image to use

setImageBytesCol(value)[source]
Parameters:

imageBytes – name of the column containing the bytestream of the image to use

setImageUrl(value)[source]
Parameters:

imageUrl – the url of the image to use

setImageUrlCol(value)[source]
Parameters:

imageUrl – name of the column containing the url of the image to use

setIncludeTextDetails(value)[source]
Parameters:

includeTextDetails – Include text lines and element references in the result.

setIncludeTextDetailsCol(value)[source]
Parameters:

includeTextDetails – name of the column indicating whether to include text lines and element references in the result

setInitialPollingDelay(value)[source]
Parameters:

initialPollingDelay – number of milliseconds to wait before first poll for result

setLinkedService(value)[source]
setLocale(value)[source]
Parameters:

locale – Locale of the receipt. Supported locales: en-AU, en-CA, en-GB, en-IN, en-US.

setLocaleCol(value)[source]
Parameters:

locale – name of the column containing the locale of the receipt. Supported locales: en-AU, en-CA, en-GB, en-IN, en-US.

setLocation(value)[source]
setMaxPollingRetries(value)[source]
Parameters:

maxPollingRetries – number of times to poll

setOutputCol(value)[source]
Parameters:

outputCol – The name of the output column

setPages(value)[source]
Parameters:

pages – The page selection, used only for multi-page PDF and TIFF documents. Accepted inputs include single pages (e.g. ‘1, 2’ -> pages 1 and 2 will be processed), finite ranges (e.g. ‘2-5’ -> pages 2 to 5 will be processed), and open-ended ranges (e.g. ‘5-’ -> all pages from page 5 onward will be processed; ‘-10’ -> pages 1 to 10 will be processed). These forms can be mixed, and ranges may overlap (e.g. ‘-5, 1, 3, 5-10’ -> pages 1 to 10 will be processed). The service accepts the request if it can process at least one page of the document (e.g. ‘5-100’ on a 5-page document is valid, and page 5 will be processed). If no page range is provided, the entire document will be processed.

setPagesCol(value)[source]
Parameters:

pages – name of the column containing the page selection for each row, in the same format accepted by setPages. The page selection is used only for multi-page PDF and TIFF documents; if no page range is provided, the entire document will be processed.

setParams(AADToken=None, AADTokenCol=None, CustomAuthHeader=None, CustomAuthHeaderCol=None, backoffs=[100, 500, 1000], concurrency=1, concurrentTimeout=None, errorCol='AnalyzeBusinessCards_1cd0503c1e21_error', imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, includeTextDetails=None, includeTextDetailsCol=None, initialPollingDelay=300, locale=None, localeCol=None, maxPollingRetries=1000, outputCol='AnalyzeBusinessCards_1cd0503c1e21_output', pages=None, pagesCol=None, pollingDelay=300, subscriptionKey=None, subscriptionKeyCol=None, suppressMaxRetriesException=False, timeout=60.0, url=None)[source]

Sets the keyword-only parameters.

setPollingDelay(value)[source]
Parameters:

pollingDelay – number of milliseconds to wait between polling

setSubscriptionKey(value)[source]
Parameters:

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters:

subscriptionKey – name of the column containing the API key to use

setSuppressMaxRetriesException(value)[source]
Parameters:

suppressMaxRetriesException – set to true to suppress the maximum-retries exception and report the failure in the error column instead

setTimeout(value)[source]
Parameters:

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters:

url – Url of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
suppressMaxRetriesException = Param(parent='undefined', name='suppressMaxRetriesException', doc='set to true to suppress the maximum-retries exception and report the failure in the error column instead')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.services.form.AnalyzeCustomModel module

class synapse.ml.services.form.AnalyzeCustomModel.AnalyzeCustomModel(java_obj=None, AADToken=None, AADTokenCol=None, CustomAuthHeader=None, CustomAuthHeaderCol=None, backoffs=[100, 500, 1000], concurrency=1, concurrentTimeout=None, errorCol='AnalyzeCustomModel_e69ae34e5423_error', imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, includeTextDetails=None, includeTextDetailsCol=None, initialPollingDelay=300, maxPollingRetries=1000, modelId=None, modelIdCol=None, outputCol='AnalyzeCustomModel_e69ae34e5423_output', pollingDelay=300, subscriptionKey=None, subscriptionKeyCol=None, suppressMaxRetriesException=False, timeout=60.0, url=None)[source]

Bases: ComplexParamsMixin, JavaMLReadable, JavaMLWritable, JavaTransformer

Parameters:
  • AADToken (object) – AAD Token used for authentication

  • CustomAuthHeader (object) – A Custom Value for Authorization Header

  • backoffs (list) – array of backoffs to use in the handler

  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (str) – column to hold http errors

  • imageBytes (object) – bytestream of the image to use

  • imageUrl (object) – the url of the image to use

  • includeTextDetails (object) – Include text lines and element references in the result.

  • initialPollingDelay (int) – number of milliseconds to wait before first poll for result

  • maxPollingRetries (int) – number of times to poll

  • modelId (object) – Model identifier.

  • outputCol (str) – The name of the output column

  • pollingDelay (int) – number of milliseconds to wait between polling

  • subscriptionKey (object) – the API key to use

  • suppressMaxRetriesException (bool) – set to true to suppress the maximum-retries exception and report the failure in the error column instead

  • timeout (float) – number of seconds to wait before closing the connection

  • url (str) – Url of the service

AADToken = Param(parent='undefined', name='AADToken', doc='ServiceParam: AAD Token used for authentication')
CustomAuthHeader = Param(parent='undefined', name='CustomAuthHeader', doc='ServiceParam: A Custom Value for Authorization Header')
backoffs = Param(parent='undefined', name='backoffs', doc='array of backoffs to use in the handler')
concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number of seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getAADToken()[source]
Returns:

AAD Token used for authentication

Return type:

AADToken

getBackoffs()[source]
Returns:

array of backoffs to use in the handler

Return type:

backoffs

getConcurrency()[source]
Returns:

max number of concurrent calls

Return type:

concurrency

getConcurrentTimeout()[source]
Returns:

max number of seconds to wait on futures if concurrency >= 1

Return type:

concurrentTimeout
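concurrency and concurrentTimeout describe a bounded number of in-flight calls plus a cap on how long to wait for them. The SynapseML transformer does this on the JVM side; the sketch below is only a Python analogy built on `concurrent.futures`, assuming that calls still unfinished when the timeout expires are reported as missing (None here) rather than awaited. `run_bounded` is a hypothetical name:

```python
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Any, Callable, Iterable, List, Optional


def run_bounded(
    fn: Callable[[Any], Any],
    items: Iterable[Any],
    concurrency: int = 1,
    concurrent_timeout: Optional[float] = None,
) -> List[Optional[Any]]:
    """Apply fn to each item with at most `concurrency` calls in flight.

    Calls not finished within `concurrent_timeout` seconds yield None.
    """
    pool = ThreadPoolExecutor(max_workers=concurrency)
    futures = [pool.submit(fn, item) for item in items]
    # Wait for all futures, giving up after concurrent_timeout seconds (None = no limit).
    done, _ = wait(futures, timeout=concurrent_timeout)
    # Do not block on stragglers; cancel_futures requires Python 3.9+.
    pool.shutdown(wait=False, cancel_futures=True)
    return [f.result() if f in done else None for f in futures]


print(run_bounded(lambda x: x * x, range(5), concurrency=2))
```

Raising concurrency increases throughput per partition but also the request rate seen by the service, so it interacts with the backoffs used when the service throttles.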

getCustomAuthHeader()[source]
Returns:

A Custom Value for Authorization Header

Return type:

CustomAuthHeader

getErrorCol()[source]
Returns:

column to hold http errors

Return type:

errorCol

getImageBytes()[source]
Returns:

bytestream of the image to use

Return type:

imageBytes

getImageUrl()[source]
Returns:

the url of the image to use

Return type:

imageUrl

getIncludeTextDetails()[source]
Returns:

Include text lines and element references in the result.

Return type:

includeTextDetails

getInitialPollingDelay()[source]
Returns:

number of milliseconds to wait before first poll for result

Return type:

initialPollingDelay

static getJavaPackage()[source]

Returns package name String.

getMaxPollingRetries()[source]
Returns:

number of times to poll

Return type:

maxPollingRetries

getModelId()[source]
Returns:

Model identifier.

Return type:

modelId

getOutputCol()[source]
Returns:

The name of the output column

Return type:

outputCol

getPollingDelay()[source]
Returns:

number of milliseconds to wait between polling

Return type:

pollingDelay

getSubscriptionKey()[source]
Returns:

the API key to use

Return type:

subscriptionKey

getSuppressMaxRetriesException()[source]
Returns:

set to true to suppress the maximum-retries exception and report the failure in the error column instead

Return type:

suppressMaxRetriesException

getTimeout()[source]
Returns:

number of seconds to wait before closing the connection

Return type:

timeout

getUrl()[source]
Returns:

Url of the service

Return type:

url

imageBytes = Param(parent='undefined', name='imageBytes', doc='ServiceParam: bytestream of the image to use')
imageUrl = Param(parent='undefined', name='imageUrl', doc='ServiceParam: the url of the image to use')
includeTextDetails = Param(parent='undefined', name='includeTextDetails', doc='ServiceParam: Include text lines and element references in the result.')
initialPollingDelay = Param(parent='undefined', name='initialPollingDelay', doc='number of milliseconds to wait before first poll for result')
maxPollingRetries = Param(parent='undefined', name='maxPollingRetries', doc='number of times to poll')
modelId = Param(parent='undefined', name='modelId', doc='ServiceParam: Model identifier.')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
pollingDelay = Param(parent='undefined', name='pollingDelay', doc='number of milliseconds to wait between polling')
classmethod read()[source]

Returns an MLReader instance for this class.

setAADToken(value)[source]
Parameters:

AADToken – AAD Token used for authentication

setAADTokenCol(value)[source]
Parameters:

AADToken – name of the column containing the AAD Token used for authentication

setBackoffs(value)[source]
Parameters:

backoffs – array of backoffs to use in the handler

setConcurrency(value)[source]
Parameters:

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters:

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setCustomAuthHeader(value)[source]
Parameters:

CustomAuthHeader – A Custom Value for Authorization Header

setCustomAuthHeaderCol(value)[source]
Parameters:

CustomAuthHeader – name of the column containing a custom value for the Authorization header

setCustomServiceName(value)[source]
setDefaultInternalEndpoint(value)[source]
setEndpoint(value)[source]
setErrorCol(value)[source]
Parameters:

errorCol – column to hold http errors

setImageBytes(value)[source]
Parameters:

imageBytes – bytestream of the image to use

setImageBytesCol(value)[source]
Parameters:

imageBytes – name of the column containing the bytestream of the image to use

setImageUrl(value)[source]
Parameters:

imageUrl – the url of the image to use

setImageUrlCol(value)[source]
Parameters:

imageUrl – name of the column containing the url of the image to use

setIncludeTextDetails(value)[source]
Parameters:

includeTextDetails – Include text lines and element references in the result.

setIncludeTextDetailsCol(value)[source]
Parameters:

includeTextDetails – name of the column indicating whether to include text lines and element references in the result

setInitialPollingDelay(value)[source]
Parameters:

initialPollingDelay – number of milliseconds to wait before first poll for result

setLinkedService(value)[source]
setLocation(value)[source]
setMaxPollingRetries(value)[source]
Parameters:

maxPollingRetries – number of times to poll

setModelId(value)[source]
Parameters:

modelId – Model identifier.

setModelIdCol(value)[source]
Parameters:

modelId – name of the column containing the model identifier

setOutputCol(value)[source]
Parameters:

outputCol – The name of the output column

setParams(AADToken=None, AADTokenCol=None, CustomAuthHeader=None, CustomAuthHeaderCol=None, backoffs=[100, 500, 1000], concurrency=1, concurrentTimeout=None, errorCol='AnalyzeCustomModel_e69ae34e5423_error', imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, includeTextDetails=None, includeTextDetailsCol=None, initialPollingDelay=300, maxPollingRetries=1000, modelId=None, modelIdCol=None, outputCol='AnalyzeCustomModel_e69ae34e5423_output', pollingDelay=300, subscriptionKey=None, subscriptionKeyCol=None, suppressMaxRetriesException=False, timeout=60.0, url=None)[source]

Sets the keyword-only parameters.

setPollingDelay(value)[source]
Parameters:

pollingDelay – number of milliseconds to wait between polling

setSubscriptionKey(value)[source]
Parameters:

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters:

subscriptionKey – name of the column containing the API key to use

setSuppressMaxRetriesException(value)[source]
Parameters:

suppressMaxRetriesException – set to true to suppress the maximum-retries exception and report the failure in the error column instead

setTimeout(value)[source]
Parameters:

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters:

url – Url of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
suppressMaxRetriesException = Param(parent='undefined', name='suppressMaxRetriesException', doc='set to true to suppress the maximum-retries exception and report the failure in the error column instead')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.services.form.AnalyzeDocument module

class synapse.ml.services.form.AnalyzeDocument.AnalyzeDocument(java_obj=None, AADToken=None, AADTokenCol=None, CustomAuthHeader=None, CustomAuthHeaderCol=None, apiVersion=None, apiVersionCol=None, backoffs=[100, 500, 1000], concurrency=1, concurrentTimeout=None, errorCol='AnalyzeDocument_0b4d0c2db02d_error', imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, initialPollingDelay=300, locale=None, localeCol=None, maxPollingRetries=1000, outputCol='AnalyzeDocument_0b4d0c2db02d_output', pages=None, pagesCol=None, pollingDelay=300, prebuiltModelId=None, prebuiltModelIdCol=None, stringIndexType=None, stringIndexTypeCol=None, subscriptionKey=None, subscriptionKeyCol=None, suppressMaxRetriesException=False, timeout=60.0, url=None)[source]

Bases: ComplexParamsMixin, JavaMLReadable, JavaMLWritable, JavaTransformer

Parameters:
  • AADToken (object) – AAD Token used for authentication

  • CustomAuthHeader (object) – A Custom Value for Authorization Header

  • apiVersion (object) – version of the api

  • backoffs (list) – array of backoffs to use in the handler

  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (str) – column to hold http errors

  • imageBytes (object) – bytestream of the image to use

  • imageUrl (object) – the url of the image to use

  • initialPollingDelay (int) – number of milliseconds to wait before first poll for result

  • locale (object) – Locale of the receipt. Supported locales: en-AU, en-CA, en-GB, en-IN, en-US.

  • maxPollingRetries (int) – number of times to poll

  • outputCol (str) – The name of the output column

  • pages (object) – The page selection, used only for multi-page PDF and TIFF documents. Accepted inputs include single pages (e.g. ‘1, 2’ -> pages 1 and 2 will be processed), finite ranges (e.g. ‘2-5’ -> pages 2 to 5 will be processed), and open-ended ranges (e.g. ‘5-’ -> all pages from page 5 onward will be processed; ‘-10’ -> pages 1 to 10 will be processed). These forms can be mixed, and ranges may overlap (e.g. ‘-5, 1, 3, 5-10’ -> pages 1 to 10 will be processed). The service accepts the request if it can process at least one page of the document (e.g. ‘5-100’ on a 5-page document is valid, and page 5 will be processed). If no page range is provided, the entire document will be processed.

  • pollingDelay (int) – number of milliseconds to wait between polling

  • prebuiltModelId (object) – Prebuilt model identifier for Form Recognizer V3.0. Supported values: prebuilt-read, prebuilt-layout, prebuilt-document, prebuilt-businessCard, prebuilt-idDocument, prebuilt-invoice, prebuilt-receipt, or your custom modelId.

  • stringIndexType (object) – Method used to compute string offset and length.

  • subscriptionKey (object) – the API key to use

  • suppressMaxRetriesException (bool) – set to true to suppress the maximum-retries exception and report the failure in the error column instead

  • timeout (float) – number of seconds to wait before closing the connection

  • url (str) – Url of the service
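Several AnalyzeDocument parameters (prebuiltModelId, apiVersion, pages, locale, stringIndexType) correspond to parts of a Form Recognizer v3.0 analyze request. The sketch below assumes the public v3.0 REST shape (`{endpoint}/formrecognizer/documentModels/{modelId}:analyze` with `api-version` as a query argument, defaulting here to the assumed value 2022-08-31); whether AnalyzeDocument builds exactly this URL is not stated in this reference, and `build_analyze_url` is a hypothetical helper:

```python
from typing import Dict, Optional
from urllib.parse import urlencode


def build_analyze_url(
    endpoint: str,
    model_id: str,
    api_version: str = "2022-08-31",  # assumed v3.0 GA version string
    pages: Optional[str] = None,
    locale: Optional[str] = None,
    string_index_type: Optional[str] = None,
) -> str:
    """Build a Form Recognizer v3.0-style analyze URL (assumed shape, see lead-in)."""
    query: Dict[str, str] = {"api-version": api_version}
    # Optional service params become query-string arguments only when set.
    if pages is not None:
        query["pages"] = pages
    if locale is not None:
        query["locale"] = locale
    if string_index_type is not None:
        query["stringIndexType"] = string_index_type
    base = endpoint.rstrip("/")
    # A prebuilt id (e.g. prebuilt-read) or a custom model id goes in the path.
    return f"{base}/formrecognizer/documentModels/{model_id}:analyze?{urlencode(query)}"


print(build_analyze_url("https://example.cognitiveservices.azure.com/", "prebuilt-read", pages="1-3"))
```

Because prebuilt and custom models share the same path segment, one parameter can serve both cases, which matches the "or your custom modelId" wording above.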

AADToken = Param(parent='undefined', name='AADToken', doc='ServiceParam: AAD Token used for authentication')
CustomAuthHeader = Param(parent='undefined', name='CustomAuthHeader', doc='ServiceParam: A Custom Value for Authorization Header')
apiVersion = Param(parent='undefined', name='apiVersion', doc='ServiceParam: version of the api')
backoffs = Param(parent='undefined', name='backoffs', doc='array of backoffs to use in the handler')
concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number of seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getAADToken()[source]
Returns:

AAD Token used for authentication

Return type:

AADToken

getApiVersion()[source]
Returns:

version of the api

Return type:

apiVersion

getBackoffs()[source]
Returns:

array of backoffs to use in the handler

Return type:

backoffs

getConcurrency()[source]
Returns:

max number of concurrent calls

Return type:

concurrency

getConcurrentTimeout()[source]
Returns:

max number of seconds to wait on futures if concurrency >= 1

Return type:

concurrentTimeout

getCustomAuthHeader()[source]
Returns:

A Custom Value for Authorization Header

Return type:

CustomAuthHeader

getErrorCol()[source]
Returns:

column to hold http errors

Return type:

errorCol

getImageBytes()[source]
Returns:

bytestream of the image to use

Return type:

imageBytes

getImageUrl()[source]
Returns:

the url of the image to use

Return type:

imageUrl

getInitialPollingDelay()[source]
Returns:

number of milliseconds to wait before first poll for result

Return type:

initialPollingDelay

static getJavaPackage()[source]

Returns package name String.

getLocale()[source]
Returns:

Locale of the receipt. Supported locales: en-AU, en-CA, en-GB, en-IN, en-US.

Return type:

locale

getMaxPollingRetries()[source]
Returns:

number of times to poll

Return type:

maxPollingRetries

getOutputCol()[source]
Returns:

The name of the output column

Return type:

outputCol

getPages()[source]
Returns:

The page selection, used only for multi-page PDF and TIFF documents. Accepted inputs include single pages (e.g. ‘1, 2’ -> pages 1 and 2 will be processed), finite ranges (e.g. ‘2-5’ -> pages 2 to 5 will be processed), and open-ended ranges (e.g. ‘5-’ -> all pages from page 5 onward will be processed; ‘-10’ -> pages 1 to 10 will be processed). These forms can be mixed, and ranges may overlap (e.g. ‘-5, 1, 3, 5-10’ -> pages 1 to 10 will be processed). The service accepts the request if it can process at least one page of the document (e.g. ‘5-100’ on a 5-page document is valid, and page 5 will be processed). If no page range is provided, the entire document will be processed.

Return type:

pages

getPollingDelay()[source]
Returns:

number of milliseconds to wait between polling

Return type:

pollingDelay

getPrebuiltModelId()[source]
Returns:

Prebuilt model identifier for Form Recognizer V3.0. Supported values: prebuilt-read, prebuilt-layout, prebuilt-document, prebuilt-businessCard, prebuilt-idDocument, prebuilt-invoice, prebuilt-receipt, or your custom modelId.

Return type:

prebuiltModelId

getStringIndexType()[source]
Returns:

Method used to compute string offset and length.

Return type:

stringIndexType
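"Method used to compute string offset and length" matters because the same text position has different offsets under different string-index conventions. The value names below (`unicodeCodePoint`, `utf16CodeUnit`) are assumed from the Form Recognizer v3.0 REST API, which also offers `textElements` (grapheme clusters, not shown since it needs a third-party library). The sketch uses the fact that Python strings index by code point, while Java, C#, and JavaScript strings index by UTF-16 code unit; `offsets` is a hypothetical helper:

```python
from typing import Dict


def offsets(text: str, target: str) -> Dict[str, int]:
    """Offset of `target` in `text` under two string-index conventions."""
    cp = text.index(target)  # Python str indexes by Unicode code point
    # Each UTF-16 code unit is 2 bytes; characters outside the BMP take 2 units.
    utf16 = len(text[:cp].encode("utf-16-le")) // 2
    return {"unicodeCodePoint": cp, "utf16CodeUnit": utf16}


print(offsets("\U0001F600 Contoso", "Contoso"))
```

For plain ASCII text the two conventions agree; they diverge as soon as the text contains emoji or other characters outside the Basic Multilingual Plane, so choose the convention that matches the language used to slice the returned content.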

getSubscriptionKey()[source]
Returns:

the API key to use

Return type:

subscriptionKey

getSuppressMaxRetriesException()[source]
Returns:

set to true to suppress the maximum-retries exception and report the failure in the error column instead

Return type:

suppressMaxRetriesException

getTimeout()[source]
Returns:

number of seconds to wait before closing the connection

Return type:

timeout

getUrl()[source]
Returns:

Url of the service

Return type:

url

imageBytes = Param(parent='undefined', name='imageBytes', doc='ServiceParam: bytestream of the image to use')
imageUrl = Param(parent='undefined', name='imageUrl', doc='ServiceParam: the url of the image to use')
initialPollingDelay = Param(parent='undefined', name='initialPollingDelay', doc='number of milliseconds to wait before first poll for result')
locale = Param(parent='undefined', name='locale', doc='ServiceParam: Locale of the receipt. Supported locales: en-AU, en-CA, en-GB, en-IN, en-US.')
maxPollingRetries = Param(parent='undefined', name='maxPollingRetries', doc='number of times to poll')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
pages = Param(parent='undefined', name='pages', doc="ServiceParam: The page selection, used only for multi-page PDF and TIFF documents. Accepted inputs include single pages (e.g. '1, 2' -> pages 1 and 2 will be processed), finite ranges (e.g. '2-5' -> pages 2 to 5 will be processed), and open-ended ranges (e.g. '5-' -> all pages from page 5 onward will be processed; '-10' -> pages 1 to 10 will be processed). These forms can be mixed, and ranges may overlap (e.g. '-5, 1, 3, 5-10' -> pages 1 to 10 will be processed). The service accepts the request if it can process at least one page of the document (e.g. '5-100' on a 5-page document is valid, and page 5 will be processed). If no page range is provided, the entire document will be processed.")
pollingDelay = Param(parent='undefined', name='pollingDelay', doc='number of milliseconds to wait between polling')
prebuiltModelId = Param(parent='undefined', name='prebuiltModelId', doc='ServiceParam: Prebuilt model identifier for Form Recognizer V3.0. Supported values: prebuilt-read, prebuilt-layout, prebuilt-document, prebuilt-businessCard, prebuilt-idDocument, prebuilt-invoice, prebuilt-receipt, or your custom modelId.')
classmethod read()[source]

Returns an MLReader instance for this class.

setAADToken(value)[source]
Parameters:

AADToken – AAD Token used for authentication

setAADTokenCol(value)[source]
Parameters:

AADToken – AAD Token used for authentication

setApiVersion(value)[source]
Parameters:

apiVersion – version of the api

setApiVersionCol(value)[source]
Parameters:

apiVersion – version of the api

setBackoffs(value)[source]
Parameters:

backoffs – array of backoffs to use in the handler

setConcurrency(value)[source]
Parameters:

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters:

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setCustomAuthHeader(value)[source]
Parameters:

CustomAuthHeader – A Custom Value for Authorization Header

setCustomAuthHeaderCol(value)[source]
Parameters:

CustomAuthHeader – A Custom Value for Authorization Header

setCustomServiceName(value)[source]
setDefaultInternalEndpoint(value)[source]
setEndpoint(value)[source]
setErrorCol(value)[source]
Parameters:

errorCol – column to hold http errors

setImageBytes(value)[source]
Parameters:

imageBytes – bytestream of the image to use

setImageBytesCol(value)[source]
Parameters:

imageBytes – bytestream of the image to use

setImageUrl(value)[source]
Parameters:

imageUrl – the url of the image to use

setImageUrlCol(value)[source]
Parameters:

imageUrl – the url of the image to use

setInitialPollingDelay(value)[source]
Parameters:

initialPollingDelay – number of milliseconds to wait before first poll for result

setLinkedService(value)[source]
setLocale(value)[source]
Parameters:

locale – Locale of the receipt. Supported locales: en-AU, en-CA, en-GB, en-IN, en-US.

setLocaleCol(value)[source]
Parameters:

locale – Locale of the receipt. Supported locales: en-AU, en-CA, en-GB, en-IN, en-US.
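Because `locale` is passed through to the service, an unsupported value typically surfaces only after a round trip. The sketch below checks a value locally against the five locales listed in this docstring. `validate_locale` and `SUPPORTED_LOCALES` are hypothetical names, not SynapseML APIs, and the live service may accept locales beyond this list.

```python
# Hypothetical helper, not part of SynapseML. The set mirrors the locales
# listed in the docstring above; the service itself may support more.
SUPPORTED_LOCALES = frozenset({"en-AU", "en-CA", "en-GB", "en-IN", "en-US"})


def validate_locale(locale):
    """Return `locale` unchanged if it is documented (or None), else raise."""
    if locale is None:
        # Parameter left unset: nothing to check.
        return None
    if locale not in SUPPORTED_LOCALES:
        raise ValueError(
            f"unsupported locale {locale!r}; expected one of {sorted(SUPPORTED_LOCALES)}"
        )
    return locale
```

A validated value could then be handed to `setLocale`; per-row values would go through `setLocaleCol` instead.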

setLocation(value)[source]
setMaxPollingRetries(value)[source]
Parameters:

maxPollingRetries – number of times to poll

setOutputCol(value)[source]
Parameters:

outputCol – The name of the output column

setPages(value)[source]
Parameters:

pages – The page selection, used only for multi-page PDF and TIFF documents. Accepted inputs include single pages (e.g. ‘1, 2’ -> pages 1 and 2 will be processed), finite ranges (e.g. ‘2-5’ -> pages 2 to 5 will be processed) and open-ended ranges (e.g. ‘5-’ -> all pages from page 5 onward will be processed; e.g. ‘-10’ -> pages 1 to 10 will be processed). All of these can be mixed, and ranges may overlap (e.g. ‘-5, 1, 3, 5-10’ -> pages 1 to 10 will be processed). The service accepts the request if it can process at least one page of the document (e.g. ‘5-100’ on a 5-page document is valid input, and page 5 will be processed). If no page range is provided, the entire document will be processed.

setPagesCol(value)[source]
Parameters:

pages – The page selection, used only for multi-page PDF and TIFF documents. Accepted inputs include single pages (e.g. ‘1, 2’ -> pages 1 and 2 will be processed), finite ranges (e.g. ‘2-5’ -> pages 2 to 5 will be processed) and open-ended ranges (e.g. ‘5-’ -> all pages from page 5 onward will be processed; e.g. ‘-10’ -> pages 1 to 10 will be processed). All of these can be mixed, and ranges may overlap (e.g. ‘-5, 1, 3, 5-10’ -> pages 1 to 10 will be processed). The service accepts the request if it can process at least one page of the document (e.g. ‘5-100’ on a 5-page document is valid input, and page 5 will be processed). If no page range is provided, the entire document will be processed.
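The page-selection rules for `pages` are easier to check with a worked parser. The sketch below is a hypothetical local re-implementation of the documented grammar (single pages, `a-b`, `a-`, `-b`, overlaps allowed, out-of-range pages ignored, empty meaning all pages). It is not how the service parses the string, and `parse_pages` is an illustrative name.

```python
def parse_pages(spec, page_count):
    """Return the sorted 1-based pages selected by `spec` for a document.

    Hypothetical helper mimicking the documented `pages` rules; it is not
    SynapseML or service code.
    """
    if spec is None or not spec.strip():
        # No page range provided: the whole document is processed.
        return list(range(1, page_count + 1))
    selected = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start = int(start_text) if start_text.strip() else 1        # '-10'
            end = int(end_text) if end_text.strip() else page_count     # '5-'
        else:
            start = end = int(part)                                      # '3'
        # Clip to the document; overlapping ranges simply merge in the set.
        selected.update(p for p in range(start, end + 1) if 1 <= p <= page_count)
    if not selected:
        # Documented behaviour: a request is only valid if at least one page is processable.
        raise ValueError(f"{spec!r} selects no page of a {page_count}-page document")
    return sorted(selected)
```

For example, `parse_pages("-5, 1, 3, 5-10", 20)` yields pages 1 to 10, and `parse_pages("5-100", 5)` yields `[5]`, matching the examples in the description.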

setParams(AADToken=None, AADTokenCol=None, CustomAuthHeader=None, CustomAuthHeaderCol=None, apiVersion=None, apiVersionCol=None, backoffs=[100, 500, 1000], concurrency=1, concurrentTimeout=None, errorCol='AnalyzeDocument_0b4d0c2db02d_error', imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, initialPollingDelay=300, locale=None, localeCol=None, maxPollingRetries=1000, outputCol='AnalyzeDocument_0b4d0c2db02d_output', pages=None, pagesCol=None, pollingDelay=300, prebuiltModelId=None, prebuiltModelIdCol=None, stringIndexType=None, stringIndexTypeCol=None, subscriptionKey=None, subscriptionKeyCol=None, suppressMaxRetriesException=False, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setPollingDelay(value)[source]
Parameters:

pollingDelay – number of milliseconds to wait between polling
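`initialPollingDelay`, `pollingDelay` and `maxPollingRetries` together bound how long a row can wait for an asynchronous result. The sketch below computes that bound under an assumption the docstrings do not state outright: one sleep of `initialPollingDelay` ms before the first poll, then `pollingDelay` ms between consecutive polls, for at most `maxPollingRetries` polls. Both function names are hypothetical.

```python
def poll_schedule_ms(initial_polling_delay=300, polling_delay=300, max_polling_retries=1000):
    """Return the cumulative time (ms) at which each poll would be issued.

    Assumed schedule: sleep `initial_polling_delay`, poll, then sleep
    `polling_delay` between polls. Defaults match the documented defaults.
    """
    times = []
    elapsed = initial_polling_delay
    for _ in range(max_polling_retries):
        times.append(elapsed)
        elapsed += polling_delay
    return times


def worst_case_wait_seconds(**kwargs):
    """Time spent sleeping before the last poll; ignores request latency."""
    schedule = poll_schedule_ms(**kwargs)
    return schedule[-1] / 1000 if schedule else 0.0
```

Under this assumption the defaults (300, 300, 1000) give 300 + 999 × 300 = 300,000 ms, i.e. about five minutes per row before polling gives up, which is worth keeping in mind when sizing `timeout` and job-level limits.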

setPrebuiltModelId(value)[source]
Parameters:

prebuiltModelId – Prebuilt model identifier for Form Recognizer V3.0. Supported model IDs: prebuilt-read, prebuilt-layout, prebuilt-document, prebuilt-businessCard, prebuilt-idDocument, prebuilt-invoice, prebuilt-receipt, or your custom modelId

setPrebuiltModelIdCol(value)[source]
Parameters:

prebuiltModelId – Prebuilt model identifier for Form Recognizer V3.0. Supported model IDs: prebuilt-read, prebuilt-layout, prebuilt-document, prebuilt-businessCard, prebuilt-idDocument, prebuilt-invoice, prebuilt-receipt, or your custom modelId
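Since `prebuiltModelId` accepts either one of the prebuilt IDs or a custom model ID, a misspelled prebuilt ID would silently be sent as a custom one. The hypothetical helper below classifies a value against the list in this docstring; rejecting unknown `prebuilt-` IDs as likely typos is this sketch's own convention, not documented service behaviour.

```python
# Prebuilt model IDs copied from the prebuiltModelId description above.
PREBUILT_MODEL_IDS = frozenset({
    "prebuilt-read", "prebuilt-layout", "prebuilt-document",
    "prebuilt-businessCard", "prebuilt-idDocument",
    "prebuilt-invoice", "prebuilt-receipt",
})


def classify_model_id(model_id):
    """Return 'prebuilt' or 'custom' for a prebuiltModelId value.

    Hypothetical helper, not part of SynapseML. Matching is exact and
    case-sensitive, as the IDs above use camelCase (e.g. businessCard).
    """
    if not model_id or not model_id.strip():
        raise ValueError("model_id must be a non-empty string")
    if model_id in PREBUILT_MODEL_IDS:
        return "prebuilt"
    if model_id.startswith("prebuilt-"):
        # Sketch convention: treat an unknown prebuilt-* ID as a typo.
        raise ValueError(f"unknown prebuilt model {model_id!r}; check the spelling")
    return "custom"
```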

setStringIndexType(value)[source]
Parameters:

stringIndexType – Method used to compute string offset and length.

setStringIndexTypeCol(value)[source]
Parameters:

stringIndexType – Method used to compute string offset and length.

setSubscriptionKey(value)[source]
Parameters:

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters:

subscriptionKey – the API key to use

setSuppressMaxRetriesException(value)[source]
Parameters:

suppressMaxRetriesException – set to true to suppress the maximum retries exception and report it in the error column instead
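The effect of `suppressMaxRetriesException` can be sketched as a polling loop that either raises when it runs out of polls or hands the failure back as an error value, analogous to a row landing in the error column. `poll_until_done`, `MaxRetriesExceeded` and the `get_status` callable are hypothetical; sleeping between polls is omitted for brevity.

```python
class MaxRetriesExceeded(Exception):
    """Raised when polling gives up (hypothetical exception type)."""


def poll_until_done(get_status, max_polling_retries, suppress_max_retries_exception=False):
    """Poll `get_status` until it returns a result.

    `get_status` returns None while the job is running and a result dict
    once it finishes. Returns an (output, error) pair, in the spirit of
    outputCol and errorCol.
    """
    for _ in range(max_polling_retries):
        result = get_status()
        if result is not None:
            return result, None
    message = f"gave up after {max_polling_retries} polls"
    if suppress_max_retries_exception:
        # Record the failure instead of failing the whole job.
        return None, {"error": message}
    raise MaxRetriesExceeded(message)
```

With suppression on, one slow document yields a null output and a populated error value rather than aborting every other row in the same Spark job.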

setTimeout(value)[source]
Parameters:

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters:

url – Url of the service

stringIndexType = Param(parent='undefined', name='stringIndexType', doc='ServiceParam: Method used to compute string offset and length.')
subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
suppressMaxRetriesException = Param(parent='undefined', name='suppressMaxRetriesException', doc='set true to suppress the maxumimum retries exception and report in the error column')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.services.form.AnalyzeIDDocuments module

class synapse.ml.services.form.AnalyzeIDDocuments.AnalyzeIDDocuments(java_obj=None, AADToken=None, AADTokenCol=None, CustomAuthHeader=None, CustomAuthHeaderCol=None, backoffs=[100, 500, 1000], concurrency=1, concurrentTimeout=None, errorCol='AnalyzeIDDocuments_96ad7efb318b_error', imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, includeTextDetails=None, includeTextDetailsCol=None, initialPollingDelay=300, maxPollingRetries=1000, outputCol='AnalyzeIDDocuments_96ad7efb318b_output', pages=None, pagesCol=None, pollingDelay=300, subscriptionKey=None, subscriptionKeyCol=None, suppressMaxRetriesException=False, timeout=60.0, url=None)[source]

Bases: ComplexParamsMixin, JavaMLReadable, JavaMLWritable, JavaTransformer

Parameters:
  • AADToken (object) – AAD Token used for authentication

  • CustomAuthHeader (object) – A Custom Value for Authorization Header

  • backoffs (list) – array of backoffs to use in the handler

  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (str) – column to hold http errors

  • imageBytes (object) – bytestream of the image to use

  • imageUrl (object) – the url of the image to use

  • includeTextDetails (object) – Include text lines and element references in the result.

  • initialPollingDelay (int) – number of milliseconds to wait before first poll for result

  • maxPollingRetries (int) – number of times to poll

  • outputCol (str) – The name of the output column

  • pages (object) – The page selection, used only for multi-page PDF and TIFF documents. Accepted inputs include single pages (e.g. ‘1, 2’ -> pages 1 and 2 will be processed), finite ranges (e.g. ‘2-5’ -> pages 2 to 5 will be processed) and open-ended ranges (e.g. ‘5-’ -> all pages from page 5 onward will be processed; e.g. ‘-10’ -> pages 1 to 10 will be processed). All of these can be mixed, and ranges may overlap (e.g. ‘-5, 1, 3, 5-10’ -> pages 1 to 10 will be processed). The service accepts the request if it can process at least one page of the document (e.g. ‘5-100’ on a 5-page document is valid input, and page 5 will be processed). If no page range is provided, the entire document will be processed.

  • pollingDelay (int) – number of milliseconds to wait between polling

  • subscriptionKey (object) – the API key to use

  • suppressMaxRetriesException (bool) – set to true to suppress the maximum retries exception and report it in the error column instead

  • timeout (float) – number of seconds to wait before closing the connection

  • url (str) – Url of the service

AADToken = Param(parent='undefined', name='AADToken', doc='ServiceParam: AAD Token used for authentication')
CustomAuthHeader = Param(parent='undefined', name='CustomAuthHeader', doc='ServiceParam: A Custom Value for Authorization Header')
backoffs = Param(parent='undefined', name='backoffs', doc='array of backoffs to use in the handler')
concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getAADToken()[source]
Returns:

AAD Token used for authentication

Return type:

AADToken

getBackoffs()[source]
Returns:

array of backoffs to use in the handler

Return type:

backoffs

getConcurrency()[source]
Returns:

max number of concurrent calls

Return type:

concurrency

getConcurrentTimeout()[source]
Returns:

max number of seconds to wait on futures if concurrency >= 1

Return type:

concurrentTimeout

getCustomAuthHeader()[source]
Returns:

A Custom Value for Authorization Header

Return type:

CustomAuthHeader

getErrorCol()[source]
Returns:

column to hold http errors

Return type:

errorCol

getImageBytes()[source]
Returns:

bytestream of the image to use

Return type:

imageBytes

getImageUrl()[source]
Returns:

the url of the image to use

Return type:

imageUrl

getIncludeTextDetails()[source]
Returns:

Include text lines and element references in the result.

Return type:

includeTextDetails

getInitialPollingDelay()[source]
Returns:

number of milliseconds to wait before first poll for result

Return type:

initialPollingDelay

static getJavaPackage()[source]

Returns package name String.

getMaxPollingRetries()[source]
Returns:

number of times to poll

Return type:

maxPollingRetries

getOutputCol()[source]
Returns:

The name of the output column

Return type:

outputCol

getPages()[source]
Returns:

The page selection, used only for multi-page PDF and TIFF documents. Accepted inputs include single pages (e.g. ‘1, 2’ -> pages 1 and 2 will be processed), finite ranges (e.g. ‘2-5’ -> pages 2 to 5 will be processed) and open-ended ranges (e.g. ‘5-’ -> all pages from page 5 onward will be processed; e.g. ‘-10’ -> pages 1 to 10 will be processed). All of these can be mixed, and ranges may overlap (e.g. ‘-5, 1, 3, 5-10’ -> pages 1 to 10 will be processed). The service accepts the request if it can process at least one page of the document (e.g. ‘5-100’ on a 5-page document is valid input, and page 5 will be processed). If no page range is provided, the entire document will be processed.

Return type:

pages

getPollingDelay()[source]
Returns:

number of milliseconds to wait between polling

Return type:

pollingDelay

getSubscriptionKey()[source]
Returns:

the API key to use

Return type:

subscriptionKey

getSuppressMaxRetriesException()[source]
Returns:

set to true to suppress the maximum retries exception and report it in the error column instead

Return type:

suppressMaxRetriesException

getTimeout()[source]
Returns:

number of seconds to wait before closing the connection

Return type:

timeout

getUrl()[source]
Returns:

Url of the service

Return type:

url

imageBytes = Param(parent='undefined', name='imageBytes', doc='ServiceParam: bytestream of the image to use')
imageUrl = Param(parent='undefined', name='imageUrl', doc='ServiceParam: the url of the image to use')
includeTextDetails = Param(parent='undefined', name='includeTextDetails', doc='ServiceParam: Include text lines and element references in the result.')
initialPollingDelay = Param(parent='undefined', name='initialPollingDelay', doc='number of milliseconds to wait before first poll for result')
maxPollingRetries = Param(parent='undefined', name='maxPollingRetries', doc='number of times to poll')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
pages = Param(parent='undefined', name='pages', doc="ServiceParam: The page selection only leveraged for multi-page PDF and TIFF documents. Accepted input include single pages (e.g.'1, 2' -> pages 1 and 2 will be processed), finite (e.g. '2-5' -> pages 2 to 5 will be processed) and open-ended ranges (e.g. '5-' -> all the pages from page 5 will be processed; e.g. '-10' -> pages 1 to 10 will be processed). All of these can be mixed together and ranges are allowed to overlap (eg. '-5, 1, 3, 5-10' - pages 1 to 10 will be processed). The service will accept the request if it can process at least one page of the document (e.g. using '5-100' on a 5 page document is a valid input where page 5 will be processed). If no page range is provided, the entire document will be processed.")
pollingDelay = Param(parent='undefined', name='pollingDelay', doc='number of milliseconds to wait between polling')
classmethod read()[source]

Returns an MLReader instance for this class.

setAADToken(value)[source]
Parameters:

AADToken – AAD Token used for authentication

setAADTokenCol(value)[source]
Parameters:

AADToken – AAD Token used for authentication

setBackoffs(value)[source]
Parameters:

backoffs – array of backoffs to use in the handler
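One plausible reading of `backoffs` (default `[100, 500, 1000]`) is a list of millisecond waits applied before successive retries of a failed HTTP call, so three entries allow up to four attempts. The sketch below implements that reading; the retryable status codes and the `send_with_backoffs` name are assumptions, not SynapseML's actual handler policy.

```python
import time


def send_with_backoffs(send, backoffs=(100, 500, 1000),
                       retry_statuses=(429, 500, 502, 503, 504), sleep=time.sleep):
    """Call `send()` and retry while it returns a retryable status.

    `send` is a hypothetical zero-arg callable returning an HTTP status code.
    Before retry i the sketch waits backoffs[i] milliseconds; the set of
    retryable statuses is an assumption made for illustration.
    """
    status = send()
    for delay_ms in backoffs:
        if status not in retry_statuses:
            break
        sleep(delay_ms / 1000)  # time.sleep takes seconds
        status = send()
    return status
```

Injecting `sleep` keeps the sketch testable: passing a list's `append` records the waits instead of actually sleeping.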

setConcurrency(value)[source]
Parameters:

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters:

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1
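`concurrency` caps the number of in-flight service calls, and `concurrentTimeout` bounds how long the caller waits on their futures. The standard-library sketch below shows the same two knobs with a thread pool; it is an analogy under stated assumptions, not SynapseML's implementation, and `run_concurrently` is a hypothetical name. Calls still pending at the timeout are reported as `None`.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def run_concurrently(tasks, concurrency=1, concurrent_timeout=None):
    """Run zero-arg callables with at most `concurrency` running at once.

    `concurrent_timeout` is in seconds (None waits indefinitely); results of
    tasks not finished by then are returned as None.
    """
    pool = ThreadPoolExecutor(max_workers=concurrency)
    try:
        futures = [pool.submit(task) for task in tasks]
        done, _ = wait(futures, timeout=concurrent_timeout)
        return [f.result() if f in done else None for f in futures]
    finally:
        # Don't block on stragglers; cancel tasks not yet started (Python 3.9+).
        pool.shutdown(wait=False, cancel_futures=True)
```

Raising `concurrency` trades higher throughput per partition for more simultaneous requests against the service's rate limits.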

setCustomAuthHeader(value)[source]
Parameters:

CustomAuthHeader – A Custom Value for Authorization Header

setCustomAuthHeaderCol(value)[source]
Parameters:

CustomAuthHeader – A Custom Value for Authorization Header

setCustomServiceName(value)[source]
setDefaultInternalEndpoint(value)[source]
setEndpoint(value)[source]
setErrorCol(value)[source]
Parameters:

errorCol – column to hold http errors

setImageBytes(value)[source]
Parameters:

imageBytes – bytestream of the image to use

setImageBytesCol(value)[source]
Parameters:

imageBytes – bytestream of the image to use

setImageUrl(value)[source]
Parameters:

imageUrl – the url of the image to use

setImageUrlCol(value)[source]
Parameters:

imageUrl – the url of the image to use

setIncludeTextDetails(value)[source]
Parameters:

includeTextDetails – Include text lines and element references in the result.

setIncludeTextDetailsCol(value)[source]
Parameters:

includeTextDetails – Include text lines and element references in the result.

setInitialPollingDelay(value)[source]
Parameters:

initialPollingDelay – number of milliseconds to wait before first poll for result

setLinkedService(value)[source]
setLocation(value)[source]
setMaxPollingRetries(value)[source]
Parameters:

maxPollingRetries – number of times to poll

setOutputCol(value)[source]
Parameters:

outputCol – The name of the output column

setPages(value)[source]
Parameters:

pages – The page selection, used only for multi-page PDF and TIFF documents. Accepted inputs include single pages (e.g. ‘1, 2’ -> pages 1 and 2 will be processed), finite ranges (e.g. ‘2-5’ -> pages 2 to 5 will be processed) and open-ended ranges (e.g. ‘5-’ -> all pages from page 5 onward will be processed; e.g. ‘-10’ -> pages 1 to 10 will be processed). All of these can be mixed, and ranges may overlap (e.g. ‘-5, 1, 3, 5-10’ -> pages 1 to 10 will be processed). The service accepts the request if it can process at least one page of the document (e.g. ‘5-100’ on a 5-page document is valid input, and page 5 will be processed). If no page range is provided, the entire document will be processed.

setPagesCol(value)[source]
Parameters:

pages – The page selection, used only for multi-page PDF and TIFF documents. Accepted inputs include single pages (e.g. ‘1, 2’ -> pages 1 and 2 will be processed), finite ranges (e.g. ‘2-5’ -> pages 2 to 5 will be processed) and open-ended ranges (e.g. ‘5-’ -> all pages from page 5 onward will be processed; e.g. ‘-10’ -> pages 1 to 10 will be processed). All of these can be mixed, and ranges may overlap (e.g. ‘-5, 1, 3, 5-10’ -> pages 1 to 10 will be processed). The service accepts the request if it can process at least one page of the document (e.g. ‘5-100’ on a 5-page document is valid input, and page 5 will be processed). If no page range is provided, the entire document will be processed.

setParams(AADToken=None, AADTokenCol=None, CustomAuthHeader=None, CustomAuthHeaderCol=None, backoffs=[100, 500, 1000], concurrency=1, concurrentTimeout=None, errorCol='AnalyzeIDDocuments_96ad7efb318b_error', imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, includeTextDetails=None, includeTextDetailsCol=None, initialPollingDelay=300, maxPollingRetries=1000, outputCol='AnalyzeIDDocuments_96ad7efb318b_output', pages=None, pagesCol=None, pollingDelay=300, subscriptionKey=None, subscriptionKeyCol=None, suppressMaxRetriesException=False, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setPollingDelay(value)[source]
Parameters:

pollingDelay – number of milliseconds to wait between polling

setSubscriptionKey(value)[source]
Parameters:

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters:

subscriptionKey – the API key to use

setSuppressMaxRetriesException(value)[source]
Parameters:

suppressMaxRetriesException – set to true to suppress the maximum retries exception and report it in the error column instead

setTimeout(value)[source]
Parameters:

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters:

url – Url of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
suppressMaxRetriesException = Param(parent='undefined', name='suppressMaxRetriesException', doc='set true to suppress the maxumimum retries exception and report in the error column')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.services.form.AnalyzeInvoices module

class synapse.ml.services.form.AnalyzeInvoices.AnalyzeInvoices(java_obj=None, AADToken=None, AADTokenCol=None, CustomAuthHeader=None, CustomAuthHeaderCol=None, backoffs=[100, 500, 1000], concurrency=1, concurrentTimeout=None, errorCol='AnalyzeInvoices_87b50f73e5d3_error', imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, includeTextDetails=None, includeTextDetailsCol=None, initialPollingDelay=300, locale=None, localeCol=None, maxPollingRetries=1000, outputCol='AnalyzeInvoices_87b50f73e5d3_output', pages=None, pagesCol=None, pollingDelay=300, subscriptionKey=None, subscriptionKeyCol=None, suppressMaxRetriesException=False, timeout=60.0, url=None)[source]

Bases: ComplexParamsMixin, JavaMLReadable, JavaMLWritable, JavaTransformer

Parameters:
  • AADToken (object) – AAD Token used for authentication

  • CustomAuthHeader (object) – A Custom Value for Authorization Header

  • backoffs (list) – array of backoffs to use in the handler

  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (str) – column to hold http errors

  • imageBytes (object) – bytestream of the image to use

  • imageUrl (object) – the url of the image to use

  • includeTextDetails (object) – Include text lines and element references in the result.

  • initialPollingDelay (int) – number of milliseconds to wait before first poll for result

  • locale (object) – Locale of the receipt. Supported locales: en-AU, en-CA, en-GB, en-IN, en-US.

  • maxPollingRetries (int) – number of times to poll

  • outputCol (str) – The name of the output column

  • pages (object) – The page selection, used only for multi-page PDF and TIFF documents. Accepted inputs include single pages (e.g. ‘1, 2’ -> pages 1 and 2 will be processed), finite ranges (e.g. ‘2-5’ -> pages 2 to 5 will be processed) and open-ended ranges (e.g. ‘5-’ -> all pages from page 5 onward will be processed; e.g. ‘-10’ -> pages 1 to 10 will be processed). All of these can be mixed, and ranges may overlap (e.g. ‘-5, 1, 3, 5-10’ -> pages 1 to 10 will be processed). The service accepts the request if it can process at least one page of the document (e.g. ‘5-100’ on a 5-page document is valid input, and page 5 will be processed). If no page range is provided, the entire document will be processed.

  • pollingDelay (int) – number of milliseconds to wait between polling

  • subscriptionKey (object) – the API key to use

  • suppressMaxRetriesException (bool) – set to true to suppress the maximum retries exception and report it in the error column instead

  • timeout (float) – number of seconds to wait before closing the connection

  • url (str) – Url of the service

AADToken = Param(parent='undefined', name='AADToken', doc='ServiceParam: AAD Token used for authentication')
CustomAuthHeader = Param(parent='undefined', name='CustomAuthHeader', doc='ServiceParam: A Custom Value for Authorization Header')
backoffs = Param(parent='undefined', name='backoffs', doc='array of backoffs to use in the handler')
concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getAADToken()[source]
Returns:

AAD Token used for authentication

Return type:

AADToken

getBackoffs()[source]
Returns:

array of backoffs to use in the handler

Return type:

backoffs

getConcurrency()[source]
Returns:

max number of concurrent calls

Return type:

concurrency

getConcurrentTimeout()[source]
Returns:

max number of seconds to wait on futures if concurrency >= 1

Return type:

concurrentTimeout

getCustomAuthHeader()[source]
Returns:

A Custom Value for Authorization Header

Return type:

CustomAuthHeader

getErrorCol()[source]
Returns:

column to hold http errors

Return type:

errorCol

getImageBytes()[source]
Returns:

bytestream of the image to use

Return type:

imageBytes

getImageUrl()[source]
Returns:

the url of the image to use

Return type:

imageUrl

getIncludeTextDetails()[source]
Returns:

Include text lines and element references in the result.

Return type:

includeTextDetails

getInitialPollingDelay()[source]
Returns:

number of milliseconds to wait before first poll for result

Return type:

initialPollingDelay

static getJavaPackage()[source]

Returns package name String.

getLocale()[source]
Returns:

Locale of the receipt. Supported locales: en-AU, en-CA, en-GB, en-IN, en-US.

Return type:

locale

getMaxPollingRetries()[source]
Returns:

number of times to poll

Return type:

maxPollingRetries

getOutputCol()[source]
Returns:

The name of the output column

Return type:

outputCol

getPages()[source]
Returns:

The page selection, used only for multi-page PDF and TIFF documents. Accepted inputs include single pages (e.g. ‘1, 2’ -> pages 1 and 2 will be processed), finite ranges (e.g. ‘2-5’ -> pages 2 to 5 will be processed) and open-ended ranges (e.g. ‘5-’ -> all pages from page 5 onward will be processed; e.g. ‘-10’ -> pages 1 to 10 will be processed). All of these can be mixed, and ranges may overlap (e.g. ‘-5, 1, 3, 5-10’ -> pages 1 to 10 will be processed). The service accepts the request if it can process at least one page of the document (e.g. ‘5-100’ on a 5-page document is valid input, and page 5 will be processed). If no page range is provided, the entire document will be processed.

Return type:

pages

getPollingDelay()[source]
Returns:

number of milliseconds to wait between polling

Return type:

pollingDelay

getSubscriptionKey()[source]
Returns:

the API key to use

Return type:

subscriptionKey

getSuppressMaxRetriesException()[source]
Returns:

set to true to suppress the maximum retries exception and report it in the error column instead

Return type:

suppressMaxRetriesException

getTimeout()[source]
Returns:

number of seconds to wait before closing the connection

Return type:

timeout

getUrl()[source]
Returns:

Url of the service

Return type:

url

imageBytes = Param(parent='undefined', name='imageBytes', doc='ServiceParam: bytestream of the image to use')
imageUrl = Param(parent='undefined', name='imageUrl', doc='ServiceParam: the url of the image to use')
includeTextDetails = Param(parent='undefined', name='includeTextDetails', doc='ServiceParam: Include text lines and element references in the result.')
initialPollingDelay = Param(parent='undefined', name='initialPollingDelay', doc='number of milliseconds to wait before first poll for result')
locale = Param(parent='undefined', name='locale', doc='ServiceParam: Locale of the receipt. Supported locales: en-AU, en-CA, en-GB, en-IN, en-US.')
maxPollingRetries = Param(parent='undefined', name='maxPollingRetries', doc='number of times to poll')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
pages = Param(parent='undefined', name='pages', doc="ServiceParam: The page selection only leveraged for multi-page PDF and TIFF documents. Accepted input include single pages (e.g.'1, 2' -> pages 1 and 2 will be processed), finite (e.g. '2-5' -> pages 2 to 5 will be processed) and open-ended ranges (e.g. '5-' -> all the pages from page 5 will be processed; e.g. '-10' -> pages 1 to 10 will be processed). All of these can be mixed together and ranges are allowed to overlap (eg. '-5, 1, 3, 5-10' - pages 1 to 10 will be processed). The service will accept the request if it can process at least one page of the document (e.g. using '5-100' on a 5 page document is a valid input where page 5 will be processed). If no page range is provided, the entire document will be processed.")
pollingDelay = Param(parent='undefined', name='pollingDelay', doc='number of milliseconds to wait between polling')
classmethod read()[source]

Returns an MLReader instance for this class.

setAADToken(value)[source]
Parameters:

AADToken – AAD Token used for authentication

setAADTokenCol(value)[source]
Parameters:

AADToken – AAD Token used for authentication

setBackoffs(value)[source]
Parameters:

backoffs – array of backoffs to use in the handler

setConcurrency(value)[source]
Parameters:

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters:

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setCustomAuthHeader(value)[source]
Parameters:

CustomAuthHeader – A Custom Value for Authorization Header

setCustomAuthHeaderCol(value)[source]
Parameters:

CustomAuthHeader – A Custom Value for Authorization Header

setCustomServiceName(value)[source]
setDefaultInternalEndpoint(value)[source]
setEndpoint(value)[source]
setErrorCol(value)[source]
Parameters:

errorCol – column to hold http errors

setImageBytes(value)[source]
Parameters:

imageBytes – bytestream of the image to use

setImageBytesCol(value)[source]
Parameters:

imageBytes – bytestream of the image to use

setImageUrl(value)[source]
Parameters:

imageUrl – the url of the image to use

setImageUrlCol(value)[source]
Parameters:

imageUrl – the url of the image to use

setIncludeTextDetails(value)[source]
Parameters:

includeTextDetails – Include text lines and element references in the result.

setIncludeTextDetailsCol(value)[source]
Parameters:

includeTextDetails – Include text lines and element references in the result.

setInitialPollingDelay(value)[source]
Parameters:

initialPollingDelay – number of milliseconds to wait before first poll for result

setLinkedService(value)[source]
setLocale(value)[source]
Parameters:

locale – Locale of the receipt. Supported locales: en-AU, en-CA, en-GB, en-IN, en-US.

setLocaleCol(value)[source]
Parameters:

locale – Locale of the receipt. Supported locales: en-AU, en-CA, en-GB, en-IN, en-US.

setLocation(value)[source]
setMaxPollingRetries(value)[source]
Parameters:

maxPollingRetries – number of times to poll

setOutputCol(value)[source]
Parameters:

outputCol – The name of the output column

setPages(value)[source]
Parameters:

pages – The page selection, used only for multi-page PDF and TIFF documents. Accepted inputs include single pages (e.g. ‘1, 2’ -> pages 1 and 2 will be processed), finite ranges (e.g. ‘2-5’ -> pages 2 to 5 will be processed) and open-ended ranges (e.g. ‘5-’ -> all pages from page 5 onward will be processed; e.g. ‘-10’ -> pages 1 to 10 will be processed). All of these can be mixed, and ranges may overlap (e.g. ‘-5, 1, 3, 5-10’ -> pages 1 to 10 will be processed). The service accepts the request if it can process at least one page of the document (e.g. ‘5-100’ on a 5-page document is valid input, and page 5 will be processed). If no page range is provided, the entire document will be processed.

setPagesCol(value)[source]
Parameters:

pages – The page selection, used only for multi-page PDF and TIFF documents. Accepted inputs include single pages (e.g. ‘1, 2’ -> pages 1 and 2 will be processed), finite ranges (e.g. ‘2-5’ -> pages 2 to 5 will be processed) and open-ended ranges (e.g. ‘5-’ -> all pages from page 5 onward will be processed; e.g. ‘-10’ -> pages 1 to 10 will be processed). All of these can be mixed, and ranges may overlap (e.g. ‘-5, 1, 3, 5-10’ -> pages 1 to 10 will be processed). The service accepts the request if it can process at least one page of the document (e.g. ‘5-100’ on a 5-page document is valid input, and page 5 will be processed). If no page range is provided, the entire document will be processed.

setParams(AADToken=None, AADTokenCol=None, CustomAuthHeader=None, CustomAuthHeaderCol=None, backoffs=[100, 500, 1000], concurrency=1, concurrentTimeout=None, errorCol='AnalyzeInvoices_87b50f73e5d3_error', imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, includeTextDetails=None, includeTextDetailsCol=None, initialPollingDelay=300, locale=None, localeCol=None, maxPollingRetries=1000, outputCol='AnalyzeInvoices_87b50f73e5d3_output', pages=None, pagesCol=None, pollingDelay=300, subscriptionKey=None, subscriptionKeyCol=None, suppressMaxRetriesException=False, timeout=60.0, url=None)[source]
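The default `errorCol` and `outputCol` values in these signatures embed an instance identifier (for example `AnalyzeInvoices_87b50f73e5d3`), which suggests they are derived from the transformer's uid. PySpark generates such uids as the class name plus 12 random hex digits, so the defaults likely differ between instances. The sketch below reproduces the naming pattern as inferred from the signatures; `default_columns` is a hypothetical helper, and the derivation itself is an assumption.

```python
import re


def default_columns(uid):
    """Derive default output/error column names from a transformer uid.

    Pattern inferred from the documented defaults:
    '<ClassName>_<12 hex digits>_output' and '..._error'.
    """
    if not re.fullmatch(r"[A-Za-z]+_[0-9a-f]{12}", uid):
        raise ValueError(f"unexpected uid format: {uid!r}")
    return {"outputCol": f"{uid}_output", "errorCol": f"{uid}_error"}
```

If downstream code selects these columns by name, setting `outputCol` and `errorCol` explicitly (via `setParams`, `setOutputCol` and `setErrorCol`) avoids depending on a per-instance identifier.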

Set the (keyword only) parameters

setPollingDelay(value)[source]
Parameters:

pollingDelay – number of milliseconds to wait between polling

setSubscriptionKey(value)[source]
Parameters:

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters:

subscriptionKey – the API key to use

setSuppressMaxRetriesException(value)[source]
Parameters:

suppressMaxRetriesException – set to true to suppress the maximum retries exception and report it in the error column instead

setTimeout(value)[source]
Parameters:

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters:

url – Url of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
suppressMaxRetriesException = Param(parent='undefined', name='suppressMaxRetriesException', doc='set true to suppress the maxumimum retries exception and report in the error column')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.services.form.AnalyzeLayout module

class synapse.ml.services.form.AnalyzeLayout.AnalyzeLayout(java_obj=None, AADToken=None, AADTokenCol=None, CustomAuthHeader=None, CustomAuthHeaderCol=None, backoffs=[100, 500, 1000], concurrency=1, concurrentTimeout=None, errorCol='AnalyzeLayout_3cbb985477c9_error', imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, initialPollingDelay=300, language=None, languageCol=None, maxPollingRetries=1000, outputCol='AnalyzeLayout_3cbb985477c9_output', pages=None, pagesCol=None, pollingDelay=300, readingOrder=None, readingOrderCol=None, subscriptionKey=None, subscriptionKeyCol=None, suppressMaxRetriesException=False, timeout=60.0, url=None)[source]

Bases: ComplexParamsMixin, JavaMLReadable, JavaMLWritable, JavaTransformer

Parameters:
  • AADToken (object) – AAD Token used for authentication

  • CustomAuthHeader (object) – A Custom Value for Authorization Header

  • backoffs (list) – array of backoffs to use in the handler

  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (str) – column to hold http errors

  • imageBytes (object) – bytestream of the image to use

  • imageUrl (object) – the url of the image to use

  • initialPollingDelay (int) – number of milliseconds to wait before first poll for result

  • language (object) – The BCP-47 language code of the text in the document. Layout supports automatic language identification and multilingual documents, so only provide a language code if you want to force the document to be processed as that specific language.

  • maxPollingRetries (int) – number of times to poll

  • outputCol (str) – The name of the output column

  • pages (object) – The page selection, used only for multi-page PDF and TIFF documents. Accepted inputs include single pages (e.g. ‘1, 2’ -> pages 1 and 2 will be processed), finite ranges (e.g. ‘2-5’ -> pages 2 to 5 will be processed) and open-ended ranges (e.g. ‘5-’ -> all pages from page 5 onward will be processed; e.g. ‘-10’ -> pages 1 to 10 will be processed). These forms can be mixed, and ranges may overlap (e.g. ‘-5, 1, 3, 5-10’ -> pages 1 to 10 will be processed). The service accepts the request if it can process at least one page of the document (e.g. ‘5-100’ on a 5-page document is valid, and page 5 will be processed). If no page range is provided, the entire document is processed.

  • pollingDelay (int) – number of milliseconds to wait between polling

  • readingOrder (object) – Optional parameter specifying which reading-order algorithm is applied when ordering the extracted text elements. Can be either ‘basic’ or ‘natural’. Defaults to ‘basic’ if not specified.

  • subscriptionKey (object) – the API key to use

  • suppressMaxRetriesException (bool) – set to true to suppress the maximum-retries exception and report the failure in the error column

  • timeout (float) – number of seconds to wait before closing the connection

  • url (str) – Url of the service
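The polling parameters (initialPollingDelay, pollingDelay, maxPollingRetries, suppressMaxRetriesException) describe an asynchronous submit-then-poll pattern. The pure-Python sketch below models that loop under stated assumptions: one wait of `initialPollingDelay` ms before the first poll, `pollingDelay` ms between polls, at most `maxPollingRetries` polls, and, when suppression is on, an error message returned as data (analogous to filling errorCol) instead of an exception. `poll_for_result`, `fetch` and `MaxRetriesExceeded` are hypothetical names, not SynapseML APIs, and the real client may count delays slightly differently.

```python
import time
from typing import Callable, Optional, Tuple


class MaxRetriesExceeded(Exception):
    """Hypothetical stand-in for the exception raised when polling gives up."""


def poll_for_result(
    fetch: Callable[[], Optional[dict]],
    initial_polling_delay_ms: int = 300,
    polling_delay_ms: int = 300,
    max_polling_retries: int = 1000,
    suppress_max_retries_exception: bool = False,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[Optional[dict], Optional[str]]:
    """Poll until fetch() returns a result; return (result, error_message)."""
    sleep(initial_polling_delay_ms / 1000)  # wait before the first poll
    for attempt in range(max_polling_retries):
        result = fetch()  # None means the operation is still running
        if result is not None:
            return result, None
        if attempt < max_polling_retries - 1:
            sleep(polling_delay_ms / 1000)  # wait between polls
    message = f"result not ready after {max_polling_retries} polls"
    if suppress_max_retries_exception:
        # Report the failure as data, analogous to filling errorCol for the row.
        return None, message
    raise MaxRetriesExceeded(message)
```

Under this model the defaults allow roughly 0.3 s + 999 * 0.3 s, about 300 seconds (five minutes), of waiting per request before giving up; raise maxPollingRetries or pollingDelay for long multi-page documents.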

AADToken = Param(parent='undefined', name='AADToken', doc='ServiceParam: AAD Token used for authentication')
CustomAuthHeader = Param(parent='undefined', name='CustomAuthHeader', doc='ServiceParam: A Custom Value for Authorization Header')
backoffs = Param(parent='undefined', name='backoffs', doc='array of backoffs to use in the handler')
concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getAADToken()[source]
Returns:

AAD Token used for authentication

Return type:

AADToken

getBackoffs()[source]
Returns:

array of backoffs to use in the handler

Return type:

backoffs
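"Array of backoffs to use in the handler" is terse. One common reading, sketched below, is that each entry is a delay in milliseconds waited before the next retry of a throttled or transiently failing request, so the default `[100, 500, 1000]` allows up to three retries. The millisecond unit, the set of retried status codes, and the names `send_with_backoffs` and `RETRYABLE_STATUSES` are assumptions for illustration, not the SynapseML handler's actual code.

```python
import time
from typing import Callable, Sequence

# Assumption: throttling and transient server errors are the retried statuses.
RETRYABLE_STATUSES = frozenset({429, 500, 502, 503, 504})


def send_with_backoffs(
    send: Callable[[], int],
    backoffs_ms: Sequence[int] = (100, 500, 1000),
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Send a request, retrying once per backoff entry while the status is retryable."""
    status = send()
    for delay_ms in backoffs_ms:
        if status not in RETRYABLE_STATUSES:
            break  # success or a non-retryable error: stop retrying
        sleep(delay_ms / 1000)  # wait this entry's delay before the next attempt
        status = send()
    return status  # last status seen, retried or not
```

With the defaults, a request that is throttled every time is sent four times in total, after waits of 0.1 s, 0.5 s and 1.0 s.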

getConcurrency()[source]
Returns:

max number of concurrent calls

Return type:

concurrency

getConcurrentTimeout()[source]
Returns:

max number of seconds to wait on futures if concurrency >= 1

Return type:

concurrentTimeout
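The interplay of concurrency and concurrentTimeout can be pictured with Python's `concurrent.futures`: at most `concurrency` calls are in flight at once, and the caller waits at most `concurrentTimeout` seconds for all of them (`None` meaning no limit). This is a hedged analogy using a hypothetical `run_bounded` function; the SynapseML transformer manages its own futures on the JVM side and may behave differently in detail.

```python
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Callable, List, Optional, Sequence


def run_bounded(
    calls: Sequence[Callable[[], object]],
    concurrency: int = 1,
    concurrent_timeout: Optional[float] = None,
) -> List[object]:
    """Run calls with at most `concurrency` in flight; results keep input order."""
    pool = ThreadPoolExecutor(max_workers=concurrency)
    try:
        futures = [pool.submit(call) for call in calls]
        # None waits indefinitely; otherwise give up after concurrent_timeout seconds.
        _, not_done = wait(futures, timeout=concurrent_timeout)
        if not_done:
            raise TimeoutError(f"{len(not_done)} call(s) unfinished after {concurrent_timeout} s")
        return [future.result() for future in futures]
    finally:
        # Return without blocking on stragglers; calls that never started are cancelled.
        pool.shutdown(wait=False, cancel_futures=True)
```

`cancel_futures` requires Python 3.9 or later.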

getCustomAuthHeader()[source]
Returns:

A Custom Value for Authorization Header

Return type:

CustomAuthHeader

getErrorCol()[source]
Returns:

column to hold http errors

Return type:

errorCol

getImageBytes()[source]
Returns:

bytestream of the image to use

Return type:

imageBytes

getImageUrl()[source]
Returns:

the url of the image to use

Return type:

imageUrl

getInitialPollingDelay()[source]
Returns:

number of milliseconds to wait before first poll for result

Return type:

initialPollingDelay

static getJavaPackage()[source]

Returns package name String.

getLanguage()[source]
Returns:

The BCP-47 language code of the text in the document. Layout supports automatic language identification and multilingual documents, so only provide a language code if you want to force the document to be processed as that specific language.

Return type:

language

getMaxPollingRetries()[source]
Returns:

number of times to poll

Return type:

maxPollingRetries

getOutputCol()[source]
Returns:

The name of the output column

Return type:

outputCol

getPages()[source]
Returns:

The page selection, used only for multi-page PDF and TIFF documents. Accepted inputs include single pages (e.g. ‘1, 2’ -> pages 1 and 2 will be processed), finite ranges (e.g. ‘2-5’ -> pages 2 to 5 will be processed) and open-ended ranges (e.g. ‘5-’ -> all pages from page 5 onward will be processed; e.g. ‘-10’ -> pages 1 to 10 will be processed). These forms can be mixed, and ranges may overlap (e.g. ‘-5, 1, 3, 5-10’ -> pages 1 to 10 will be processed). The service accepts the request if it can process at least one page of the document (e.g. ‘5-100’ on a 5-page document is valid, and page 5 will be processed). If no page range is provided, the entire document is processed.

Return type:

pages

getPollingDelay()[source]
Returns:

number of milliseconds to wait between polling

Return type:

pollingDelay

getReadingOrder()[source]
Returns:

Optional parameter specifying which reading-order algorithm is applied when ordering the extracted text elements. Can be either ‘basic’ or ‘natural’. Defaults to ‘basic’ if not specified.

Return type:

readingOrder

getSubscriptionKey()[source]
Returns:

the API key to use

Return type:

subscriptionKey

getSuppressMaxRetriesException()[source]
Returns:

set to true to suppress the maximum-retries exception and report the failure in the error column

Return type:

suppressMaxRetriesException

getTimeout()[source]
Returns:

number of seconds to wait before closing the connection

Return type:

timeout

getUrl()[source]
Returns:

Url of the service

Return type:

url

imageBytes = Param(parent='undefined', name='imageBytes', doc='ServiceParam: bytestream of the image to use')
imageUrl = Param(parent='undefined', name='imageUrl', doc='ServiceParam: the url of the image to use')
initialPollingDelay = Param(parent='undefined', name='initialPollingDelay', doc='number of milliseconds to wait before first poll for result')
language = Param(parent='undefined', name='language', doc='ServiceParam: The BCP-47 language code of the text in the document. Layout supports auto language identification and multilanguage documents, so only provide a language code if you would like to force the documented to be processed as that specific language.')
maxPollingRetries = Param(parent='undefined', name='maxPollingRetries', doc='number of times to poll')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
pages = Param(parent='undefined', name='pages', doc="ServiceParam: The page selection only leveraged for multi-page PDF and TIFF documents. Accepted input include single pages (e.g.'1, 2' -> pages 1 and 2 will be processed), finite (e.g. '2-5' -> pages 2 to 5 will be processed) and open-ended ranges (e.g. '5-' -> all the pages from page 5 will be processed; e.g. '-10' -> pages 1 to 10 will be processed). All of these can be mixed together and ranges are allowed to overlap (eg. '-5, 1, 3, 5-10' - pages 1 to 10 will be processed). The service will accept the request if it can process at least one page of the document (e.g. using '5-100' on a 5 page document is a valid input where page 5 will be processed). If no page range is provided, the entire document will be processed.")
pollingDelay = Param(parent='undefined', name='pollingDelay', doc='number of milliseconds to wait between polling')
classmethod read()[source]

Returns an MLReader instance for this class.

readingOrder = Param(parent='undefined', name='readingOrder', doc="ServiceParam: Optional parameter to specify which reading order algorithm should be applied when ordering the extract text elements. Can be either 'basic' or 'natural'. Will default to basic if not specified")
setAADToken(value)[source]
Parameters:

AADToken – AAD Token used for authentication

setAADTokenCol(value)[source]
Parameters:

AADToken – name of the column containing the AAD token used for authentication

setBackoffs(value)[source]
Parameters:

backoffs – array of backoffs to use in the handler

setConcurrency(value)[source]
Parameters:

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters:

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setCustomAuthHeader(value)[source]
Parameters:

CustomAuthHeader – A Custom Value for Authorization Header

setCustomAuthHeaderCol(value)[source]
Parameters:

CustomAuthHeader – name of the column containing a custom value for the Authorization header

setCustomServiceName(value)[source]
setDefaultInternalEndpoint(value)[source]
setEndpoint(value)[source]
setErrorCol(value)[source]
Parameters:

errorCol – column to hold http errors

setImageBytes(value)[source]
Parameters:

imageBytes – bytestream of the image to use

setImageBytesCol(value)[source]
Parameters:

imageBytes – name of the column containing the bytestream of the image to use

setImageUrl(value)[source]
Parameters:

imageUrl – the url of the image to use

setImageUrlCol(value)[source]
Parameters:

imageUrl – name of the column containing the URL of the image to use

setInitialPollingDelay(value)[source]
Parameters:

initialPollingDelay – number of milliseconds to wait before first poll for result

setLanguage(value)[source]
Parameters:

language – The BCP-47 language code of the text in the document. Layout supports automatic language identification and multilingual documents, so only provide a language code if you want to force the document to be processed as that specific language.

setLanguageCol(value)[source]
Parameters:

language – name of the column containing the BCP-47 language code of the text in the document. Layout supports automatic language identification and multilingual documents, so only provide a language code if you want to force the document to be processed as that specific language.

setLinkedService(value)[source]
setLocation(value)[source]
setMaxPollingRetries(value)[source]
Parameters:

maxPollingRetries – number of times to poll

setOutputCol(value)[source]
Parameters:

outputCol – The name of the output column

setPages(value)[source]
Parameters:

pages – The page selection, used only for multi-page PDF and TIFF documents. Accepted inputs include single pages (e.g. ‘1, 2’ -> pages 1 and 2 will be processed), finite ranges (e.g. ‘2-5’ -> pages 2 to 5 will be processed) and open-ended ranges (e.g. ‘5-’ -> all pages from page 5 onward will be processed; e.g. ‘-10’ -> pages 1 to 10 will be processed). These forms can be mixed, and ranges may overlap (e.g. ‘-5, 1, 3, 5-10’ -> pages 1 to 10 will be processed). The service accepts the request if it can process at least one page of the document (e.g. ‘5-100’ on a 5-page document is valid, and page 5 will be processed). If no page range is provided, the entire document is processed.

setPagesCol(value)[source]
Parameters:

pages – name of the column containing the page selection, used only for multi-page PDF and TIFF documents. Accepted inputs include single pages (e.g. ‘1, 2’ -> pages 1 and 2 will be processed), finite ranges (e.g. ‘2-5’ -> pages 2 to 5 will be processed) and open-ended ranges (e.g. ‘5-’ -> all pages from page 5 onward will be processed; e.g. ‘-10’ -> pages 1 to 10 will be processed). These forms can be mixed, and ranges may overlap (e.g. ‘-5, 1, 3, 5-10’ -> pages 1 to 10 will be processed). The service accepts the request if it can process at least one page of the document (e.g. ‘5-100’ on a 5-page document is valid, and page 5 will be processed). If no page range is provided, the entire document is processed.

setParams(AADToken=None, AADTokenCol=None, CustomAuthHeader=None, CustomAuthHeaderCol=None, backoffs=[100, 500, 1000], concurrency=1, concurrentTimeout=None, errorCol='AnalyzeLayout_3cbb985477c9_error', imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, initialPollingDelay=300, language=None, languageCol=None, maxPollingRetries=1000, outputCol='AnalyzeLayout_3cbb985477c9_output', pages=None, pagesCol=None, pollingDelay=300, readingOrder=None, readingOrderCol=None, subscriptionKey=None, subscriptionKeyCol=None, suppressMaxRetriesException=False, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setPollingDelay(value)[source]
Parameters:

pollingDelay – number of milliseconds to wait between polling

setReadingOrder(value)[source]
Parameters:

readingOrder – Optional parameter specifying which reading-order algorithm is applied when ordering the extracted text elements. Can be either ‘basic’ or ‘natural’. Defaults to ‘basic’ if not specified.

setReadingOrderCol(value)[source]
Parameters:

readingOrder – name of the column specifying which reading-order algorithm is applied when ordering the extracted text elements. Can be either ‘basic’ or ‘natural’. Defaults to ‘basic’ if not specified.

setSubscriptionKey(value)[source]
Parameters:

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters:

subscriptionKey – name of the column containing the API key to use
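Each service parameter on these classes has two setters: `setX(value)` supplies one value for every row, while `setXCol(name)` makes each row supply its own value from the named column. The pure-Python sketch below models that resolution with rows as dictionaries. `ServiceParamSketch` is hypothetical, and the rule that the most recent setter call replaces the earlier one is an assumption about the behaviour, not something stated on this page.

```python
from typing import Any, Dict, Optional


class ServiceParamSketch:
    """Hypothetical model of a service parameter: a constant value or a column reference."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._value: Any = None
        self._col: Optional[str] = None

    def set_value(self, value: Any) -> "ServiceParamSketch":
        self._value, self._col = value, None  # like setSubscriptionKey(value)
        return self

    def set_col(self, col: str) -> "ServiceParamSketch":
        self._value, self._col = None, col  # like setSubscriptionKeyCol(col)
        return self

    def resolve(self, row: Dict[str, Any]) -> Any:
        """Return the value this parameter takes for one row."""
        if self._col is not None:
            return row[self._col]  # per-row value read from the column
        return self._value  # same constant for every row


rows = [{"key": "key-A"}, {"key": "key-B"}]
shared = ServiceParamSketch("subscriptionKey").set_value("key-X")
per_row = ServiceParamSketch("subscriptionKey").set_col("key")
print([shared.resolve(r) for r in rows])   # ['key-X', 'key-X']
print([per_row.resolve(r) for r in rows])  # ['key-A', 'key-B']
```

The column form is useful when different rows need different keys, locales or page selections; the constant form avoids materialising a column that holds the same value everywhere.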

setSuppressMaxRetriesException(value)[source]
Parameters:

suppressMaxRetriesException – set to true to suppress the maximum-retries exception and report the failure in the error column

setTimeout(value)[source]
Parameters:

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters:

url – Url of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
suppressMaxRetriesException = Param(parent='undefined', name='suppressMaxRetriesException', doc='set true to suppress the maxumimum retries exception and report in the error column')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.services.form.AnalyzeReceipts module

class synapse.ml.services.form.AnalyzeReceipts.AnalyzeReceipts(java_obj=None, AADToken=None, AADTokenCol=None, CustomAuthHeader=None, CustomAuthHeaderCol=None, backoffs=[100, 500, 1000], concurrency=1, concurrentTimeout=None, errorCol='AnalyzeReceipts_23815563a627_error', imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, includeTextDetails=None, includeTextDetailsCol=None, initialPollingDelay=300, locale=None, localeCol=None, maxPollingRetries=1000, outputCol='AnalyzeReceipts_23815563a627_output', pages=None, pagesCol=None, pollingDelay=300, subscriptionKey=None, subscriptionKeyCol=None, suppressMaxRetriesException=False, timeout=60.0, url=None)[source]

Bases: ComplexParamsMixin, JavaMLReadable, JavaMLWritable, JavaTransformer

Parameters:
  • AADToken (object) – AAD Token used for authentication

  • CustomAuthHeader (object) – A Custom Value for Authorization Header

  • backoffs (list) – array of backoffs to use in the handler

  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (str) – column to hold http errors

  • imageBytes (object) – bytestream of the image to use

  • imageUrl (object) – the url of the image to use

  • includeTextDetails (object) – Include text lines and element references in the result.

  • initialPollingDelay (int) – number of milliseconds to wait before first poll for result

  • locale (object) – Locale of the receipt. Supported locales: en-AU, en-CA, en-GB, en-IN, en-US.

  • maxPollingRetries (int) – number of times to poll

  • outputCol (str) – The name of the output column

  • pages (object) – The page selection, used only for multi-page PDF and TIFF documents. Accepted inputs include single pages (e.g. ‘1, 2’ -> pages 1 and 2 will be processed), finite ranges (e.g. ‘2-5’ -> pages 2 to 5 will be processed) and open-ended ranges (e.g. ‘5-’ -> all pages from page 5 onward will be processed; e.g. ‘-10’ -> pages 1 to 10 will be processed). These forms can be mixed, and ranges may overlap (e.g. ‘-5, 1, 3, 5-10’ -> pages 1 to 10 will be processed). The service accepts the request if it can process at least one page of the document (e.g. ‘5-100’ on a 5-page document is valid, and page 5 will be processed). If no page range is provided, the entire document is processed.

  • pollingDelay (int) – number of milliseconds to wait between polling

  • subscriptionKey (object) – the API key to use

  • suppressMaxRetriesException (bool) – set to true to suppress the maximum-retries exception and report the failure in the error column

  • timeout (float) – number of seconds to wait before closing the connection

  • url (str) – Url of the service

AADToken = Param(parent='undefined', name='AADToken', doc='ServiceParam: AAD Token used for authentication')
CustomAuthHeader = Param(parent='undefined', name='CustomAuthHeader', doc='ServiceParam: A Custom Value for Authorization Header')
backoffs = Param(parent='undefined', name='backoffs', doc='array of backoffs to use in the handler')
concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getAADToken()[source]
Returns:

AAD Token used for authentication

Return type:

AADToken

getBackoffs()[source]
Returns:

array of backoffs to use in the handler

Return type:

backoffs

getConcurrency()[source]
Returns:

max number of concurrent calls

Return type:

concurrency

getConcurrentTimeout()[source]
Returns:

max number of seconds to wait on futures if concurrency >= 1

Return type:

concurrentTimeout

getCustomAuthHeader()[source]
Returns:

A Custom Value for Authorization Header

Return type:

CustomAuthHeader

getErrorCol()[source]
Returns:

column to hold http errors

Return type:

errorCol

getImageBytes()[source]
Returns:

bytestream of the image to use

Return type:

imageBytes

getImageUrl()[source]
Returns:

the url of the image to use

Return type:

imageUrl

getIncludeTextDetails()[source]
Returns:

Include text lines and element references in the result.

Return type:

includeTextDetails

getInitialPollingDelay()[source]
Returns:

number of milliseconds to wait before first poll for result

Return type:

initialPollingDelay

static getJavaPackage()[source]

Returns package name String.

getLocale()[source]
Returns:

Locale of the receipt. Supported locales: en-AU, en-CA, en-GB, en-IN, en-US.

Return type:

locale

getMaxPollingRetries()[source]
Returns:

number of times to poll

Return type:

maxPollingRetries

getOutputCol()[source]
Returns:

The name of the output column

Return type:

outputCol

getPages()[source]
Returns:

The page selection, used only for multi-page PDF and TIFF documents. Accepted inputs include single pages (e.g. ‘1, 2’ -> pages 1 and 2 will be processed), finite ranges (e.g. ‘2-5’ -> pages 2 to 5 will be processed) and open-ended ranges (e.g. ‘5-’ -> all pages from page 5 onward will be processed; e.g. ‘-10’ -> pages 1 to 10 will be processed). These forms can be mixed, and ranges may overlap (e.g. ‘-5, 1, 3, 5-10’ -> pages 1 to 10 will be processed). The service accepts the request if it can process at least one page of the document (e.g. ‘5-100’ on a 5-page document is valid, and page 5 will be processed). If no page range is provided, the entire document is processed.

Return type:

pages

getPollingDelay()[source]
Returns:

number of milliseconds to wait between polling

Return type:

pollingDelay

getSubscriptionKey()[source]
Returns:

the API key to use

Return type:

subscriptionKey

getSuppressMaxRetriesException()[source]
Returns:

set to true to suppress the maximum-retries exception and report the failure in the error column

Return type:

suppressMaxRetriesException

getTimeout()[source]
Returns:

number of seconds to wait before closing the connection

Return type:

timeout

getUrl()[source]
Returns:

Url of the service

Return type:

url

imageBytes = Param(parent='undefined', name='imageBytes', doc='ServiceParam: bytestream of the image to use')
imageUrl = Param(parent='undefined', name='imageUrl', doc='ServiceParam: the url of the image to use')
includeTextDetails = Param(parent='undefined', name='includeTextDetails', doc='ServiceParam: Include text lines and element references in the result.')
initialPollingDelay = Param(parent='undefined', name='initialPollingDelay', doc='number of milliseconds to wait before first poll for result')
locale = Param(parent='undefined', name='locale', doc='ServiceParam: Locale of the receipt. Supported locales: en-AU, en-CA, en-GB, en-IN, en-US.')
maxPollingRetries = Param(parent='undefined', name='maxPollingRetries', doc='number of times to poll')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
pages = Param(parent='undefined', name='pages', doc="ServiceParam: The page selection only leveraged for multi-page PDF and TIFF documents. Accepted input include single pages (e.g.'1, 2' -> pages 1 and 2 will be processed), finite (e.g. '2-5' -> pages 2 to 5 will be processed) and open-ended ranges (e.g. '5-' -> all the pages from page 5 will be processed; e.g. '-10' -> pages 1 to 10 will be processed). All of these can be mixed together and ranges are allowed to overlap (eg. '-5, 1, 3, 5-10' - pages 1 to 10 will be processed). The service will accept the request if it can process at least one page of the document (e.g. using '5-100' on a 5 page document is a valid input where page 5 will be processed). If no page range is provided, the entire document will be processed.")
pollingDelay = Param(parent='undefined', name='pollingDelay', doc='number of milliseconds to wait between polling')
classmethod read()[source]

Returns an MLReader instance for this class.

setAADToken(value)[source]
Parameters:

AADToken – AAD Token used for authentication

setAADTokenCol(value)[source]
Parameters:

AADToken – name of the column containing the AAD token used for authentication

setBackoffs(value)[source]
Parameters:

backoffs – array of backoffs to use in the handler

setConcurrency(value)[source]
Parameters:

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters:

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setCustomAuthHeader(value)[source]
Parameters:

CustomAuthHeader – A Custom Value for Authorization Header

setCustomAuthHeaderCol(value)[source]
Parameters:

CustomAuthHeader – name of the column containing a custom value for the Authorization header

setCustomServiceName(value)[source]
setDefaultInternalEndpoint(value)[source]
setEndpoint(value)[source]
setErrorCol(value)[source]
Parameters:

errorCol – column to hold http errors

setImageBytes(value)[source]
Parameters:

imageBytes – bytestream of the image to use

setImageBytesCol(value)[source]
Parameters:

imageBytes – name of the column containing the bytestream of the image to use

setImageUrl(value)[source]
Parameters:

imageUrl – the url of the image to use

setImageUrlCol(value)[source]
Parameters:

imageUrl – name of the column containing the URL of the image to use

setIncludeTextDetails(value)[source]
Parameters:

includeTextDetails – Include text lines and element references in the result.

setIncludeTextDetailsCol(value)[source]
Parameters:

includeTextDetails – name of the column indicating whether to include text lines and element references in the result

setInitialPollingDelay(value)[source]
Parameters:

initialPollingDelay – number of milliseconds to wait before first poll for result

setLinkedService(value)[source]
setLocale(value)[source]
Parameters:

locale – Locale of the receipt. Supported locales: en-AU, en-CA, en-GB, en-IN, en-US.

setLocaleCol(value)[source]
Parameters:

locale – name of the column containing the locale of the receipt. Supported locales: en-AU, en-CA, en-GB, en-IN, en-US.

setLocation(value)[source]
setMaxPollingRetries(value)[source]
Parameters:

maxPollingRetries – number of times to poll

setOutputCol(value)[source]
Parameters:

outputCol – The name of the output column

setPages(value)[source]
Parameters:

pages – The page selection, used only for multi-page PDF and TIFF documents. Accepted inputs include single pages (e.g. ‘1, 2’ -> pages 1 and 2 will be processed), finite ranges (e.g. ‘2-5’ -> pages 2 to 5 will be processed) and open-ended ranges (e.g. ‘5-’ -> all pages from page 5 onward will be processed; e.g. ‘-10’ -> pages 1 to 10 will be processed). These forms can be mixed, and ranges may overlap (e.g. ‘-5, 1, 3, 5-10’ -> pages 1 to 10 will be processed). The service accepts the request if it can process at least one page of the document (e.g. ‘5-100’ on a 5-page document is valid, and page 5 will be processed). If no page range is provided, the entire document is processed.

setPagesCol(value)[source]
Parameters:

pages – name of the column containing the page selection, used only for multi-page PDF and TIFF documents. Accepted inputs include single pages (e.g. ‘1, 2’ -> pages 1 and 2 will be processed), finite ranges (e.g. ‘2-5’ -> pages 2 to 5 will be processed) and open-ended ranges (e.g. ‘5-’ -> all pages from page 5 onward will be processed; e.g. ‘-10’ -> pages 1 to 10 will be processed). These forms can be mixed, and ranges may overlap (e.g. ‘-5, 1, 3, 5-10’ -> pages 1 to 10 will be processed). The service accepts the request if it can process at least one page of the document (e.g. ‘5-100’ on a 5-page document is valid, and page 5 will be processed). If no page range is provided, the entire document is processed.

setParams(AADToken=None, AADTokenCol=None, CustomAuthHeader=None, CustomAuthHeaderCol=None, backoffs=[100, 500, 1000], concurrency=1, concurrentTimeout=None, errorCol='AnalyzeReceipts_23815563a627_error', imageBytes=None, imageBytesCol=None, imageUrl=None, imageUrlCol=None, includeTextDetails=None, includeTextDetailsCol=None, initialPollingDelay=300, locale=None, localeCol=None, maxPollingRetries=1000, outputCol='AnalyzeReceipts_23815563a627_output', pages=None, pagesCol=None, pollingDelay=300, subscriptionKey=None, subscriptionKeyCol=None, suppressMaxRetriesException=False, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setPollingDelay(value)[source]
Parameters:

pollingDelay – number of milliseconds to wait between polling

setSubscriptionKey(value)[source]
Parameters:

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters:

subscriptionKey – name of the column containing the API key to use

setSuppressMaxRetriesException(value)[source]
Parameters:

suppressMaxRetriesException – set to true to suppress the maximum-retries exception and report the failure in the error column

setTimeout(value)[source]
Parameters:

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters:

url – Url of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
suppressMaxRetriesException = Param(parent='undefined', name='suppressMaxRetriesException', doc='set true to suppress the maxumimum retries exception and report in the error column')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.services.form.FormOntologyLearner module

class synapse.ml.services.form.FormOntologyLearner.FormOntologyLearner(java_obj=None, inputCol=None, outputCol=None)[source]

Bases: ComplexParamsMixin, JavaMLReadable, JavaMLWritable, JavaEstimator

Parameters:
  • inputCol (str) – The name of the input column

  • outputCol (str) – The name of the output column

getInputCol()[source]
Returns:

The name of the input column

Return type:

inputCol

static getJavaPackage()[source]

Returns package name String.

getOutputCol()[source]
Returns:

The name of the output column

Return type:

outputCol

inputCol = Param(parent='undefined', name='inputCol', doc='The name of the input column')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setInputCol(value)[source]
Parameters:

inputCol – The name of the input column

setOutputCol(value)[source]
Parameters:

outputCol – The name of the output column

setParams(inputCol=None, outputCol=None)[source]

Set the (keyword only) parameters

synapse.ml.services.form.FormOntologyTransformer module

class synapse.ml.services.form.FormOntologyTransformer.FormOntologyTransformer(java_obj=None, inputCol=None, ontology=None, outputCol=None)[source]

Bases: ComplexParamsMixin, JavaMLReadable, JavaMLWritable, JavaModel

Parameters:
  • inputCol (str) – The name of the input column

  • ontology (object) – The ontology to cast values to

  • outputCol (str) – The name of the output column

getInputCol()[source]
Returns:

The name of the input column

Return type:

inputCol

static getJavaPackage()[source]

Returns package name String.

getOntology()[source]
Returns:

The ontology to cast values to

Return type:

ontology

getOutputCol()[source]
Returns:

The name of the output column

Return type:

outputCol

inputCol = Param(parent='undefined', name='inputCol', doc='The name of the input column')
ontology = Param(parent='undefined', name='ontology', doc='The ontology to cast values to')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setInputCol(value)[source]
Parameters:

inputCol – The name of the input column

setOntology(value)[source]
Parameters:

ontology – The ontology to cast values to

setOutputCol(value)[source]
Parameters:

outputCol – The name of the output column

setParams(inputCol=None, ontology=None, outputCol=None)[source]

Set the (keyword only) parameters

synapse.ml.services.form.GetCustomModel module

class synapse.ml.services.form.GetCustomModel.GetCustomModel(java_obj=None, AADToken=None, AADTokenCol=None, CustomAuthHeader=None, CustomAuthHeaderCol=None, concurrency=1, concurrentTimeout=None, errorCol='GetCustomModel_18af25ee98ee_error', handler=None, includeKeys=None, includeKeysCol=None, modelId=None, modelIdCol=None, outputCol='GetCustomModel_18af25ee98ee_output', subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Bases: ComplexParamsMixin, JavaMLReadable, JavaMLWritable, JavaTransformer

Parameters:
  • AADToken (object) – AAD Token used for authentication

  • CustomAuthHeader (object) – A Custom Value for Authorization Header

  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (str) – column to hold http errors

  • handler (object) – Which strategy to use when handling requests

  • includeKeys (object) – Include list of extracted keys in model information.

  • modelId (object) – Model identifier.

  • outputCol (str) – The name of the output column

  • subscriptionKey (object) – the API key to use

  • timeout (float) – number of seconds to wait before closing the connection

  • url (str) – URL of the service
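GetCustomModel wraps a REST call in which `modelId` and `includeKeys` end up in the request URL. The plain-Python sketch below shows how such a URL could be assembled. The `/formrecognizer/v2.1/custom/models` route is an assumption based on the Form Recognizer v2.1 REST API and is not stated on this page. `get_custom_model_url` and the resource name are hypothetical, and the transformer builds its requests internally.

```python
from typing import Optional
from urllib.parse import quote, urlencode

# ASSUMPTION: route of the Form Recognizer v2.1 "Get Custom Model" operation.
API_PATH = "/formrecognizer/v2.1/custom/models"


def get_custom_model_url(endpoint: str, model_id: str,
                         include_keys: Optional[bool] = None) -> str:
    """Build the request URL for one custom model (illustrative sketch only)."""
    # Percent-encode the model ID so it is safe as a single path segment.
    url = endpoint.rstrip("/") + API_PATH + "/" + quote(model_id, safe="")
    if include_keys is not None:
        # Query-string booleans are sent as lowercase "true"/"false".
        url += "?" + urlencode({"includeKeys": str(include_keys).lower()})
    return url


# Hypothetical resource name and model ID.
print(get_custom_model_url("https://my-resource.cognitiveservices.azure.com/",
                           "1234-abcd", include_keys=True))
# https://my-resource.cognitiveservices.azure.com/formrecognizer/v2.1/custom/models/1234-abcd?includeKeys=true
```

Leaving `include_keys` as `None` omits the query parameter, so the service default applies. This mirrors leaving `includeKeys` unset on the transformer.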

AADToken = Param(parent='undefined', name='AADToken', doc='ServiceParam: AAD Token used for authentication')
CustomAuthHeader = Param(parent='undefined', name='CustomAuthHeader', doc='ServiceParam: A Custom Value for Authorization Header')
concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getAADToken()[source]
Returns:

AAD Token used for authentication

Return type:

object

getConcurrency()[source]
Returns:

max number of concurrent calls

Return type:

int

getConcurrentTimeout()[source]
Returns:

max number of seconds to wait on futures if concurrency >= 1

Return type:

float
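`concurrency` caps how many requests are in flight at once, and `concurrentTimeout` caps how long to wait on each pending result. The stdlib sketch below mimics those semantics with a thread pool. It is an analogy under that reading of the parameter docs, not SynapseML's Scala implementation, and `call_all` and `send` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FuturesTimeout
from typing import Callable, List, Optional, Sequence


def call_all(requests: Sequence[str], send: Callable[[str], str],
             concurrency: int = 1,
             concurrent_timeout: Optional[float] = None) -> List[Optional[str]]:
    """Send requests with at most `concurrency` in flight, keeping input order.

    Each result is awaited for at most `concurrent_timeout` seconds (None waits
    indefinitely); a result that is not ready in time is recorded as None.
    """
    results: List[Optional[str]] = []
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(send, r) for r in requests]
        for fut in futures:
            try:
                results.append(fut.result(timeout=concurrent_timeout))
            except FuturesTimeout:
                results.append(None)
    return results


print(call_all(["a", "b", "c"], str.upper, concurrency=2))
# ['A', 'B', 'C']
```

In this sketch, a timed-out call is only abandoned, not cancelled. Leaving the `with` block still waits for running workers to finish.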

getCustomAuthHeader()[source]
Returns:

A Custom Value for Authorization Header

Return type:

object

getErrorCol()[source]
Returns:

column to hold http errors

Return type:

str

getHandler()[source]
Returns:

Which strategy to use when handling requests

Return type:

object

getIncludeKeys()[source]
Returns:

Include list of extracted keys in model information.

Return type:

object

static getJavaPackage()[source]

Returns package name String.

getModelId()[source]
Returns:

Model identifier.

Return type:

object

getOutputCol()[source]
Returns:

The name of the output column

Return type:

str

getSubscriptionKey()[source]
Returns:

the API key to use

Return type:

object

getTimeout()[source]
Returns:

number of seconds to wait before closing the connection

Return type:

float

getUrl()[source]
Returns:

URL of the service

Return type:

str

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
includeKeys = Param(parent='undefined', name='includeKeys', doc='ServiceParam: Include list of extracted keys in model information.')
modelId = Param(parent='undefined', name='modelId', doc='ServiceParam: Model identifier.')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setAADToken(value)[source]
Parameters:

AADToken – AAD Token used for authentication

setAADTokenCol(value)[source]
Parameters:

AADTokenCol – name of the column that holds the AAD Token used for authentication

setConcurrency(value)[source]
Parameters:

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters:

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setCustomAuthHeader(value)[source]
Parameters:

CustomAuthHeader – A Custom Value for Authorization Header

setCustomAuthHeaderCol(value)[source]
Parameters:

CustomAuthHeaderCol – name of the column that holds a custom value for the Authorization header

setCustomServiceName(value)[source]
setDefaultInternalEndpoint(value)[source]
setEndpoint(value)[source]
setErrorCol(value)[source]
Parameters:

errorCol – column to hold http errors

setHandler(value)[source]
Parameters:

handler – Which strategy to use when handling requests

setIncludeKeys(value)[source]
Parameters:

includeKeys – Include list of extracted keys in model information.

setIncludeKeysCol(value)[source]
Parameters:

includeKeysCol – name of the column that holds whether to include the list of extracted keys in model information

setLinkedService(value)[source]
setLocation(value)[source]
setModelId(value)[source]
Parameters:

modelId – Model identifier.

setModelIdCol(value)[source]
Parameters:

modelIdCol – name of the column that holds the model identifier
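The paired setters `setModelId` and `setModelIdCol` (and likewise for the other service parameters) both document the same `modelId` parameter. This suggests a single param that holds either one fixed value or the name of a column to read per row. The plain-Python sketch below models that assumption. `ServiceParamValue` and `resolve` are hypothetical names, not SynapseML APIs.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical model of a service parameter: either ("value", v), one value
# shared by every row, or ("col", name), read from that column of each row.
# In this model the two forms are mutually exclusive: a param holds one of them.
ServiceParamValue = Tuple[str, Any]


def resolve(param: ServiceParamValue, row: Dict[str, Any]) -> Any:
    """Return the value a request built from `row` would use."""
    kind, payload = param
    if kind == "value":
        return payload          # same value for every row
    if kind == "col":
        return row[payload]     # looked up per row
    raise ValueError(f"unknown service param kind: {kind!r}")


rows: List[Dict[str, Any]] = [{"model": "m-1"}, {"model": "m-2"}]
print([resolve(("value", "m-0"), r) for r in rows])  # ['m-0', 'm-0']
print([resolve(("col", "model"), r) for r in rows])  # ['m-1', 'm-2']
```

The column form lets each row of a DataFrame target a different model, while the value form applies one model to all rows.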

setOutputCol(value)[source]
Parameters:

outputCol – The name of the output column

setParams(AADToken=None, AADTokenCol=None, CustomAuthHeader=None, CustomAuthHeaderCol=None, concurrency=1, concurrentTimeout=None, errorCol='GetCustomModel_18af25ee98ee_error', handler=None, includeKeys=None, includeKeysCol=None, modelId=None, modelIdCol=None, outputCol='GetCustomModel_18af25ee98ee_output', subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Set the (keyword-only) parameters

setSubscriptionKey(value)[source]
Parameters:

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters:

subscriptionKeyCol – name of the column that holds the API key to use

setTimeout(value)[source]
Parameters:

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters:

url – URL of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.services.form.ListCustomModels module

class synapse.ml.services.form.ListCustomModels.ListCustomModels(java_obj=None, AADToken=None, AADTokenCol=None, CustomAuthHeader=None, CustomAuthHeaderCol=None, concurrency=1, concurrentTimeout=None, errorCol='ListCustomModels_fb5c0ab5e914_error', handler=None, op=None, opCol=None, outputCol='ListCustomModels_fb5c0ab5e914_output', subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Bases: ComplexParamsMixin, JavaMLReadable, JavaMLWritable, JavaTransformer

Parameters:
  • AADToken (object) – AAD Token used for authentication

  • CustomAuthHeader (object) – A Custom Value for Authorization Header

  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • errorCol (str) – column to hold http errors

  • handler (object) – Which strategy to use when handling requests

  • op (object) – Specify whether to return summary or full list of models.

  • outputCol (str) – The name of the output column

  • subscriptionKey (object) – the API key to use

  • timeout (float) – number of seconds to wait before closing the connection

  • url (str) – URL of the service
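The `op` parameter selects between a summary and the full list of models. The plain-Python sketch below shows the resulting request URL. It assumes that `op` maps to the `op` query parameter of the Form Recognizer v2.1 "List Custom Models" operation, with accepted values `full` and `summary`; this page does not state that. `list_custom_models_url` and the resource name are hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode

# ASSUMPTION: route and op values of the Form Recognizer v2.1 "List Custom Models" operation.
API_PATH = "/formrecognizer/v2.1/custom/models"
ALLOWED_OPS = ("full", "summary")


def list_custom_models_url(endpoint: str, op: Optional[str] = None) -> str:
    """Build the list-models request URL (illustrative sketch only)."""
    url = endpoint.rstrip("/") + API_PATH
    if op is None:
        return url  # leave the choice to the service default
    if op not in ALLOWED_OPS:
        raise ValueError(f"op must be one of {ALLOWED_OPS}, got {op!r}")
    return url + "?" + urlencode({"op": op})


print(list_custom_models_url("https://my-resource.cognitiveservices.azure.com", "summary"))
# https://my-resource.cognitiveservices.azure.com/formrecognizer/v2.1/custom/models?op=summary
```

Validating `op` on the client side is a design choice of this sketch. The transformer itself may forward whatever value is set.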

AADToken = Param(parent='undefined', name='AADToken', doc='ServiceParam: AAD Token used for authentication')
CustomAuthHeader = Param(parent='undefined', name='CustomAuthHeader', doc='ServiceParam: A Custom Value for Authorization Header')
concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getAADToken()[source]
Returns:

AAD Token used for authentication

Return type:

object

getConcurrency()[source]
Returns:

max number of concurrent calls

Return type:

int

getConcurrentTimeout()[source]
Returns:

max number of seconds to wait on futures if concurrency >= 1

Return type:

float

getCustomAuthHeader()[source]
Returns:

A Custom Value for Authorization Header

Return type:

object

getErrorCol()[source]
Returns:

column to hold http errors

Return type:

str

getHandler()[source]
Returns:

Which strategy to use when handling requests

Return type:

object

static getJavaPackage()[source]

Returns package name String.

getOp()[source]
Returns:

Specify whether to return summary or full list of models.

Return type:

object

getOutputCol()[source]
Returns:

The name of the output column

Return type:

str

getSubscriptionKey()[source]
Returns:

the API key to use

Return type:

object

getTimeout()[source]
Returns:

number of seconds to wait before closing the connection

Return type:

float

getUrl()[source]
Returns:

URL of the service

Return type:

str

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
op = Param(parent='undefined', name='op', doc='ServiceParam: Specify whether to return summary or full list of models.')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setAADToken(value)[source]
Parameters:

AADToken – AAD Token used for authentication

setAADTokenCol(value)[source]
Parameters:

AADTokenCol – name of the column that holds the AAD Token used for authentication

setConcurrency(value)[source]
Parameters:

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters:

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setCustomAuthHeader(value)[source]
Parameters:

CustomAuthHeader – A Custom Value for Authorization Header

setCustomAuthHeaderCol(value)[source]
Parameters:

CustomAuthHeaderCol – name of the column that holds a custom value for the Authorization header

setCustomServiceName(value)[source]
setDefaultInternalEndpoint(value)[source]
setEndpoint(value)[source]
setErrorCol(value)[source]
Parameters:

errorCol – column to hold http errors

setHandler(value)[source]
Parameters:

handler – Which strategy to use when handling requests

setLinkedService(value)[source]
setLocation(value)[source]
setOp(value)[source]
Parameters:

op – Specify whether to return summary or full list of models.

setOpCol(value)[source]
Parameters:

opCol – name of the column that specifies whether to return a summary or the full list of models

setOutputCol(value)[source]
Parameters:

outputCol – The name of the output column

setParams(AADToken=None, AADTokenCol=None, CustomAuthHeader=None, CustomAuthHeaderCol=None, concurrency=1, concurrentTimeout=None, errorCol='ListCustomModels_fb5c0ab5e914_error', handler=None, op=None, opCol=None, outputCol='ListCustomModels_fb5c0ab5e914_output', subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)[source]

Set the (keyword-only) parameters

setSubscriptionKey(value)[source]
Parameters:

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters:

subscriptionKeyCol – name of the column that holds the API key to use

setTimeout(value)[source]
Parameters:

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters:

url – URL of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

Module contents

SynapseML is an ecosystem of tools aimed at expanding the distributed computing framework Apache Spark in several new directions. SynapseML adds many deep learning and data science tools to the Spark ecosystem, including seamless integration of Spark Machine Learning pipelines with Microsoft Cognitive Toolkit (CNTK), LightGBM and OpenCV. These tools enable powerful and highly scalable predictive and analytical models for a variety of data sources.

SynapseML also brings new networking capabilities to the Spark ecosystem. With the HTTP on Spark project, users can embed any web service into their SparkML models. In this vein, SynapseML provides easy-to-use SparkML transformers for a wide variety of Microsoft Cognitive Services. For production-grade deployment, the Spark Serving project enables high-throughput, sub-millisecond-latency web services backed by your Spark cluster.

SynapseML requires Scala 2.12, Spark 3.0+, and Python 3.6+.