synapse.ml.cognitive.text package

Submodules

synapse.ml.cognitive.text.AnalyzeHealthText module

class synapse.ml.cognitive.text.AnalyzeHealthText.AnalyzeHealthText(java_obj=None, AADToken=None, AADTokenCol=None, CustomAuthHeader=None, CustomAuthHeaderCol=None, backoffs=[100, 500, 1000], batchSize=10, concurrency=1, concurrentTimeout=None, disableServiceLogs=None, disableServiceLogsCol=None, errorCol='AnalyzeHealthText_0d2afcb6febd_error', initialPollingDelay=300, language=None, languageCol=None, maxPollingRetries=1000, modelVersion=None, modelVersionCol=None, outputCol='AnalyzeHealthText_0d2afcb6febd_output', pollingDelay=300, showStats=None, showStatsCol=None, stringIndexType=None, stringIndexTypeCol=None, subscriptionKey=None, subscriptionKeyCol=None, suppressMaxRetriesException=False, text=None, textCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • AADToken (object) – AAD Token used for authentication

  • CustomAuthHeader (object) – A Custom Value for Authorization Header

  • backoffs (list) – array of backoffs to use in the handler

  • batchSize (int) – The max size of the buffer

  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – maximum number of seconds to wait on futures if concurrency >= 1

  • disableServiceLogs (object) – disableServiceLogs option

  • errorCol (str) – column to hold HTTP errors

  • initialPollingDelay (int) – number of milliseconds to wait before first poll for result

  • language (object) – the language code of the text (optional for some services)

  • maxPollingRetries (int) – number of times to poll

  • modelVersion (object) – Version of the model

  • outputCol (str) – The name of the output column

  • pollingDelay (int) – number of milliseconds to wait between polls

  • showStats (object) – Whether to include detailed statistics in the response

  • stringIndexType (object) – Specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets

  • subscriptionKey (object) – the API key to use

  • suppressMaxRetriesException (bool) – set to true to suppress the maximum retries exception and report it in the error column

  • text (object) – the text in the request body

  • timeout (float) – number of seconds to wait before closing the connection

  • url (str) – URL of the service
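
A minimal sketch of collecting the keyword parameters listed above and passing them to the constructor. The column names and the key placeholder are illustrative assumptions, not values from this reference. The import is deferred into the builder so the parameter dictionary can be inspected on its own; calling the builder assumes SynapseML and PySpark are installed and a Spark session is active. A service endpoint would presumably still need to be supplied, for example through url or setLocation.

```python
def health_text_params(text_col, output_col, error_col, subscription_key):
    """Collect constructor keyword arguments for AnalyzeHealthText.

    Only parameter names from the signature above are used; the values
    are placeholders chosen for illustration.
    """
    return {
        "textCol": text_col,
        "outputCol": output_col,
        "errorCol": error_col,
        "subscriptionKey": subscription_key,
        "batchSize": 10,
        "initialPollingDelay": 300,
        "pollingDelay": 300,
        "maxPollingRetries": 1000,
        "suppressMaxRetriesException": True,
    }


def build_analyze_health_text(params):
    """Build the transformer (needs SynapseML and an active Spark session)."""
    from synapse.ml.cognitive.text.AnalyzeHealthText import AnalyzeHealthText

    return AnalyzeHealthText(**params)


# Hypothetical column names and a placeholder key.
params = health_text_params("notes", "health", "health_error", "<your-key>")
```

Naming outputCol and errorCol explicitly avoids depending on the generated default column names shown in the signature.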

AADToken = Param(parent='undefined', name='AADToken', doc='ServiceParam: AAD Token used for authentication')
CustomAuthHeader = Param(parent='undefined', name='CustomAuthHeader', doc='ServiceParam: A Custom Value for Authorization Header')
backoffs = Param(parent='undefined', name='backoffs', doc='array of backoffs to use in the handler')
batchSize = Param(parent='undefined', name='batchSize', doc='The max size of the buffer')
concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
disableServiceLogs = Param(parent='undefined', name='disableServiceLogs', doc='ServiceParam: disableServiceLogs option')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getAADToken()[source]
Returns

AAD Token used for authentication

Return type

AADToken

getBackoffs()[source]
Returns

array of backoffs to use in the handler

Return type

backoffs
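
To illustrate what a backoff array such as the default [100, 500, 1000] can mean, here is a generic retry sketch. It assumes the values are milliseconds and that one value is consumed after each failed attempt; neither is stated in this reference, and call_with_backoffs is a hypothetical helper, not the handler the library uses.

```python
import time


def call_with_backoffs(request, backoffs_ms=(100, 500, 1000), sleep=time.sleep):
    """Call request(); after each failure wait the next backoff and retry.

    Re-raises the last exception once every backoff has been used.
    """
    for attempt in range(len(backoffs_ms) + 1):
        try:
            return request()
        except Exception:
            if attempt == len(backoffs_ms):
                raise  # no backoff left, give up
            sleep(backoffs_ms[attempt] / 1000.0)  # assumed milliseconds
```

With three backoffs the request is attempted at most four times.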

getBatchSize()[source]
Returns

The max size of the buffer

Return type

batchSize
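
As a rough picture of a size-limited buffer, the sketch below groups rows into lists of at most batch_size items. It assumes batchSize caps how many rows are grouped together; the helper name batches is hypothetical and the library's own buffering may differ.

```python
from itertools import islice


def batches(rows, batch_size=10):
    """Yield consecutive lists of at most batch_size items from rows."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# 25 rows with the default batch size of 10.
sizes = [len(b) for b in batches(range(25), batch_size=10)]
```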

getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

maximum number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getCustomAuthHeader()[source]
Returns

A Custom Value for Authorization Header

Return type

CustomAuthHeader

getDisableServiceLogs()[source]
Returns

disableServiceLogs option

Return type

disableServiceLogs

getErrorCol()[source]
Returns

column to hold HTTP errors

Return type

errorCol

getInitialPollingDelay()[source]
Returns

number of milliseconds to wait before first poll for result

Return type

initialPollingDelay

static getJavaPackage()[source]

Returns package name String.

getLanguage()[source]
Returns

the language code of the text (optional for some services)

Return type

language

getMaxPollingRetries()[source]
Returns

number of times to poll

Return type

maxPollingRetries

getModelVersion()[source]
Returns

Version of the model

Return type

modelVersion

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getPollingDelay()[source]
Returns

number of milliseconds to wait between polls

Return type

pollingDelay
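
The three polling parameters together bound how long a long-running request is waited on. A back-of-the-envelope sketch, assuming one initialPollingDelay followed by one pollingDelay per poll and ignoring the time each request itself takes (the exact loop is not described in this reference):

```python
def max_polling_wait_ms(initial_polling_delay=300, polling_delay=300,
                        max_polling_retries=1000):
    """Upper bound on time spent waiting between polls, in milliseconds."""
    return initial_polling_delay + polling_delay * max_polling_retries


# Defaults from the class signature: 300 + 300 * 1000 milliseconds.
default_wait_ms = max_polling_wait_ms()
```

Under these assumptions the defaults allow roughly five minutes of waiting per request before the retry limit is reached.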

getShowStats()[source]
Returns

Whether to include detailed statistics in the response

Return type

showStats

getStringIndexType()[source]
Returns

Specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets

Return type

stringIndexType

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getSuppressMaxRetriesException()[source]
Returns

set to true to suppress the maximum retries exception and report it in the error column

Return type

suppressMaxRetriesException

getText()[source]
Returns

the text in the request body

Return type

text

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

URL of the service

Return type

url

initialPollingDelay = Param(parent='undefined', name='initialPollingDelay', doc='number of milliseconds to wait before first poll for result')
language = Param(parent='undefined', name='language', doc='ServiceParam: the language code of the text (optional for some services)')
maxPollingRetries = Param(parent='undefined', name='maxPollingRetries', doc='number of times to poll')
modelVersion = Param(parent='undefined', name='modelVersion', doc='ServiceParam: Version of the model')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
pollingDelay = Param(parent='undefined', name='pollingDelay', doc='number of milliseconds to wait between polling')
classmethod read()[source]

Returns an MLReader instance for this class.

setAADToken(value)[source]
Parameters

AADToken – AAD Token used for authentication

setAADTokenCol(value)[source]
Parameters

AADToken – AAD Token used for authentication

setBackoffs(value)[source]
Parameters

backoffs – array of backoffs to use in the handler

setBatchSize(value)[source]
Parameters

batchSize – The max size of the buffer

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – maximum number of seconds to wait on futures if concurrency >= 1

setCustomAuthHeader(value)[source]
Parameters

CustomAuthHeader – A Custom Value for Authorization Header

setCustomAuthHeaderCol(value)[source]
Parameters

CustomAuthHeader – A Custom Value for Authorization Header

setCustomServiceName(value)[source]
setDefaultInternalEndpoint(value)[source]
setDisableServiceLogs(value)[source]
Parameters

disableServiceLogs – disableServiceLogs option

setDisableServiceLogsCol(value)[source]
Parameters

disableServiceLogs – disableServiceLogs option

setEndpoint(value)[source]
setErrorCol(value)[source]
Parameters

errorCol – column to hold HTTP errors

setInitialPollingDelay(value)[source]
Parameters

initialPollingDelay – number of milliseconds to wait before first poll for result

setLanguage(value)[source]
Parameters

language – the language code of the text (optional for some services)

setLanguageCol(value)[source]
Parameters

language – the language code of the text (optional for some services)

setLinkedService(value)[source]
setLocation(value)[source]
setMaxPollingRetries(value)[source]
Parameters

maxPollingRetries – number of times to poll

setModelVersion(value)[source]
Parameters

modelVersion – Version of the model

setModelVersionCol(value)[source]
Parameters

modelVersion – Version of the model

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(AADToken=None, AADTokenCol=None, CustomAuthHeader=None, CustomAuthHeaderCol=None, backoffs=[100, 500, 1000], batchSize=10, concurrency=1, concurrentTimeout=None, disableServiceLogs=None, disableServiceLogsCol=None, errorCol='AnalyzeHealthText_0d2afcb6febd_error', initialPollingDelay=300, language=None, languageCol=None, maxPollingRetries=1000, modelVersion=None, modelVersionCol=None, outputCol='AnalyzeHealthText_0d2afcb6febd_output', pollingDelay=300, showStats=None, showStatsCol=None, stringIndexType=None, stringIndexTypeCol=None, subscriptionKey=None, subscriptionKeyCol=None, suppressMaxRetriesException=False, text=None, textCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters
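
Every parameter accepted by setParams also has a setter named set followed by the parameter name with its first letter capitalised (setBatchSize, setTextCol, setAADToken). A small sketch of that naming convention; apply_params and RecordingStub are hypothetical helpers, and the stub stands in for a real transformer so the example runs without Spark.

```python
def apply_params(transformer, **params):
    """Call set<Name>(value) on transformer for every keyword argument."""
    for name, value in params.items():
        setter = getattr(transformer, "set" + name[0].upper() + name[1:])
        setter(value)
    return transformer


class RecordingStub:
    """Stand-in for a transformer; records setter calls for inspection."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, attr):
        # Only reached for attributes that do not exist, i.e. the setters.
        if not attr.startswith("set"):
            raise AttributeError(attr)
        return lambda value: self.calls.append((attr, value))


stub = apply_params(RecordingStub(), textCol="notes", batchSize=5,
                    AADToken="<token>")
```

With a real transformer the same helper would call the setters documented below it.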

setPollingDelay(value)[source]
Parameters

pollingDelay – number of milliseconds to wait between polls

setShowStats(value)[source]
Parameters

showStats – Whether to include detailed statistics in the response

setShowStatsCol(value)[source]
Parameters

showStats – Whether to include detailed statistics in the response

setStringIndexType(value)[source]
Parameters

stringIndexType – Specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets

setStringIndexTypeCol(value)[source]
Parameters

stringIndexType – Specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – the API key to use

setSuppressMaxRetriesException(value)[source]
Parameters

suppressMaxRetriesException – set to true to suppress the maximum retries exception and report it in the error column

setText(value)[source]
Parameters

text – the text in the request body

setTextCol(value)[source]
Parameters

text – the text in the request body

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – URL of the service

showStats = Param(parent='undefined', name='showStats', doc='ServiceParam: Whether to include detailed statistics in the response')
stringIndexType = Param(parent='undefined', name='stringIndexType', doc='ServiceParam: Specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets')
subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
suppressMaxRetriesException = Param(parent='undefined', name='suppressMaxRetriesException', doc='set true to suppress the maximum retries exception and report in the error column')
text = Param(parent='undefined', name='text', doc='ServiceParam: the text in the request body')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.text.EntityDetector module

class synapse.ml.cognitive.text.EntityDetector.EntityDetector(java_obj=None, AADToken=None, AADTokenCol=None, CustomAuthHeader=None, CustomAuthHeaderCol=None, batchSize=10, concurrency=1, concurrentTimeout=None, disableServiceLogs=None, disableServiceLogsCol=None, errorCol='EntityDetector_7a338c0d0344_error', handler=None, language=None, languageCol=None, modelVersion=None, modelVersionCol=None, outputCol='EntityDetector_7a338c0d0344_output', showStats=None, showStatsCol=None, stringIndexType=None, stringIndexTypeCol=None, subscriptionKey=None, subscriptionKeyCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • AADToken (object) – AAD Token used for authentication

  • CustomAuthHeader (object) – A Custom Value for Authorization Header

  • batchSize (int) – The max size of the buffer

  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – maximum number of seconds to wait on futures if concurrency >= 1

  • disableServiceLogs (object) – disableServiceLogs option

  • errorCol (str) – column to hold HTTP errors

  • handler (object) – Which strategy to use when handling requests

  • language (object) – the language code of the text (optional for some services)

  • modelVersion (object) – Version of the model

  • outputCol (str) – The name of the output column

  • showStats (object) – Whether to include detailed statistics in the response

  • stringIndexType (object) – Specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets

  • subscriptionKey (object) – the API key to use

  • text (object) – the text in the request body

  • timeout (float) – number of seconds to wait before closing the connection

  • url (str) – URL of the service
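
Several parameters in the signature come in pairs, such as subscriptionKey and subscriptionKeyCol or text and textCol. The sketch below assumes the plain form holds one value applied to every row and the Col form names a DataFrame column read per row; that reading comes from the naming, not from this reference. The "exactly one" rule and the helper names are the sketch's own, and building the transformer assumes SynapseML and an active Spark session.

```python
def key_argument(subscription_key=None, subscription_key_col=None):
    """Return the keyword argument for a literal key or a key column.

    Exactly one of the two must be given; this rule is the sketch's own
    policy, not documented library behaviour.
    """
    if (subscription_key is None) == (subscription_key_col is None):
        raise ValueError(
            "pass exactly one of subscription_key or subscription_key_col"
        )
    if subscription_key is not None:
        return {"subscriptionKey": subscription_key}
    return {"subscriptionKeyCol": subscription_key_col}


def build_entity_detector(text_col, **key_kwargs):
    """Build the transformer (needs SynapseML and an active Spark session)."""
    from synapse.ml.cognitive.text.EntityDetector import EntityDetector

    return EntityDetector(textCol=text_col, **key_argument(**key_kwargs))


literal = key_argument(subscription_key="<your-key>")  # same key for all rows
per_row = key_argument(subscription_key_col="key")     # key read from a column
```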

AADToken = Param(parent='undefined', name='AADToken', doc='ServiceParam: AAD Token used for authentication')
CustomAuthHeader = Param(parent='undefined', name='CustomAuthHeader', doc='ServiceParam: A Custom Value for Authorization Header')
batchSize = Param(parent='undefined', name='batchSize', doc='The max size of the buffer')
concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
disableServiceLogs = Param(parent='undefined', name='disableServiceLogs', doc='ServiceParam: disableServiceLogs option')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getAADToken()[source]
Returns

AAD Token used for authentication

Return type

AADToken

getBatchSize()[source]
Returns

The max size of the buffer

Return type

batchSize

getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

maximum number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getCustomAuthHeader()[source]
Returns

A Custom Value for Authorization Header

Return type

CustomAuthHeader

getDisableServiceLogs()[source]
Returns

disableServiceLogs option

Return type

disableServiceLogs

getErrorCol()[source]
Returns

column to hold HTTP errors

Return type

errorCol

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

static getJavaPackage()[source]

Returns package name String.

getLanguage()[source]
Returns

the language code of the text (optional for some services)

Return type

language

getModelVersion()[source]
Returns

Version of the model

Return type

modelVersion

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getShowStats()[source]
Returns

Whether to include detailed statistics in the response

Return type

showStats

getStringIndexType()[source]
Returns

Specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets

Return type

stringIndexType

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getText()[source]
Returns

the text in the request body

Return type

text

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

URL of the service

Return type

url

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
language = Param(parent='undefined', name='language', doc='ServiceParam: the language code of the text (optional for some services)')
modelVersion = Param(parent='undefined', name='modelVersion', doc='ServiceParam: Version of the model')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setAADToken(value)[source]
Parameters

AADToken – AAD Token used for authentication

setAADTokenCol(value)[source]
Parameters

AADToken – AAD Token used for authentication

setBatchSize(value)[source]
Parameters

batchSize – The max size of the buffer

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – maximum number of seconds to wait on futures if concurrency >= 1

setCustomAuthHeader(value)[source]
Parameters

CustomAuthHeader – A Custom Value for Authorization Header

setCustomAuthHeaderCol(value)[source]
Parameters

CustomAuthHeader – A Custom Value for Authorization Header

setCustomServiceName(value)[source]
setDefaultInternalEndpoint(value)[source]
setDisableServiceLogs(value)[source]
Parameters

disableServiceLogs – disableServiceLogs option

setDisableServiceLogsCol(value)[source]
Parameters

disableServiceLogs – disableServiceLogs option

setEndpoint(value)[source]
setErrorCol(value)[source]
Parameters

errorCol – column to hold HTTP errors

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setLanguage(value)[source]
Parameters

language – the language code of the text (optional for some services)

setLanguageCol(value)[source]
Parameters

language – the language code of the text (optional for some services)

setLinkedService(value)[source]
setLocation(value)[source]
setModelVersion(value)[source]
Parameters

modelVersion – Version of the model

setModelVersionCol(value)[source]
Parameters

modelVersion – Version of the model

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(AADToken=None, AADTokenCol=None, CustomAuthHeader=None, CustomAuthHeaderCol=None, batchSize=10, concurrency=1, concurrentTimeout=None, disableServiceLogs=None, disableServiceLogsCol=None, errorCol='EntityDetector_7a338c0d0344_error', handler=None, language=None, languageCol=None, modelVersion=None, modelVersionCol=None, outputCol='EntityDetector_7a338c0d0344_output', showStats=None, showStatsCol=None, stringIndexType=None, stringIndexTypeCol=None, subscriptionKey=None, subscriptionKeyCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setShowStats(value)[source]
Parameters

showStats – Whether to include detailed statistics in the response

setShowStatsCol(value)[source]
Parameters

showStats – Whether to include detailed statistics in the response

setStringIndexType(value)[source]
Parameters

stringIndexType – Specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets

setStringIndexTypeCol(value)[source]
Parameters

stringIndexType – Specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – the API key to use

setText(value)[source]
Parameters

text – the text in the request body

setTextCol(value)[source]
Parameters

text – the text in the request body

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – URL of the service

showStats = Param(parent='undefined', name='showStats', doc='ServiceParam: Whether to include detailed statistics in the response')
stringIndexType = Param(parent='undefined', name='stringIndexType', doc='ServiceParam: Specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets')
subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
text = Param(parent='undefined', name='text', doc='ServiceParam: the text in the request body')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.text.KeyPhraseExtractor module

class synapse.ml.cognitive.text.KeyPhraseExtractor.KeyPhraseExtractor(java_obj=None, AADToken=None, AADTokenCol=None, CustomAuthHeader=None, CustomAuthHeaderCol=None, batchSize=10, concurrency=1, concurrentTimeout=None, disableServiceLogs=None, disableServiceLogsCol=None, errorCol='KeyPhraseExtractor_7c3e7674401a_error', handler=None, language=None, languageCol=None, modelVersion=None, modelVersionCol=None, outputCol='KeyPhraseExtractor_7c3e7674401a_output', showStats=None, showStatsCol=None, stringIndexType=None, stringIndexTypeCol=None, subscriptionKey=None, subscriptionKeyCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • AADToken (object) – AAD Token used for authentication

  • CustomAuthHeader (object) – A Custom Value for Authorization Header

  • batchSize (int) – The max size of the buffer

  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – maximum number of seconds to wait on futures if concurrency >= 1

  • disableServiceLogs (object) – disableServiceLogs option

  • errorCol (str) – column to hold HTTP errors

  • handler (object) – Which strategy to use when handling requests

  • language (object) – the language code of the text (optional for some services)

  • modelVersion (object) – Version of the model

  • outputCol (str) – The name of the output column

  • showStats (object) – Whether to include detailed statistics in the response

  • stringIndexType (object) – Specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets

  • subscriptionKey (object) – the API key to use

  • text (object) – the text in the request body

  • timeout (float) – number of seconds to wait before closing the connection

  • url (str) – URL of the service
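
The default outputCol and errorCol in the signature above end in _output and _error after a generated identifier (7c3e7674401a here), which presumably differs between instances. A sketch of choosing stable names instead; the prefix and the helper names are illustrative assumptions, and building the transformer assumes SynapseML and an active Spark session.

```python
def stable_columns(prefix):
    """Return outputCol/errorCol names that do not depend on a generated id."""
    return {"outputCol": prefix + "_output", "errorCol": prefix + "_error"}


def build_key_phrase_extractor(text_col, prefix):
    """Build the transformer (needs SynapseML and an active Spark session)."""
    from synapse.ml.cognitive.text.KeyPhraseExtractor import KeyPhraseExtractor

    return KeyPhraseExtractor(textCol=text_col, **stable_columns(prefix))


columns = stable_columns("key_phrases")
```

Downstream code can then refer to the key_phrases_output and key_phrases_error columns by name.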

AADToken = Param(parent='undefined', name='AADToken', doc='ServiceParam: AAD Token used for authentication')
CustomAuthHeader = Param(parent='undefined', name='CustomAuthHeader', doc='ServiceParam: A Custom Value for Authorization Header')
batchSize = Param(parent='undefined', name='batchSize', doc='The max size of the buffer')
concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
disableServiceLogs = Param(parent='undefined', name='disableServiceLogs', doc='ServiceParam: disableServiceLogs option')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getAADToken()[source]
Returns

AAD Token used for authentication

Return type

AADToken

getBatchSize()[source]
Returns

The max size of the buffer

Return type

batchSize

getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

maximum number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getCustomAuthHeader()[source]
Returns

A Custom Value for Authorization Header

Return type

CustomAuthHeader

getDisableServiceLogs()[source]
Returns

disableServiceLogs option

Return type

disableServiceLogs

getErrorCol()[source]
Returns

column to hold HTTP errors

Return type

errorCol

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

static getJavaPackage()[source]

Returns package name String.

getLanguage()[source]
Returns

the language code of the text (optional for some services)

Return type

language

getModelVersion()[source]
Returns

Version of the model

Return type

modelVersion

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getShowStats()[source]
Returns

Whether to include detailed statistics in the response

Return type

showStats

getStringIndexType()[source]
Returns

Specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets

Return type

stringIndexType

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getText()[source]
Returns

the text in the request body

Return type

text

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

URL of the service

Return type

url

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
language = Param(parent='undefined', name='language', doc='ServiceParam: the language code of the text (optional for some services)')
modelVersion = Param(parent='undefined', name='modelVersion', doc='ServiceParam: Version of the model')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setAADToken(value)[source]
Parameters

AADToken – AAD Token used for authentication

setAADTokenCol(value)[source]
Parameters

AADToken – AAD Token used for authentication

setBatchSize(value)[source]
Parameters

batchSize – The max size of the buffer

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – maximum number of seconds to wait on futures if concurrency >= 1

setCustomAuthHeader(value)[source]
Parameters

CustomAuthHeader – A Custom Value for Authorization Header

setCustomAuthHeaderCol(value)[source]
Parameters

CustomAuthHeader – A Custom Value for Authorization Header

setCustomServiceName(value)[source]
setDefaultInternalEndpoint(value)[source]
setDisableServiceLogs(value)[source]
Parameters

disableServiceLogs – disableServiceLogs option

setDisableServiceLogsCol(value)[source]
Parameters

disableServiceLogs – disableServiceLogs option

setEndpoint(value)[source]
setErrorCol(value)[source]
Parameters

errorCol – column to hold HTTP errors

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setLanguage(value)[source]
Parameters

language – the language code of the text (optional for some services)

setLanguageCol(value)[source]
Parameters

language – the language code of the text (optional for some services)

setLinkedService(value)[source]
setLocation(value)[source]
setModelVersion(value)[source]
Parameters

modelVersion – Version of the model

setModelVersionCol(value)[source]
Parameters

modelVersion – Version of the model

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(AADToken=None, AADTokenCol=None, CustomAuthHeader=None, CustomAuthHeaderCol=None, batchSize=10, concurrency=1, concurrentTimeout=None, disableServiceLogs=None, disableServiceLogsCol=None, errorCol='KeyPhraseExtractor_7c3e7674401a_error', handler=None, language=None, languageCol=None, modelVersion=None, modelVersionCol=None, outputCol='KeyPhraseExtractor_7c3e7674401a_output', showStats=None, showStatsCol=None, stringIndexType=None, stringIndexTypeCol=None, subscriptionKey=None, subscriptionKeyCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setShowStats(value)[source]
Parameters

showStats – Whether to include detailed statistics in the response

setShowStatsCol(value)[source]
Parameters

showStats – Whether to include detailed statistics in the response

setStringIndexType(value)[source]
Parameters

stringIndexType – Specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets

setStringIndexTypeCol(value)[source]
Parameters

stringIndexType – Specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – the API key to use

setText(value)[source]
Parameters

text – the text in the request body

setTextCol(value)[source]
Parameters

text – the text in the request body

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – URL of the service

showStats = Param(parent='undefined', name='showStats', doc='ServiceParam: Whether to include detailed statistics in the response')
stringIndexType = Param(parent='undefined', name='stringIndexType', doc='ServiceParam: Specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets')
subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
text = Param(parent='undefined', name='text', doc='ServiceParam: the text in the request body')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.text.LanguageDetector module

class synapse.ml.cognitive.text.LanguageDetector.LanguageDetector(java_obj=None, AADToken=None, AADTokenCol=None, CustomAuthHeader=None, CustomAuthHeaderCol=None, batchSize=10, concurrency=1, concurrentTimeout=None, disableServiceLogs=None, disableServiceLogsCol=None, errorCol='LanguageDetector_fbd930da654d_error', handler=None, language=None, languageCol=None, modelVersion=None, modelVersionCol=None, outputCol='LanguageDetector_fbd930da654d_output', showStats=None, showStatsCol=None, stringIndexType=None, stringIndexTypeCol=None, subscriptionKey=None, subscriptionKeyCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • AADToken (object) – AAD Token used for authentication

  • CustomAuthHeader (object) – A Custom Value for Authorization Header

  • batchSize (int) – The max size of the buffer

  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – maximum number of seconds to wait on futures if concurrency >= 1

  • disableServiceLogs (object) – disableServiceLogs option

  • errorCol (str) – column to hold HTTP errors

  • handler (object) – Which strategy to use when handling requests

  • language (object) – the language code of the text (optional for some services)

  • modelVersion (object) – Version of the model

  • outputCol (str) – The name of the output column

  • showStats (object) – Whether to include detailed statistics in the response

  • stringIndexType (object) – Specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets

  • subscriptionKey (object) – the API key to use

  • text (object) – the text in the request body

  • timeout (float) – number of seconds to wait before closing the connection

  • url (str) – Url of the service
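
As a rough illustration of how the parameters above fit together, the sketch below collects a few of them into keyword arguments for the constructor. The helper names (`language_detector_kwargs`, `build_language_detector`), the column names, the tuning values, and the placeholder key are assumptions made for this example. Building the transformer itself presumably needs SynapseML installed and an active Spark session, so that import is deferred to the function that uses it.

```python
def language_detector_kwargs(subscription_key, text_col="text"):
    """Collect documented LanguageDetector parameters into a kwargs dict.

    Hypothetical helper: the column names and tuning values are
    illustrative choices, not library defaults.
    """
    return {
        "subscriptionKey": subscription_key,  # the API key to use
        "textCol": text_col,                  # column holding the request text
        "outputCol": "detected_language",     # name of the output column
        "errorCol": "detection_error",        # column to hold HTTP errors
        "concurrency": 2,                     # max number of concurrent calls
        "timeout": 30.0,                      # seconds before closing the connection
    }


def build_language_detector(kwargs, location=None):
    """Construct the transformer; assumes SynapseML and Spark are available."""
    # Imported lazily so the rest of this sketch runs without SynapseML.
    from synapse.ml.cognitive.text.LanguageDetector import LanguageDetector

    detector = LanguageDetector(**kwargs)
    if location is not None:
        detector.setLocation(location)  # setter listed for this class
    return detector


kwargs = language_detector_kwargs("<your-subscription-key>")
print(sorted(kwargs))
```

With a DataFrame `df` that has a `text` column, `build_language_detector(kwargs, location="eastus").transform(df)` would presumably add the output and error columns; the region name here is only a placeholder.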

AADToken = Param(parent='undefined', name='AADToken', doc='ServiceParam: AAD Token used for authentication')
CustomAuthHeader = Param(parent='undefined', name='CustomAuthHeader', doc='ServiceParam: A Custom Value for Authorization Header')
batchSize = Param(parent='undefined', name='batchSize', doc='The max size of the buffer')
concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
disableServiceLogs = Param(parent='undefined', name='disableServiceLogs', doc='ServiceParam: disableServiceLogs option')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getAADToken()[source]
Returns

AAD Token used for authentication

Return type

AADToken

getBatchSize()[source]
Returns

The max size of the buffer

Return type

batchSize

getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getCustomAuthHeader()[source]
Returns

A Custom Value for Authorization Header

Return type

CustomAuthHeader

getDisableServiceLogs()[source]
Returns

disableServiceLogs option

Return type

disableServiceLogs

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

static getJavaPackage()[source]

Returns package name String.

getLanguage()[source]
Returns

the language code of the text (optional for some services)

Return type

language

getModelVersion()[source]
Returns

Version of the model

Return type

modelVersion

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getShowStats()[source]
Returns

Whether to include detailed statistics in the response

Return type

showStats

getStringIndexType()[source]
Returns

Specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets

Return type

stringIndexType

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getText()[source]
Returns

the text in the request body

Return type

text

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
language = Param(parent='undefined', name='language', doc='ServiceParam: the language code of the text (optional for some services)')
modelVersion = Param(parent='undefined', name='modelVersion', doc='ServiceParam: Version of the model')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setAADToken(value)[source]
Parameters

AADToken – AAD Token used for authentication

setAADTokenCol(value)[source]
Parameters

AADToken – AAD Token used for authentication

setBatchSize(value)[source]
Parameters

batchSize – The max size of the buffer

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number seconds to wait on futures if concurrency >= 1

setCustomAuthHeader(value)[source]
Parameters

CustomAuthHeader – A Custom Value for Authorization Header

setCustomAuthHeaderCol(value)[source]
Parameters

CustomAuthHeader – A Custom Value for Authorization Header

setCustomServiceName(value)[source]
setDefaultInternalEndpoint(value)[source]
setDisableServiceLogs(value)[source]
Parameters

disableServiceLogs – disableServiceLogs option

setDisableServiceLogsCol(value)[source]
Parameters

disableServiceLogs – disableServiceLogs option

setEndpoint(value)[source]
setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setLanguage(value)[source]
Parameters

language – the language code of the text (optional for some services)

setLanguageCol(value)[source]
Parameters

language – the language code of the text (optional for some services)

setLinkedService(value)[source]
setLocation(value)[source]
setModelVersion(value)[source]
Parameters

modelVersion – Version of the model

setModelVersionCol(value)[source]
Parameters

modelVersion – Version of the model

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(AADToken=None, AADTokenCol=None, CustomAuthHeader=None, CustomAuthHeaderCol=None, batchSize=10, concurrency=1, concurrentTimeout=None, disableServiceLogs=None, disableServiceLogsCol=None, errorCol='LanguageDetector_fbd930da654d_error', handler=None, language=None, languageCol=None, modelVersion=None, modelVersionCol=None, outputCol='LanguageDetector_fbd930da654d_output', showStats=None, showStatsCol=None, stringIndexType=None, stringIndexTypeCol=None, subscriptionKey=None, subscriptionKeyCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setShowStats(value)[source]
Parameters

showStats – Whether to include detailed statistics in the response

setShowStatsCol(value)[source]
Parameters

showStats – Whether to include detailed statistics in the response

setStringIndexType(value)[source]
Parameters

stringIndexType – Specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets

setStringIndexTypeCol(value)[source]
Parameters

stringIndexType – Specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – the API key to use

setText(value)[source]
Parameters

text – the text in the request body

setTextCol(value)[source]
Parameters

text – the text in the request body

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

showStats = Param(parent='undefined', name='showStats', doc='ServiceParam: Whether to include detailed statistics in the response')
stringIndexType = Param(parent='undefined', name='stringIndexType', doc='ServiceParam: Specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets')
subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
text = Param(parent='undefined', name='text', doc='ServiceParam: the text in the request body')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.text.NER module

class synapse.ml.cognitive.text.NER.NER(java_obj=None, AADToken=None, AADTokenCol=None, CustomAuthHeader=None, CustomAuthHeaderCol=None, batchSize=10, concurrency=1, concurrentTimeout=None, disableServiceLogs=None, disableServiceLogsCol=None, errorCol='NER_828effcbd3ff_error', handler=None, language=None, languageCol=None, modelVersion=None, modelVersionCol=None, outputCol='NER_828effcbd3ff_output', showStats=None, showStatsCol=None, stringIndexType=None, stringIndexTypeCol=None, subscriptionKey=None, subscriptionKeyCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • AADToken (object) – AAD Token used for authentication

  • CustomAuthHeader (object) – A Custom Value for Authorization Header

  • batchSize (int) – The max size of the buffer

  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • disableServiceLogs (object) – disableServiceLogs option

  • errorCol (str) – column to hold http errors

  • handler (object) – Which strategy to use when handling requests

  • language (object) – the language code of the text (optional for some services)

  • modelVersion (object) – Version of the model

  • outputCol (str) – The name of the output column

  • showStats (object) – Whether to include detailed statistics in the response

  • stringIndexType (object) – Specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets

  • subscriptionKey (object) – the API key to use

  • text (object) – the text in the request body

  • timeout (float) – number of seconds to wait before closing the connection

  • url (str) – Url of the service
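
Most service parameters in the signature above come in pairs, such as `language` and `languageCol`. The sketch below assumes that the plain name carries one fixed value for every row, that the `Col` variant names a DataFrame column supplying a per-row value, and that only one of the two is meant to be set. `service_param` is a hypothetical helper, not part of the library, and the column names are placeholders.

```python
def service_param(name, value=None, col=None):
    """Return the kwarg for a service parameter given as a constant or a column.

    Hypothetical helper. Assumes `<name>` takes a fixed value and
    `<name>Col` takes the name of a column with per-row values.
    """
    if (value is None) == (col is None):
        raise ValueError(f"set exactly one of {name} or {name}Col")
    if col is not None:
        return {f"{name}Col": col}
    return {name: value}


# Fixed key for all rows; per-row text and language read from columns.
ner_kwargs = {}
ner_kwargs.update(service_param("subscriptionKey", value="<your-subscription-key>"))
ner_kwargs.update(service_param("text", col="document"))
ner_kwargs.update(service_param("language", col="lang"))
ner_kwargs["outputCol"] = "entities"

print(ner_kwargs)
```

The resulting dictionary could then be passed to the constructor as `NER(**ner_kwargs)` in an environment where SynapseML and Spark are available.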

AADToken = Param(parent='undefined', name='AADToken', doc='ServiceParam: AAD Token used for authentication')
CustomAuthHeader = Param(parent='undefined', name='CustomAuthHeader', doc='ServiceParam: A Custom Value for Authorization Header')
batchSize = Param(parent='undefined', name='batchSize', doc='The max size of the buffer')
concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
disableServiceLogs = Param(parent='undefined', name='disableServiceLogs', doc='ServiceParam: disableServiceLogs option')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getAADToken()[source]
Returns

AAD Token used for authentication

Return type

AADToken

getBatchSize()[source]
Returns

The max size of the buffer

Return type

batchSize

getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getCustomAuthHeader()[source]
Returns

A Custom Value for Authorization Header

Return type

CustomAuthHeader

getDisableServiceLogs()[source]
Returns

disableServiceLogs option

Return type

disableServiceLogs

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

static getJavaPackage()[source]

Returns package name String.

getLanguage()[source]
Returns

the language code of the text (optional for some services)

Return type

language

getModelVersion()[source]
Returns

Version of the model

Return type

modelVersion

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getShowStats()[source]
Returns

Whether to include detailed statistics in the response

Return type

showStats

getStringIndexType()[source]
Returns

Specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets

Return type

stringIndexType

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getText()[source]
Returns

the text in the request body

Return type

text

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
language = Param(parent='undefined', name='language', doc='ServiceParam: the language code of the text (optional for some services)')
modelVersion = Param(parent='undefined', name='modelVersion', doc='ServiceParam: Version of the model')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setAADToken(value)[source]
Parameters

AADToken – AAD Token used for authentication

setAADTokenCol(value)[source]
Parameters

AADToken – AAD Token used for authentication

setBatchSize(value)[source]
Parameters

batchSize – The max size of the buffer

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number seconds to wait on futures if concurrency >= 1

setCustomAuthHeader(value)[source]
Parameters

CustomAuthHeader – A Custom Value for Authorization Header

setCustomAuthHeaderCol(value)[source]
Parameters

CustomAuthHeader – A Custom Value for Authorization Header

setCustomServiceName(value)[source]
setDefaultInternalEndpoint(value)[source]
setDisableServiceLogs(value)[source]
Parameters

disableServiceLogs – disableServiceLogs option

setDisableServiceLogsCol(value)[source]
Parameters

disableServiceLogs – disableServiceLogs option

setEndpoint(value)[source]
setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setLanguage(value)[source]
Parameters

language – the language code of the text (optional for some services)

setLanguageCol(value)[source]
Parameters

language – the language code of the text (optional for some services)

setLinkedService(value)[source]
setLocation(value)[source]
setModelVersion(value)[source]
Parameters

modelVersion – Version of the model

setModelVersionCol(value)[source]
Parameters

modelVersion – Version of the model

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(AADToken=None, AADTokenCol=None, CustomAuthHeader=None, CustomAuthHeaderCol=None, batchSize=10, concurrency=1, concurrentTimeout=None, disableServiceLogs=None, disableServiceLogsCol=None, errorCol='NER_828effcbd3ff_error', handler=None, language=None, languageCol=None, modelVersion=None, modelVersionCol=None, outputCol='NER_828effcbd3ff_output', showStats=None, showStatsCol=None, stringIndexType=None, stringIndexTypeCol=None, subscriptionKey=None, subscriptionKeyCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setShowStats(value)[source]
Parameters

showStats – Whether to include detailed statistics in the response

setShowStatsCol(value)[source]
Parameters

showStats – Whether to include detailed statistics in the response

setStringIndexType(value)[source]
Parameters

stringIndexType – Specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets

setStringIndexTypeCol(value)[source]
Parameters

stringIndexType – Specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – the API key to use

setText(value)[source]
Parameters

text – the text in the request body

setTextCol(value)[source]
Parameters

text – the text in the request body

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

showStats = Param(parent='undefined', name='showStats', doc='ServiceParam: Whether to include detailed statistics in the response')
stringIndexType = Param(parent='undefined', name='stringIndexType', doc='ServiceParam: Specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets')
subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
text = Param(parent='undefined', name='text', doc='ServiceParam: the text in the request body')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.text.PII module

class synapse.ml.cognitive.text.PII.PII(java_obj=None, AADToken=None, AADTokenCol=None, CustomAuthHeader=None, CustomAuthHeaderCol=None, batchSize=10, concurrency=1, concurrentTimeout=None, disableServiceLogs=None, disableServiceLogsCol=None, domain=None, domainCol=None, errorCol='PII_c76280ea77f1_error', handler=None, language=None, languageCol=None, modelVersion=None, modelVersionCol=None, outputCol='PII_c76280ea77f1_output', piiCategories=None, piiCategoriesCol=None, showStats=None, showStatsCol=None, stringIndexType=None, stringIndexTypeCol=None, subscriptionKey=None, subscriptionKeyCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • AADToken (object) – AAD Token used for authentication

  • CustomAuthHeader (object) – A Custom Value for Authorization Header

  • batchSize (int) – The max size of the buffer

  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • disableServiceLogs (object) – disableServiceLogs option

  • domain (object) – if specified, will set the PII domain to include only a subset of the entity categories. Possible values include: ‘PHI’, ‘none’.

  • errorCol (str) – column to hold http errors

  • handler (object) – Which strategy to use when handling requests

  • language (object) – the language code of the text (optional for some services)

  • modelVersion (object) – Version of the model

  • outputCol (str) – The name of the output column

  • piiCategories (object) – describes the PII categories to return

  • showStats (object) – Whether to include detailed statistics in the response

  • stringIndexType (object) – Specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets

  • subscriptionKey (object) – the API key to use

  • text (object) – the text in the request body

  • timeout (float) – number of seconds to wait before closing the connection

  • url (str) – Url of the service
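
The two PII-specific parameters, `domain` and `piiCategories`, can be sketched as follows. This is a minimal example under stated assumptions: `pii_kwargs` is a hypothetical helper, the check against `'PHI'` and `'none'` treats the values named in the description as the only ones accepted (the description says "include", so there may be others), `piiCategories` is assumed to take a list of strings, and the category names used are illustrative placeholders.

```python
# Values named in the `domain` description; assumed here to be exhaustive.
ALLOWED_DOMAINS = ("PHI", "none")


def pii_kwargs(subscription_key, text_col, domain=None, categories=None):
    """Build constructor kwargs for PII from documented parameter names.

    Hypothetical helper; the output column name is an arbitrary choice.
    """
    kwargs = {
        "subscriptionKey": subscription_key,
        "textCol": text_col,
        "outputCol": "pii",
    }
    if domain is not None:
        if domain not in ALLOWED_DOMAINS:
            raise ValueError(f"domain must be one of {ALLOWED_DOMAINS}")
        kwargs["domain"] = domain  # restrict to a subset of entity categories
    if categories:
        # Assumption: the service accepts a list of category names.
        kwargs["piiCategories"] = list(categories)
    return kwargs


# Category names below are placeholders for illustration only.
kwargs = pii_kwargs(
    "<your-subscription-key>",
    "text",
    domain="PHI",
    categories=["Person", "Email"],
)
print(kwargs)
```

Omitting `domain` and `categories` leaves both parameters unset, which matches their `None` defaults in the signature.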

AADToken = Param(parent='undefined', name='AADToken', doc='ServiceParam: AAD Token used for authentication')
CustomAuthHeader = Param(parent='undefined', name='CustomAuthHeader', doc='ServiceParam: A Custom Value for Authorization Header')
batchSize = Param(parent='undefined', name='batchSize', doc='The max size of the buffer')
concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
disableServiceLogs = Param(parent='undefined', name='disableServiceLogs', doc='ServiceParam: disableServiceLogs option')
domain = Param(parent='undefined', name='domain', doc="ServiceParam: if specified, will set the PII domain to include only a subset of the entity categories. Possible values include: 'PHI', 'none'.")
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getAADToken()[source]
Returns

AAD Token used for authentication

Return type

AADToken

getBatchSize()[source]
Returns

The max size of the buffer

Return type

batchSize

getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getCustomAuthHeader()[source]
Returns

A Custom Value for Authorization Header

Return type

CustomAuthHeader

getDisableServiceLogs()[source]
Returns

disableServiceLogs option

Return type

disableServiceLogs

getDomain()[source]
Returns

if specified, will set the PII domain to include only a subset of the entity categories. Possible values include: ‘PHI’, ‘none’.

Return type

domain

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

static getJavaPackage()[source]

Returns package name String.

getLanguage()[source]
Returns

the language code of the text (optional for some services)

Return type

language

getModelVersion()[source]
Returns

Version of the model

Return type

modelVersion

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getPiiCategories()[source]
Returns

describes the PII categories to return

Return type

piiCategories

getShowStats()[source]
Returns

Whether to include detailed statistics in the response

Return type

showStats

getStringIndexType()[source]
Returns

Specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets

Return type

stringIndexType

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getText()[source]
Returns

the text in the request body

Return type

text

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
language = Param(parent='undefined', name='language', doc='ServiceParam: the language code of the text (optional for some services)')
modelVersion = Param(parent='undefined', name='modelVersion', doc='ServiceParam: Version of the model')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
piiCategories = Param(parent='undefined', name='piiCategories', doc='ServiceParam: describes the PII categories to return')
classmethod read()[source]

Returns an MLReader instance for this class.

setAADToken(value)[source]
Parameters

AADToken – AAD Token used for authentication

setAADTokenCol(value)[source]
Parameters

AADToken – AAD Token used for authentication

setBatchSize(value)[source]
Parameters

batchSize – The max size of the buffer

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number seconds to wait on futures if concurrency >= 1

setCustomAuthHeader(value)[source]
Parameters

CustomAuthHeader – A Custom Value for Authorization Header

setCustomAuthHeaderCol(value)[source]
Parameters

CustomAuthHeader – A Custom Value for Authorization Header

setCustomServiceName(value)[source]
setDefaultInternalEndpoint(value)[source]
setDisableServiceLogs(value)[source]
Parameters

disableServiceLogs – disableServiceLogs option

setDisableServiceLogsCol(value)[source]
Parameters

disableServiceLogs – disableServiceLogs option

setDomain(value)[source]
Parameters

domain – if specified, will set the PII domain to include only a subset of the entity categories. Possible values include: ‘PHI’, ‘none’.

setDomainCol(value)[source]
Parameters

domain – if specified, will set the PII domain to include only a subset of the entity categories. Possible values include: ‘PHI’, ‘none’.

setEndpoint(value)[source]
setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setLanguage(value)[source]
Parameters

language – the language code of the text (optional for some services)

setLanguageCol(value)[source]
Parameters

language – the language code of the text (optional for some services)

setLinkedService(value)[source]
setLocation(value)[source]
setModelVersion(value)[source]
Parameters

modelVersion – Version of the model

setModelVersionCol(value)[source]
Parameters

modelVersion – Version of the model

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(AADToken=None, AADTokenCol=None, CustomAuthHeader=None, CustomAuthHeaderCol=None, batchSize=10, concurrency=1, concurrentTimeout=None, disableServiceLogs=None, disableServiceLogsCol=None, domain=None, domainCol=None, errorCol='PII_c76280ea77f1_error', handler=None, language=None, languageCol=None, modelVersion=None, modelVersionCol=None, outputCol='PII_c76280ea77f1_output', piiCategories=None, piiCategoriesCol=None, showStats=None, showStatsCol=None, stringIndexType=None, stringIndexTypeCol=None, subscriptionKey=None, subscriptionKeyCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setPiiCategories(value)[source]
Parameters

piiCategories – describes the PII categories to return

setPiiCategoriesCol(value)[source]
Parameters

piiCategories – describes the PII categories to return

setShowStats(value)[source]
Parameters

showStats – Whether to include detailed statistics in the response

setShowStatsCol(value)[source]
Parameters

showStats – Whether to include detailed statistics in the response

setStringIndexType(value)[source]
Parameters

stringIndexType – Specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets

setStringIndexTypeCol(value)[source]
Parameters

stringIndexType – Specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – the API key to use

setText(value)[source]
Parameters

text – the text in the request body

setTextCol(value)[source]
Parameters

text – the text in the request body

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

showStats = Param(parent='undefined', name='showStats', doc='ServiceParam: Whether to include detailed statistics in the response')
stringIndexType = Param(parent='undefined', name='stringIndexType', doc='ServiceParam: Specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets')
subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
text = Param(parent='undefined', name='text', doc='ServiceParam: the text in the request body')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.text.TextAnalyze module

class synapse.ml.cognitive.text.TextAnalyze.TextAnalyze(java_obj=None, AADToken=None, AADTokenCol=None, CustomAuthHeader=None, CustomAuthHeaderCol=None, backoffs=[100, 500, 1000], batchSize=10, concurrency=1, concurrentTimeout=None, disableServiceLogs=None, disableServiceLogsCol=None, entityLinkingParams={'model-version': 'latest'}, entityRecognitionParams={'model-version': 'latest'}, errorCol='TextAnalyze_bdde7ff93c07_error', includeEntityLinking=True, includeEntityRecognition=True, includeKeyPhraseExtraction=True, includePii=True, includeSentimentAnalysis=True, initialPollingDelay=300, keyPhraseExtractionParams={'model-version': 'latest'}, language=None, languageCol=None, maxPollingRetries=1000, modelVersion=None, modelVersionCol=None, outputCol='TextAnalyze_bdde7ff93c07_output', piiParams={'model-version': 'latest'}, pollingDelay=300, sentimentAnalysisParams={'model-version': 'latest'}, showStats=None, showStatsCol=None, subscriptionKey=None, subscriptionKeyCol=None, suppressMaxRetriesException=False, text=None, textCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • AADToken (object) – AAD Token used for authentication

  • CustomAuthHeader (object) – A Custom Value for Authorization Header

  • backoffs (list) – array of backoffs to use in the handler

  • batchSize (int) – The max size of the buffer

  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • disableServiceLogs (object) – disableServiceLogs option

  • entityLinkingParams (dict) – the parameters to pass to the entityLinking model

  • entityRecognitionParams (dict) – the parameters to pass to the entity recognition model

  • errorCol (str) – column to hold http errors

  • includeEntityLinking (bool) – Whether to perform EntityLinking

  • includeEntityRecognition (bool) – Whether to perform entity recognition

  • includeKeyPhraseExtraction (bool) – Whether to perform key phrase extraction

  • includePii (bool) – Whether to perform PII Detection

  • includeSentimentAnalysis (bool) – Whether to perform SentimentAnalysis

  • initialPollingDelay (int) – number of milliseconds to wait before first poll for result

  • keyPhraseExtractionParams (dict) – the parameters to pass to the keyPhraseExtraction model

  • language (object) – the language code of the text (optional for some services)

  • maxPollingRetries (int) – number of times to poll

  • modelVersion (object) – Version of the model

  • outputCol (str) – The name of the output column

  • piiParams (dict) – the parameters to pass to the PII model

  • pollingDelay (int) – number of milliseconds to wait between polling

  • sentimentAnalysisParams (dict) – the parameters to pass to the sentimentAnalysis model

  • showStats (object) – Whether to include detailed statistics in the response

  • subscriptionKey (object) – the API key to use

  • suppressMaxRetriesException (bool) – set true to suppress the maximum retries exception and report in the error column

  • text (object) – the text in the request body

  • timeout (float) – number of seconds to wait before closing the connection

  • url (str) – Url of the service
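
Each analysis task above is controlled by an `include...` flag and a matching `...Params` dictionary. The sketch below shows one way to enable a subset of tasks and to estimate how long polling might take. The helper names and the `TASKS` table are hypothetical, and the polling estimate assumes one initial delay followed by at most `maxPollingRetries` waits of `pollingDelay` milliseconds, which is an interpretation of the parameter descriptions rather than documented behavior.

```python
# Task name -> (include flag, params argument), taken from the signature above.
TASKS = {
    "entityLinking": ("includeEntityLinking", "entityLinkingParams"),
    "entityRecognition": ("includeEntityRecognition", "entityRecognitionParams"),
    "keyPhraseExtraction": ("includeKeyPhraseExtraction", "keyPhraseExtractionParams"),
    "pii": ("includePii", "piiParams"),
    "sentimentAnalysis": ("includeSentimentAnalysis", "sentimentAnalysisParams"),
}


def text_analyze_kwargs(enabled, model_version="latest"):
    """Turn a set of task names into TextAnalyze constructor kwargs."""
    unknown = set(enabled) - set(TASKS)
    if unknown:
        raise ValueError(f"unknown tasks: {sorted(unknown)}")
    kwargs = {}
    for task, (flag, params) in TASKS.items():
        kwargs[flag] = task in enabled  # explicit False overrides the True default
        if task in enabled:
            kwargs[params] = {"model-version": model_version}
    return kwargs


def max_polling_wait_ms(initial_delay=300, delay=300, retries=1000):
    """Rough upper bound on time spent waiting between polls, in milliseconds.

    Defaults mirror initialPollingDelay, pollingDelay and maxPollingRetries.
    """
    return initial_delay + retries * delay


kwargs = text_analyze_kwargs({"sentimentAnalysis", "pii"})
print(kwargs["includeSentimentAnalysis"], kwargs["includeEntityLinking"])
print(max_polling_wait_ms())
```

Because every `include...` flag defaults to `True`, the helper sets the unused ones to `False` explicitly instead of leaving them out.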

AADToken = Param(parent='undefined', name='AADToken', doc='ServiceParam: AAD Token used for authentication')
CustomAuthHeader = Param(parent='undefined', name='CustomAuthHeader', doc='ServiceParam: A Custom Value for Authorization Header')
backoffs = Param(parent='undefined', name='backoffs', doc='array of backoffs to use in the handler')
batchSize = Param(parent='undefined', name='batchSize', doc='The max size of the buffer')
concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number of seconds to wait on futures if concurrency >= 1')
disableServiceLogs = Param(parent='undefined', name='disableServiceLogs', doc='ServiceParam: disableServiceLogs option')
entityLinkingParams = Param(parent='undefined', name='entityLinkingParams', doc='the parameters to pass to the entityLinking model')
entityRecognitionParams = Param(parent='undefined', name='entityRecognitionParams', doc='the parameters to pass to the entity recognition model')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getAADToken()[source]
Returns

AAD Token used for authentication

Return type

AADToken

getBackoffs()[source]
Returns

array of backoffs to use in the handler

Return type

backoffs
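
As a rough illustration of what a backoff list such as the default [100, 500, 1000] can mean, the sketch below waits one entry per retry. It assumes the values are milliseconds (the source does not state the unit) and that HTTP 429 and 503 are the retryable statuses. `send_with_backoffs` and `send` are hypothetical names, not part of SynapseML.

```python
import time
from typing import Callable, List, Sequence


def send_with_backoffs(
    send: Callable[[], int],
    backoffs_ms: Sequence[int] = (100, 500, 1000),
    retry_statuses: Sequence[int] = (429, 503),  # assumed retryable statuses
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send(); on a retryable status, wait the next backoff and try again."""
    status = send()
    for delay_ms in backoffs_ms:
        if status not in retry_statuses:
            break
        sleep(delay_ms / 1000.0)
        status = send()
    return status


# Simulated endpoint: throttled twice, then successful. Waits are recorded, not slept.
statuses = iter([429, 503, 200])
waited: List[float] = []
final_status = send_with_backoffs(lambda: next(statuses), sleep=waited.append)
```

Once the list is exhausted the last status is returned as-is, so the length of the list bounds the number of retries.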

getBatchSize()[source]
Returns

The max size of the buffer

Return type

batchSize
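
One way to picture "the max size of the buffer" is grouping incoming rows into lists of at most batchSize items. The sketch below shows that grouping in plain Python. Whether the transformer buffers rows in exactly this way is an assumption, and `batched` is a hypothetical helper.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(rows: Iterable[T], batch_size: int = 10) -> Iterator[List[T]]:
    """Yield consecutive lists holding at most batch_size rows each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# 25 rows with the default batchSize of 10 give buffers of 10, 10, and 5 rows.
batches = list(batched(range(25), batch_size=10))
```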

getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout
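
The relationship between concurrency and concurrentTimeout can be sketched with the standard library: a pool limits how many calls run at once, and each future is awaited for at most the timeout. This is an analogy in plain Python rather than the transformer's actual mechanism. `call_concurrently` and `call` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional, Sequence


def call_concurrently(
    requests: Sequence[str],
    call: Callable[[str], str],
    concurrency: int = 1,
    concurrent_timeout: Optional[float] = None,
) -> List[str]:
    """Run at most `concurrency` calls at once and collect results in request order."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(call, request) for request in requests]
        # timeout=None waits indefinitely; otherwise a slow future raises
        # concurrent.futures.TimeoutError after concurrent_timeout seconds.
        return [future.result(timeout=concurrent_timeout) for future in futures]


# str.upper stands in for a web service call.
replies = call_concurrently(["a", "b", "c"], str.upper, concurrency=2, concurrent_timeout=5.0)
```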

getCustomAuthHeader()[source]
Returns

A Custom Value for Authorization Header

Return type

CustomAuthHeader

getDisableServiceLogs()[source]
Returns

disableServiceLogs option

Return type

disableServiceLogs

getEntityLinkingParams()[source]
Returns

the parameters to pass to the entityLinking model

Return type

entityLinkingParams

getEntityRecognitionParams()[source]
Returns

the parameters to pass to the entity recognition model

Return type

entityRecognitionParams

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getIncludeEntityLinking()[source]
Returns

Whether to perform EntityLinking

Return type

includeEntityLinking

getIncludeEntityRecognition()[source]
Returns

Whether to perform entity recognition

Return type

includeEntityRecognition

getIncludeKeyPhraseExtraction()[source]
Returns

Whether to perform key phrase extraction

Return type

includeKeyPhraseExtraction

getIncludePii()[source]
Returns

Whether to perform PII Detection

Return type

includePii

getIncludeSentimentAnalysis()[source]
Returns

Whether to perform SentimentAnalysis

Return type

includeSentimentAnalysis

getInitialPollingDelay()[source]
Returns

number of milliseconds to wait before first poll for result

Return type

initialPollingDelay

static getJavaPackage()[source]

Returns package name String.

getKeyPhraseExtractionParams()[source]
Returns

the parameters to pass to the keyPhraseExtraction model

Return type

keyPhraseExtractionParams

getLanguage()[source]
Returns

the language code of the text (optional for some services)

Return type

language

getMaxPollingRetries()[source]
Returns

number of times to poll

Return type

maxPollingRetries

getModelVersion()[source]
Returns

Version of the model

Return type

modelVersion

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getPiiParams()[source]
Returns

the parameters to pass to the PII model

Return type

piiParams

getPollingDelay()[source]
Returns

number of milliseconds to wait between polling

Return type

pollingDelay

getSentimentAnalysisParams()[source]
Returns

the parameters to pass to the sentimentAnalysis model

Return type

sentimentAnalysisParams

getShowStats()[source]
Returns

Whether to include detailed statistics in the response

Return type

showStats

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getSuppressMaxRetriesException()[source]
Returns

set true to suppress the maximum retries exception and report in the error column

Return type

suppressMaxRetriesException

getText()[source]
Returns

the text in the request body

Return type

text

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

includeEntityLinking = Param(parent='undefined', name='includeEntityLinking', doc='Whether to perform EntityLinking')
includeEntityRecognition = Param(parent='undefined', name='includeEntityRecognition', doc='Whether to perform entity recognition')
includeKeyPhraseExtraction = Param(parent='undefined', name='includeKeyPhraseExtraction', doc='Whether to perform key phrase extraction')
includePii = Param(parent='undefined', name='includePii', doc='Whether to perform PII Detection')
includeSentimentAnalysis = Param(parent='undefined', name='includeSentimentAnalysis', doc='Whether to perform SentimentAnalysis')
initialPollingDelay = Param(parent='undefined', name='initialPollingDelay', doc='number of milliseconds to wait before first poll for result')
keyPhraseExtractionParams = Param(parent='undefined', name='keyPhraseExtractionParams', doc='the parameters to pass to the keyPhraseExtraction model')
language = Param(parent='undefined', name='language', doc='ServiceParam: the language code of the text (optional for some services)')
maxPollingRetries = Param(parent='undefined', name='maxPollingRetries', doc='number of times to poll')
modelVersion = Param(parent='undefined', name='modelVersion', doc='ServiceParam: Version of the model')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
piiParams = Param(parent='undefined', name='piiParams', doc='the parameters to pass to the PII model')
pollingDelay = Param(parent='undefined', name='pollingDelay', doc='number of milliseconds to wait between polling')
classmethod read()[source]

Returns an MLReader instance for this class.

sentimentAnalysisParams = Param(parent='undefined', name='sentimentAnalysisParams', doc='the parameters to pass to the sentimentAnalysis model')
setAADToken(value)[source]
Parameters

AADToken – AAD Token used for authentication

setAADTokenCol(value)[source]
Parameters

AADToken – AAD Token used for authentication

setBackoffs(value)[source]
Parameters

backoffs – array of backoffs to use in the handler

setBatchSize(value)[source]
Parameters

batchSize – The max size of the buffer

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setCustomAuthHeader(value)[source]
Parameters

CustomAuthHeader – A Custom Value for Authorization Header

setCustomAuthHeaderCol(value)[source]
Parameters

CustomAuthHeader – A Custom Value for Authorization Header

setCustomServiceName(value)[source]
setDefaultInternalEndpoint(value)[source]
setDisableServiceLogs(value)[source]
Parameters

disableServiceLogs – disableServiceLogs option

setDisableServiceLogsCol(value)[source]
Parameters

disableServiceLogs – disableServiceLogs option

setEndpoint(value)[source]
setEntityLinkingParams(value)[source]
Parameters

entityLinkingParams – the parameters to pass to the entityLinking model

setEntityRecognitionParams(value)[source]
Parameters

entityRecognitionParams – the parameters to pass to the entity recognition model

setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setIncludeEntityLinking(value)[source]
Parameters

includeEntityLinking – Whether to perform EntityLinking

setIncludeEntityRecognition(value)[source]
Parameters

includeEntityRecognition – Whether to perform entity recognition

setIncludeKeyPhraseExtraction(value)[source]
Parameters

includeKeyPhraseExtraction – Whether to perform key phrase extraction

setIncludePii(value)[source]
Parameters

includePii – Whether to perform PII Detection

setIncludeSentimentAnalysis(value)[source]
Parameters

includeSentimentAnalysis – Whether to perform SentimentAnalysis

setInitialPollingDelay(value)[source]
Parameters

initialPollingDelay – number of milliseconds to wait before first poll for result

setKeyPhraseExtractionParams(value)[source]
Parameters

keyPhraseExtractionParams – the parameters to pass to the keyPhraseExtraction model

setLanguage(value)[source]
Parameters

language – the language code of the text (optional for some services)

setLanguageCol(value)[source]
Parameters

language – the language code of the text (optional for some services)

setLinkedService(value)[source]
setLocation(value)[source]
setMaxPollingRetries(value)[source]
Parameters

maxPollingRetries – number of times to poll

setModelVersion(value)[source]
Parameters

modelVersion – Version of the model

setModelVersionCol(value)[source]
Parameters

modelVersion – Version of the model

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(AADToken=None, AADTokenCol=None, CustomAuthHeader=None, CustomAuthHeaderCol=None, backoffs=[100, 500, 1000], batchSize=10, concurrency=1, concurrentTimeout=None, disableServiceLogs=None, disableServiceLogsCol=None, entityLinkingParams={'model-version': 'latest'}, entityRecognitionParams={'model-version': 'latest'}, errorCol='TextAnalyze_bdde7ff93c07_error', includeEntityLinking=True, includeEntityRecognition=True, includeKeyPhraseExtraction=True, includePii=True, includeSentimentAnalysis=True, initialPollingDelay=300, keyPhraseExtractionParams={'model-version': 'latest'}, language=None, languageCol=None, maxPollingRetries=1000, modelVersion=None, modelVersionCol=None, outputCol='TextAnalyze_bdde7ff93c07_output', piiParams={'model-version': 'latest'}, pollingDelay=300, sentimentAnalysisParams={'model-version': 'latest'}, showStats=None, showStatsCol=None, subscriptionKey=None, subscriptionKeyCol=None, suppressMaxRetriesException=False, text=None, textCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters
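
Because setParams takes keyword arguments only, one convenient pattern is to collect settings in a dictionary and unpack it with `**`. The sketch below uses only keyword names that appear in the signature above. The helper names `text_analyze_settings` and `configure`, and the column names passed in, are hypothetical.

```python
from typing import Any, Dict


def text_analyze_settings(text_col: str, output_col: str, error_col: str) -> Dict[str, Any]:
    """Build keyword arguments named after entries in the setParams signature."""
    return {
        "textCol": text_col,
        "outputCol": output_col,
        "errorCol": error_col,
        "includeEntityRecognition": True,
        "includeKeyPhraseExtraction": True,
        "includeEntityLinking": False,
        "includePii": False,
        "includeSentimentAnalysis": False,
        "keyPhraseExtractionParams": {"model-version": "latest"},
        "suppressMaxRetriesException": True,
    }


def configure(transformer: Any, settings: Dict[str, Any]) -> Any:
    # Positional arguments are not accepted by setParams, so unpack the dict with **.
    return transformer.setParams(**settings)


settings = text_analyze_settings("text", "analysis", "analysis_error")
```

Keeping the settings in a dictionary also makes it easy to log or version the configuration alongside the pipeline.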

setPiiParams(value)[source]
Parameters

piiParams – the parameters to pass to the PII model

setPollingDelay(value)[source]
Parameters

pollingDelay – number of milliseconds to wait between polling

setSentimentAnalysisParams(value)[source]
Parameters

sentimentAnalysisParams – the parameters to pass to the sentimentAnalysis model

setShowStats(value)[source]
Parameters

showStats – Whether to include detailed statistics in the response

setShowStatsCol(value)[source]
Parameters

showStats – Whether to include detailed statistics in the response

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – the API key to use

setSuppressMaxRetriesException(value)[source]
Parameters

suppressMaxRetriesException – set true to suppress the maximum retries exception and report in the error column

setText(value)[source]
Parameters

text – the text in the request body

setTextCol(value)[source]
Parameters

text – the text in the request body

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

showStats = Param(parent='undefined', name='showStats', doc='ServiceParam: Whether to include detailed statistics in the response')
subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
suppressMaxRetriesException = Param(parent='undefined', name='suppressMaxRetriesException', doc='set true to suppress the maximum retries exception and report in the error column')
text = Param(parent='undefined', name='text', doc='ServiceParam: the text in the request body')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.text.TextSentiment module

class synapse.ml.cognitive.text.TextSentiment.TextSentiment(java_obj=None, AADToken=None, AADTokenCol=None, CustomAuthHeader=None, CustomAuthHeaderCol=None, batchSize=10, concurrency=1, concurrentTimeout=None, disableServiceLogs=None, disableServiceLogsCol=None, errorCol='TextSentiment_f1f51e39b01b_error', handler=None, language=None, languageCol=None, modelVersion=None, modelVersionCol=None, opinionMining=None, opinionMiningCol=None, outputCol='TextSentiment_f1f51e39b01b_output', showStats=None, showStatsCol=None, stringIndexType=None, stringIndexTypeCol=None, subscriptionKey=None, subscriptionKeyCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • AADToken (object) – AAD Token used for authentication

  • CustomAuthHeader (object) – A Custom Value for Authorization Header

  • batchSize (int) – The max size of the buffer

  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • disableServiceLogs (object) – disableServiceLogs option

  • errorCol (str) – column to hold http errors

  • handler (object) – Which strategy to use when handling requests

  • language (object) – the language code of the text (optional for some services)

  • modelVersion (object) – Version of the model

  • opinionMining (object) – if set to true, response will contain not only sentiment prediction but also opinion mining (aspect-based sentiment analysis) results.

  • outputCol (str) – The name of the output column

  • showStats (object) – Whether to include detailed statistics in the response

  • stringIndexType (object) – Specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets

  • subscriptionKey (object) – the API key to use

  • text (object) – the text in the request body

  • timeout (float) – number of seconds to wait before closing the connection

  • url (str) – Url of the service
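
A minimal usage sketch, built only from setters documented for this class (setSubscriptionKey, setLocation, setTextCol, setLanguage, setOutputCol, setErrorCol). The column names "text", "sentiment", and "sentiment_error", the language code "en", and the helper names `build_text_sentiment` and `score` are assumptions chosen for illustration. The import is done lazily so the snippet can be loaded without SynapseML installed.

```python
from typing import Any


def build_text_sentiment(subscription_key: str, location: str) -> Any:
    """Configure a TextSentiment transformer with the documented setters."""
    # Lazy import: requires SynapseML and a running Spark session at call time.
    from synapse.ml.cognitive.text.TextSentiment import TextSentiment

    return (
        TextSentiment()
        .setSubscriptionKey(subscription_key)  # the API key to use
        .setLocation(location)                 # service region, for example "eastus"
        .setTextCol("text")                    # read request text from this column
        .setLanguage("en")                     # same language for every row
        .setOutputCol("sentiment")
        .setErrorCol("sentiment_error")        # column to hold http errors
    )


def score(df: Any, subscription_key: str, location: str) -> Any:
    # TextSentiment is a JavaTransformer, so transform() returns a new DataFrame.
    return build_text_sentiment(subscription_key, location).transform(df)
```

Setters ending in `Col` (such as setTextCol) read the value from a DataFrame column per row, while their plain counterparts (such as setLanguage) apply one value to all rows.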

AADToken = Param(parent='undefined', name='AADToken', doc='ServiceParam: AAD Token used for authentication')
CustomAuthHeader = Param(parent='undefined', name='CustomAuthHeader', doc='ServiceParam: A Custom Value for Authorization Header')
batchSize = Param(parent='undefined', name='batchSize', doc='The max size of the buffer')
concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number of seconds to wait on futures if concurrency >= 1')
disableServiceLogs = Param(parent='undefined', name='disableServiceLogs', doc='ServiceParam: disableServiceLogs option')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getAADToken()[source]
Returns

AAD Token used for authentication

Return type

AADToken

getBatchSize()[source]
Returns

The max size of the buffer

Return type

batchSize

getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getCustomAuthHeader()[source]
Returns

A Custom Value for Authorization Header

Return type

CustomAuthHeader

getDisableServiceLogs()[source]
Returns

disableServiceLogs option

Return type

disableServiceLogs

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

static getJavaPackage()[source]

Returns package name String.

getLanguage()[source]
Returns

the language code of the text (optional for some services)

Return type

language

getModelVersion()[source]
Returns

Version of the model

Return type

modelVersion

getOpinionMining()[source]
Returns

if set to true, response will contain not only sentiment prediction but also opinion mining (aspect-based sentiment analysis) results.

Return type

opinionMining

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getShowStats()[source]
Returns

Whether to include detailed statistics in the response

Return type

showStats

getStringIndexType()[source]
Returns

Specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets

Return type

stringIndexType
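
Why the offset interpretation matters can be seen with a string that contains a character outside the Basic Multilingual Plane. The sketch below compares an offset counted in Unicode code points (how Python indexes `str`) with the same position counted in UTF-16 code units. `utf16_offset` is a hypothetical helper, and counting grapheme clusters (the default described above) would need a third-party library, so it is not shown.

```python
def utf16_offset(text: str, code_point_offset: int) -> int:
    """Convert a code-point offset into the equivalent UTF-16 code-unit offset."""
    prefix = text[:code_point_offset]
    return len(prefix.encode("utf-16-le")) // 2  # two bytes per UTF-16 code unit


text = "I \U0001F44D it"  # contains a thumbs-up emoji (one code point, two UTF-16 units)
target = "it"

code_point_position = text.index(target)                 # counted in code points
utf16_position = utf16_offset(text, code_point_position)  # counted in UTF-16 code units
```

Here the two positions differ by one because the emoji occupies a surrogate pair in UTF-16, so offsets returned by the service only line up with your string slicing when both sides use the same unit.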

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getText()[source]
Returns

the text in the request body

Return type

text

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
language = Param(parent='undefined', name='language', doc='ServiceParam: the language code of the text (optional for some services)')
modelVersion = Param(parent='undefined', name='modelVersion', doc='ServiceParam: Version of the model')
opinionMining = Param(parent='undefined', name='opinionMining', doc='ServiceParam: if set to true, response will contain not only sentiment prediction but also opinion mining (aspect-based sentiment analysis) results.')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setAADToken(value)[source]
Parameters

AADToken – AAD Token used for authentication

setAADTokenCol(value)[source]
Parameters

AADToken – AAD Token used for authentication

setBatchSize(value)[source]
Parameters

batchSize – The max size of the buffer

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setCustomAuthHeader(value)[source]
Parameters

CustomAuthHeader – A Custom Value for Authorization Header

setCustomAuthHeaderCol(value)[source]
Parameters

CustomAuthHeader – A Custom Value for Authorization Header

setCustomServiceName(value)[source]
setDefaultInternalEndpoint(value)[source]
setDisableServiceLogs(value)[source]
Parameters

disableServiceLogs – disableServiceLogs option

setDisableServiceLogsCol(value)[source]
Parameters

disableServiceLogs – disableServiceLogs option

setEndpoint(value)[source]
setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setLanguage(value)[source]
Parameters

language – the language code of the text (optional for some services)

setLanguageCol(value)[source]
Parameters

language – the language code of the text (optional for some services)

setLinkedService(value)[source]
setLocation(value)[source]
setModelVersion(value)[source]
Parameters

modelVersion – Version of the model

setModelVersionCol(value)[source]
Parameters

modelVersion – Version of the model

setOpinionMining(value)[source]
Parameters

opinionMining – if set to true, response will contain not only sentiment prediction but also opinion mining (aspect-based sentiment analysis) results.

setOpinionMiningCol(value)[source]
Parameters

opinionMining – if set to true, response will contain not only sentiment prediction but also opinion mining (aspect-based sentiment analysis) results.

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(AADToken=None, AADTokenCol=None, CustomAuthHeader=None, CustomAuthHeaderCol=None, batchSize=10, concurrency=1, concurrentTimeout=None, disableServiceLogs=None, disableServiceLogsCol=None, errorCol='TextSentiment_f1f51e39b01b_error', handler=None, language=None, languageCol=None, modelVersion=None, modelVersionCol=None, opinionMining=None, opinionMiningCol=None, outputCol='TextSentiment_f1f51e39b01b_output', showStats=None, showStatsCol=None, stringIndexType=None, stringIndexTypeCol=None, subscriptionKey=None, subscriptionKeyCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setShowStats(value)[source]
Parameters

showStats – Whether to include detailed statistics in the response

setShowStatsCol(value)[source]
Parameters

showStats – Whether to include detailed statistics in the response

setStringIndexType(value)[source]
Parameters

stringIndexType – Specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets

setStringIndexTypeCol(value)[source]
Parameters

stringIndexType – Specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – the API key to use

setText(value)[source]
Parameters

text – the text in the request body

setTextCol(value)[source]
Parameters

text – the text in the request body

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

showStats = Param(parent='undefined', name='showStats', doc='ServiceParam: Whether to include detailed statistics in the response')
stringIndexType = Param(parent='undefined', name='stringIndexType', doc='ServiceParam: Specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets')
subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
text = Param(parent='undefined', name='text', doc='ServiceParam: the text in the request body')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

Module contents

SynapseML is an ecosystem of tools that extends the distributed computing framework Apache Spark in several new directions. SynapseML adds many deep learning and data science tools to the Spark ecosystem, including seamless integration of Spark Machine Learning pipelines with Microsoft Cognitive Toolkit (CNTK), LightGBM and OpenCV. These tools enable powerful and highly scalable predictive and analytical models for a variety of data sources.

SynapseML also brings new networking capabilities to the Spark ecosystem. With the HTTP on Spark project, users can embed any web service into their SparkML models. In this vein, SynapseML provides easy-to-use SparkML transformers for a wide variety of Microsoft Cognitive Services. For production-grade deployment, the Spark Serving project enables high-throughput, sub-millisecond-latency web services, backed by your Spark cluster.

SynapseML requires Scala 2.12, Spark 3.0+, and Python 3.6+.