synapse.ml.services.language package

Submodules

synapse.ml.services.language.AnalyzeText module

class synapse.ml.services.language.AnalyzeText.AnalyzeText(java_obj=None, AADToken=None, AADTokenCol=None, CustomAuthHeader=None, CustomAuthHeaderCol=None, apiVersion=None, apiVersionCol=None, batchSize=10, concurrency=1, concurrentTimeout=None, countryHint=None, countryHintCol=None, domain=None, domainCol=None, errorCol='AnalyzeText_4d8d8178d9b9_error', handler=None, kind=None, language=None, languageCol=None, loggingOptOut=None, loggingOptOutCol=None, modelVersion=None, modelVersionCol=None, opinionMining=None, opinionMiningCol=None, outputCol='AnalyzeText_4d8d8178d9b9_output', piiCategories=None, piiCategoriesCol=None, showStats=None, showStatsCol=None, stringIndexType=None, stringIndexTypeCol=None, subscriptionKey=None, subscriptionKeyCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Bases: ComplexParamsMixin, JavaMLReadable, JavaMLWritable, JavaTransformer

Parameters:
  • AADToken (object) – AAD Token used for authentication

  • CustomAuthHeader (object) – A Custom Value for Authorization Header

  • apiVersion (object) – version of the API

  • batchSize (int) – The max size of the buffer

  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • countryHint (object) – the countryHint for language detection

  • domain (object) – if specified, will set the PII domain to include only a subset of the entity categories. Possible values include: ‘PHI’, ‘none’.

  • errorCol (str) – column to hold HTTP errors

  • handler (object) – Which strategy to use when handling requests

  • kind (str) – Enumeration of supported Text Analysis tasks

  • language (object) – the language code of the text (optional for some services)

  • loggingOptOut (object) – loggingOptOut for task

  • modelVersion (object) – Version of the model

  • opinionMining (object) – opinionMining option for SentimentAnalysisTask

  • outputCol (str) – The name of the output column

  • piiCategories (object) – describes the PII categories to return

  • showStats (object) – Whether to include detailed statistics in the response

  • stringIndexType (object) – Specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets

  • subscriptionKey (object) – the API key to use

  • text (object) – the text in the request body

  • timeout (float) – number of seconds to wait before closing the connection

  • url (str) – URL of the service
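
The parameters above are normally set through the fluent setters documented below. The following is a minimal sketch, not the library's canonical usage: the column names, the region value, and the "SentimentAnalysis" task name are assumptions, and the factory argument is a hypothetical hook (not part of SynapseML) that only exists so the function can be exercised without a Spark session.

```python
def build_sentiment_analyzer(key, location, factory=None):
    """Configure an AnalyzeText transformer for sentiment analysis.

    `factory` is a hypothetical hook (not part of SynapseML); when it is
    omitted, the real AnalyzeText class is imported lazily.
    """
    if factory is None:
        # Requires the SynapseML package and an active Spark session.
        from synapse.ml.services.language import AnalyzeText
        factory = AnalyzeText

    return (
        factory()
        .setSubscriptionKey(key)       # API key for the Language resource
        .setLocation(location)         # Azure region, e.g. "eastus" (assumed value)
        .setKind("SentimentAnalysis")  # assumed task name; check the service docs
        .setTextCol("text")            # input column holding the documents (assumed name)
        .setOutputCol("sentiment")     # column that receives the service response
        .setErrorCol("error")          # column that receives HTTP errors
    )
```

Given a DataFrame df with a text column, build_sentiment_analyzer(key, "eastus").transform(df) would be expected to return df with the sentiment and error columns appended.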

AADToken = Param(parent='undefined', name='AADToken', doc='ServiceParam: AAD Token used for authentication')
CustomAuthHeader = Param(parent='undefined', name='CustomAuthHeader', doc='ServiceParam: A Custom Value for Authorization Header')
apiVersion = Param(parent='undefined', name='apiVersion', doc='ServiceParam: version of the api')
batchSize = Param(parent='undefined', name='batchSize', doc='The max size of the buffer')
concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
countryHint = Param(parent='undefined', name='countryHint', doc='ServiceParam: the countryHint for language detection')
domain = Param(parent='undefined', name='domain', doc="ServiceParam: if specified, will set the PII domain to include only a subset of the entity categories. Possible values include: 'PHI', 'none'.")
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getAADToken()[source]
Returns:

AAD Token used for authentication

Return type:

AADToken

getApiVersion()[source]
Returns:

version of the API

Return type:

apiVersion

getBatchSize()[source]
Returns:

The max size of the buffer

Return type:

batchSize

getConcurrency()[source]
Returns:

max number of concurrent calls

Return type:

concurrency

getConcurrentTimeout()[source]
Returns:

max number of seconds to wait on futures if concurrency >= 1

Return type:

concurrentTimeout

getCountryHint()[source]
Returns:

the countryHint for language detection

Return type:

countryHint

getCustomAuthHeader()[source]
Returns:

A Custom Value for Authorization Header

Return type:

CustomAuthHeader

getDomain()[source]
Returns:

if specified, will set the PII domain to include only a subset of the entity categories. Possible values include: ‘PHI’, ‘none’.

Return type:

domain

getErrorCol()[source]
Returns:

column to hold HTTP errors

Return type:

errorCol

getHandler()[source]
Returns:

Which strategy to use when handling requests

Return type:

handler

static getJavaPackage()[source]

Returns package name String.

getKind()[source]
Returns:

Enumeration of supported Text Analysis tasks

Return type:

kind

getLanguage()[source]
Returns:

the language code of the text (optional for some services)

Return type:

language

getLoggingOptOut()[source]
Returns:

loggingOptOut for task

Return type:

loggingOptOut

getModelVersion()[source]
Returns:

Version of the model

Return type:

modelVersion

getOpinionMining()[source]
Returns:

opinionMining option for SentimentAnalysisTask

Return type:

opinionMining

getOutputCol()[source]
Returns:

The name of the output column

Return type:

outputCol

getPiiCategories()[source]
Returns:

describes the PII categories to return

Return type:

piiCategories

getShowStats()[source]
Returns:

Whether to include detailed statistics in the response

Return type:

showStats

getStringIndexType()[source]
Returns:

Specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets

Return type:

stringIndexType

getSubscriptionKey()[source]
Returns:

the API key to use

Return type:

subscriptionKey

getText()[source]
Returns:

the text in the request body

Return type:

text

getTimeout()[source]
Returns:

number of seconds to wait before closing the connection

Return type:

timeout

getUrl()[source]
Returns:

URL of the service

Return type:

url

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
kind = Param(parent='undefined', name='kind', doc='Enumeration of supported Text Analysis tasks')
language = Param(parent='undefined', name='language', doc='ServiceParam: the language code of the text (optional for some services)')
loggingOptOut = Param(parent='undefined', name='loggingOptOut', doc='ServiceParam: loggingOptOut for task')
modelVersion = Param(parent='undefined', name='modelVersion', doc='ServiceParam: Version of the model')
opinionMining = Param(parent='undefined', name='opinionMining', doc='ServiceParam: opinionMining option for SentimentAnalysisTask')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
piiCategories = Param(parent='undefined', name='piiCategories', doc='ServiceParam: describes the PII categories to return')
classmethod read()[source]

Returns an MLReader instance for this class.
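
Because the class derives from JavaMLWritable and JavaMLReadable, a configured transformer can presumably be persisted and restored through the standard Spark ML writer and reader. The sketch below assumes that behavior; the helper name and the path argument are illustrative and not part of SynapseML.

```python
def save_and_reload(transformer, transformer_cls, path):
    """Persist a transformer and load it back through the Spark ML reader.

    `transformer` is expected to be an AnalyzeText instance and
    `transformer_cls` the AnalyzeText class itself; `path` is any location
    the Spark cluster can write to (illustrative).
    """
    # write() returns an MLWriter; overwrite() replaces an existing save.
    transformer.write().overwrite().save(path)
    # read() returns an MLReader for the class; load() rebuilds the instance.
    return transformer_cls.read().load(path)
```

The saved parameters include whatever was set on the transformer, so it is worth checking whether a subscription key is among them before writing to shared storage.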

setAADToken(value)[source]
Parameters:

AADToken – AAD Token used for authentication

setAADTokenCol(value)[source]
Parameters:

AADToken – AAD Token used for authentication

setApiVersion(value)[source]
Parameters:

apiVersion – version of the API

setApiVersionCol(value)[source]
Parameters:

apiVersion – version of the API

setBatchSize(value)[source]
Parameters:

batchSize – The max size of the buffer

setConcurrency(value)[source]
Parameters:

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters:

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setCountryHint(value)[source]
Parameters:

countryHint – the countryHint for language detection

setCountryHintCol(value)[source]
Parameters:

countryHint – the countryHint for language detection

setCustomAuthHeader(value)[source]
Parameters:

CustomAuthHeader – A Custom Value for Authorization Header

setCustomAuthHeaderCol(value)[source]
Parameters:

CustomAuthHeader – A Custom Value for Authorization Header

setCustomServiceName(value)[source]
setDefaultInternalEndpoint(value)[source]
setDomain(value)[source]
Parameters:

domain – if specified, will set the PII domain to include only a subset of the entity categories. Possible values include: ‘PHI’, ‘none’.

setDomainCol(value)[source]
Parameters:

domain – if specified, will set the PII domain to include only a subset of the entity categories. Possible values include: ‘PHI’, ‘none’.

setEndpoint(value)[source]
setErrorCol(value)[source]
Parameters:

errorCol – column to hold HTTP errors

setHandler(value)[source]
Parameters:

handler – Which strategy to use when handling requests

setKind(value)[source]
Parameters:

kind – Enumeration of supported Text Analysis tasks

setLanguage(value)[source]
Parameters:

language – the language code of the text (optional for some services)

setLanguageCol(value)[source]
Parameters:

language – the language code of the text (optional for some services)
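
Each service parameter comes in two flavors: the plain setter (for example setLanguage) applies one constant value to every row, while the Col variant (setLanguageCol) takes the name of a DataFrame column that supplies a value per row. The sketch below contrasts the two; the helper names and the default column names are assumptions for illustration.

```python
def with_fixed_language(analyzer, text_col="text", language="en"):
    """Read text from a column but apply one language code to every row."""
    return analyzer.setTextCol(text_col).setLanguage(language)


def with_per_row_language(analyzer, text_col="text", language_col="lang"):
    """Read both the text and its language code from DataFrame columns."""
    return analyzer.setTextCol(text_col).setLanguageCol(language_col)
```

The same pattern applies to the other paired setters on this class, such as setSubscriptionKey and setSubscriptionKeyCol.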

setLocation(value)[source]
setLoggingOptOut(value)[source]
Parameters:

loggingOptOut – loggingOptOut for task

setLoggingOptOutCol(value)[source]
Parameters:

loggingOptOut – loggingOptOut for task

setModelVersion(value)[source]
Parameters:

modelVersion – Version of the model

setModelVersionCol(value)[source]
Parameters:

modelVersion – Version of the model

setOpinionMining(value)[source]
Parameters:

opinionMining – opinionMining option for SentimentAnalysisTask

setOpinionMiningCol(value)[source]
Parameters:

opinionMining – opinionMining option for SentimentAnalysisTask

setOutputCol(value)[source]
Parameters:

outputCol – The name of the output column

setParams(AADToken=None, AADTokenCol=None, CustomAuthHeader=None, CustomAuthHeaderCol=None, apiVersion=None, apiVersionCol=None, batchSize=10, concurrency=1, concurrentTimeout=None, countryHint=None, countryHintCol=None, domain=None, domainCol=None, errorCol='AnalyzeText_4d8d8178d9b9_error', handler=None, kind=None, language=None, languageCol=None, loggingOptOut=None, loggingOptOutCol=None, modelVersion=None, modelVersionCol=None, opinionMining=None, opinionMiningCol=None, outputCol='AnalyzeText_4d8d8178d9b9_output', piiCategories=None, piiCategoriesCol=None, showStats=None, showStatsCol=None, stringIndexType=None, stringIndexTypeCol=None, subscriptionKey=None, subscriptionKeyCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters
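
As a rough illustration of the keyword-only form, the sketch below adjusts the request-handling parameters (batchSize, concurrency, timeout) in a single setParams call. The helper name is hypothetical; its defaults mirror the ones in the signature above and are not tuning recommendations.

```python
def tune_throughput(analyzer, batch_size=10, concurrency=1, timeout=60.0):
    """Set the request-handling parameters in one keyword-only call.

    `analyzer` is expected to be an AnalyzeText instance.
    """
    analyzer.setParams(
        batchSize=batch_size,     # max size of the buffer
        concurrency=concurrency,  # max number of concurrent calls
        timeout=timeout,          # seconds to wait before closing the connection
    )
    return analyzer
```

All arguments must be passed by name; a positional call such as setParams(10) is expected to be rejected.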

setPiiCategories(value)[source]
Parameters:

piiCategories – describes the PII categories to return

setPiiCategoriesCol(value)[source]
Parameters:

piiCategories – describes the PII categories to return

setShowStats(value)[source]
Parameters:

showStats – Whether to include detailed statistics in the response

setShowStatsCol(value)[source]
Parameters:

showStats – Whether to include detailed statistics in the response

setStringIndexType(value)[source]
Parameters:

stringIndexType – Specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets

setStringIndexTypeCol(value)[source]
Parameters:

stringIndexType – Specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets

setSubscriptionKey(value)[source]
Parameters:

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters:

subscriptionKey – the API key to use

setText(value)[source]
Parameters:

text – the text in the request body

setTextCol(value)[source]
Parameters:

text – the text in the request body

setTimeout(value)[source]
Parameters:

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters:

url – URL of the service

showStats = Param(parent='undefined', name='showStats', doc='ServiceParam: Whether to include detailed statistics in the response')
stringIndexType = Param(parent='undefined', name='stringIndexType', doc='ServiceParam: Specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets')
subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
text = Param(parent='undefined', name='text', doc='ServiceParam: the text in the request body')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

Module contents

SynapseML is an ecosystem of tools that extends the distributed computing framework Apache Spark in several new directions. SynapseML adds many deep learning and data science tools to the Spark ecosystem, including seamless integration of Spark Machine Learning pipelines with Microsoft Cognitive Toolkit (CNTK), LightGBM, and OpenCV. These tools enable powerful and highly scalable predictive and analytical models for a variety of data sources.

SynapseML also brings new networking capabilities to the Spark ecosystem. With the HTTP on Spark project, users can embed any web service into their SparkML models. In this vein, SynapseML provides easy-to-use SparkML transformers for a wide variety of Microsoft Cognitive Services. For production-grade deployment, the Spark Serving project enables high-throughput, sub-millisecond-latency web services backed by your Spark cluster.

SynapseML requires Scala 2.12, Spark 3.0+, and Python 3.6+.