synapse.ml.cognitive.language package

Submodules

synapse.ml.cognitive.language.AnalyzeText module

class synapse.ml.cognitive.language.AnalyzeText.AnalyzeText(java_obj=None, AADToken=None, AADTokenCol=None, apiVersion=None, apiVersionCol=None, batchSize=10, concurrency=1, concurrentTimeout=None, countryHint=None, countryHintCol=None, domain=None, domainCol=None, errorCol='AnalyzeText_1db96c1380cb_error', handler=None, kind=None, language=None, languageCol=None, loggingOptOut=None, loggingOptOutCol=None, modelVersion=None, modelVersionCol=None, opinionMining=None, opinionMiningCol=None, outputCol='AnalyzeText_1db96c1380cb_output', piiCategories=None, piiCategoriesCol=None, showStats=None, showStatsCol=None, stringIndexType=None, stringIndexTypeCol=None, subscriptionKey=None, subscriptionKeyCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • AADToken (object) – AAD Token used for authentication

  • apiVersion (object) – version of the api

  • batchSize (int) – The max size of the buffer

  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • countryHint (object) – the countryHint for language detection

  • domain (object) – if specified, will set the PII domain to include only a subset of the entity categories. Possible values include: ‘PHI’, ‘none’.

  • errorCol (str) – column to hold http errors

  • handler (object) – Which strategy to use when handling requests

  • kind (str) – Enumeration of supported Text Analysis tasks

  • language (object) – the language code of the text (optional for some services)

  • loggingOptOut (object) – loggingOptOut for task

  • modelVersion (object) – Version of the model

  • opinionMining (object) – opinionMining option for SentimentAnalysisTask

  • outputCol (str) – The name of the output column

  • piiCategories (object) – describes the PII categories to return

  • showStats (object) – Whether to include detailed statistics in the response

  • stringIndexType (object) – Specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets

  • subscriptionKey (object) – the API key to use

  • text (object) – the text in the request body

  • timeout (float) – number of seconds to wait before closing the connection

  • url (str) – Url of the service
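As a minimal sketch of how these parameters might be supplied, the helper below applies a dictionary of settings through the set<Name> setters documented on this page. The apply_settings and build_analyze_text helpers, the column names, the "eastus" location, and the "SentimentAnalysis" kind value are assumptions for illustration. The SynapseML import is deferred so that the helper itself can be exercised without a Spark session.

```python
def apply_settings(transformer, settings):
    """Call set<Name>(value) on `transformer` for every entry in `settings`."""
    for name, value in settings.items():
        setter = getattr(transformer, "set" + name[0].upper() + name[1:])
        result = setter(value)
        # Keep using the returned object when the setter returns one (fluent style).
        transformer = result if result is not None else transformer
    return transformer


def build_analyze_text(subscription_key):
    # Deferred import: needs SynapseML installed and an active Spark session.
    from synapse.ml.cognitive.language.AnalyzeText import AnalyzeText

    settings = {
        "kind": "SentimentAnalysis",  # assumed task name
        "location": "eastus",         # assumed Azure region
        "subscriptionKey": subscription_key,
        "textCol": "text",            # assumed input column name
        "outputCol": "analysis",      # assumed output column name
        "errorCol": "error",          # assumed error column name
    }
    return apply_settings(AnalyzeText(), settings)
```

Given a DataFrame df with a text column, build_analyze_text(key).transform(df) would then be expected to add the output and error columns.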

AADToken = Param(parent='undefined', name='AADToken', doc='ServiceParam: AAD Token used for authentication')
apiVersion = Param(parent='undefined', name='apiVersion', doc='ServiceParam: version of the api')
batchSize = Param(parent='undefined', name='batchSize', doc='The max size of the buffer')
concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
countryHint = Param(parent='undefined', name='countryHint', doc='ServiceParam: the countryHint for language detection')
domain = Param(parent='undefined', name='domain', doc="ServiceParam: if specified, will set the PII domain to include only a subset of the entity categories. Possible values include: 'PHI', 'none'.")
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
getAADToken()[source]
Returns

AAD Token used for authentication

Return type

AADToken

getApiVersion()[source]
Returns

version of the api

Return type

apiVersion

getBatchSize()[source]
Returns

The max size of the buffer

Return type

batchSize

getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout
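
The interaction between concurrency and concurrentTimeout can be pictured with a plain-Python analogy: a pool limited to a fixed number of in-flight calls, where each result is awaited for a bounded time. This is a conceptual sketch built on concurrent.futures, not SynapseML's implementation. The names call_service and run_concurrently are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def call_service(text):
    # Hypothetical stand-in for one HTTP request to the service.
    return {"text": text, "length": len(text)}


def run_concurrently(texts, concurrency=1, concurrent_timeout=None):
    """Run call_service over texts with at most `concurrency` calls in flight."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(call_service, t) for t in texts]
        # result(timeout=...) raises a timeout error when the result is not
        # available after waiting concurrent_timeout seconds; None waits
        # without a limit.
        return [f.result(timeout=concurrent_timeout) for f in futures]


results = run_concurrently(["first", "second"], concurrency=2, concurrent_timeout=5.0)
```

Results come back in submission order because the futures are awaited in the order they were created.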

getCountryHint()[source]
Returns

the countryHint for language detection

Return type

countryHint

getDomain()[source]
Returns

if specified, will set the PII domain to include only a subset of the entity categories. Possible values include: ‘PHI’, ‘none’.

Return type

domain

getErrorCol()[source]
Returns

column to hold http errors

Return type

errorCol

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

static getJavaPackage()[source]

Returns package name String.

getKind()[source]
Returns

Enumeration of supported Text Analysis tasks

Return type

kind

getLanguage()[source]
Returns

the language code of the text (optional for some services)

Return type

language

getLoggingOptOut()[source]
Returns

loggingOptOut for task

Return type

loggingOptOut

getModelVersion()[source]
Returns

Version of the model

Return type

modelVersion

getOpinionMining()[source]
Returns

opinionMining option for SentimentAnalysisTask

Return type

opinionMining

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getPiiCategories()[source]
Returns

describes the PII categories to return

Return type

piiCategories

getShowStats()[source]
Returns

Whether to include detailed statistics in the response

Return type

showStats

getStringIndexType()[source]
Returns

Specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets

Return type

stringIndexType
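
Why the offset unit matters can be seen with a string containing a character outside the Basic Multilingual Plane. The sketch below compares the offset of the same word counted in Unicode code points and in UTF-16 code units. It uses only the Python standard library, and the helper names are illustrative. Grapheme-based offsets (the documented default) can differ again for combining sequences and are not computed here.

```python
def code_point_offset(text, needle):
    """Offset of `needle` counted in Unicode code points (Python's str index)."""
    return text.index(needle)


def utf16_offset(text, needle):
    """Offset of `needle` counted in UTF-16 code units."""
    prefix = text[: text.index(needle)]
    # Each UTF-16 code unit is 2 bytes; characters above U+FFFF take two units.
    return len(prefix.encode("utf-16-le")) // 2


sample = "I \U0001F600 Spark"  # U+1F600 lies outside the Basic Multilingual Plane

print(code_point_offset(sample, "Spark"))  # 4
print(utf16_offset(sample, "Spark"))       # 5
```

A consumer that slices the original text with offsets in the wrong unit would be off by one position for every such character preceding the match.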

getSubscriptionKey()[source]
Returns

the API key to use

Return type

subscriptionKey

getText()[source]
Returns

the text in the request body

Return type

text

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

getUrl()[source]
Returns

Url of the service

Return type

url

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
kind = Param(parent='undefined', name='kind', doc='Enumeration of supported Text Analysis tasks')
language = Param(parent='undefined', name='language', doc='ServiceParam: the language code of the text (optional for some services)')
loggingOptOut = Param(parent='undefined', name='loggingOptOut', doc='ServiceParam: loggingOptOut for task')
modelVersion = Param(parent='undefined', name='modelVersion', doc='ServiceParam: Version of the model')
opinionMining = Param(parent='undefined', name='opinionMining', doc='ServiceParam: opinionMining option for SentimentAnalysisTask')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
piiCategories = Param(parent='undefined', name='piiCategories', doc='ServiceParam: describes the PII categories to return')
classmethod read()[source]

Returns an MLReader instance for this class.

setAADToken(value)[source]
Parameters

AADToken – AAD Token used for authentication

setAADTokenCol(value)[source]
Parameters

AADToken – name of the column that holds the AAD Token used for authentication

setApiVersion(value)[source]
Parameters

apiVersion – version of the api

setApiVersionCol(value)[source]
Parameters

apiVersion – name of the column that holds the version of the api

setBatchSize(value)[source]
Parameters

batchSize – The max size of the buffer

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setCountryHint(value)[source]
Parameters

countryHint – the countryHint for language detection

setCountryHintCol(value)[source]
Parameters

countryHint – name of the column that holds the countryHint for language detection

setCustomServiceName(value)[source]
setDefaultInternalEndpoint(value)[source]
setDomain(value)[source]
Parameters

domain – if specified, will set the PII domain to include only a subset of the entity categories. Possible values include: ‘PHI’, ‘none’.

setDomainCol(value)[source]
Parameters

domain – name of the column that holds the PII domain; if specified, the domain restricts results to a subset of the entity categories. Possible values include: ‘PHI’, ‘none’.

setEndpoint(value)[source]
setErrorCol(value)[source]
Parameters

errorCol – column to hold http errors

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setKind(value)[source]
Parameters

kind – Enumeration of supported Text Analysis tasks

setLanguage(value)[source]
Parameters

language – the language code of the text (optional for some services)

setLanguageCol(value)[source]
Parameters

language – name of the column that holds the language code of the text (optional for some services)
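
Most service parameters on this class come as a pair of setters, for example setLanguage and setLanguageCol. The sketch below illustrates the usual reading of such a pair: the plain setter fixes one value for every row, while the Col setter names a column that supplies a value per row. It is a plain-Python model with hypothetical names (resolve_param and the row dictionaries), not the library's code.

```python
def resolve_param(row, value=None, col=None):
    """Return the parameter for one row: the row's column entry or a fixed value."""
    # In this sketch a column name, when given, takes precedence over the value.
    if col is not None:
        return row[col]
    return value


rows = [
    {"text": "Bonjour tout le monde", "lang": "fr"},
    {"text": "Hello world", "lang": "en"},
]

# Analogous to setLanguage("en"): the same value for every row.
fixed = [resolve_param(r, value="en") for r in rows]

# Analogous to setLanguageCol("lang"): the value is read from each row.
per_row = [resolve_param(r, col="lang") for r in rows]
```

The column form is the natural choice when a single DataFrame mixes rows that need different settings, such as texts in several languages.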

setLocation(value)[source]
setLoggingOptOut(value)[source]
Parameters

loggingOptOut – loggingOptOut for task

setLoggingOptOutCol(value)[source]
Parameters

loggingOptOut – name of the column that holds loggingOptOut for task

setModelVersion(value)[source]
Parameters

modelVersion – Version of the model

setModelVersionCol(value)[source]
Parameters

modelVersion – name of the column that holds the version of the model

setOpinionMining(value)[source]
Parameters

opinionMining – opinionMining option for SentimentAnalysisTask

setOpinionMiningCol(value)[source]
Parameters

opinionMining – name of the column that holds the opinionMining option for SentimentAnalysisTask

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(AADToken=None, AADTokenCol=None, apiVersion=None, apiVersionCol=None, batchSize=10, concurrency=1, concurrentTimeout=None, countryHint=None, countryHintCol=None, domain=None, domainCol=None, errorCol='AnalyzeText_1db96c1380cb_error', handler=None, kind=None, language=None, languageCol=None, loggingOptOut=None, loggingOptOutCol=None, modelVersion=None, modelVersionCol=None, opinionMining=None, opinionMiningCol=None, outputCol='AnalyzeText_1db96c1380cb_output', piiCategories=None, piiCategoriesCol=None, showStats=None, showStatsCol=None, stringIndexType=None, stringIndexTypeCol=None, subscriptionKey=None, subscriptionKeyCol=None, text=None, textCol=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

setPiiCategories(value)[source]
Parameters

piiCategories – describes the PII categories to return

setPiiCategoriesCol(value)[source]
Parameters

piiCategories – name of the column that holds the PII categories to return

setShowStats(value)[source]
Parameters

showStats – Whether to include detailed statistics in the response

setShowStatsCol(value)[source]
Parameters

showStats – name of the column that holds whether to include detailed statistics in the response

setStringIndexType(value)[source]
Parameters

stringIndexType – Specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets

setStringIndexTypeCol(value)[source]
Parameters

stringIndexType – name of the column that holds the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets

setSubscriptionKey(value)[source]
Parameters

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey – name of the column that holds the API key to use

setText(value)[source]
Parameters

text – the text in the request body

setTextCol(value)[source]
Parameters

text – name of the column that holds the text in the request body

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

setUrl(value)[source]
Parameters

url – Url of the service

showStats = Param(parent='undefined', name='showStats', doc='ServiceParam: Whether to include detailed statistics in the response')
stringIndexType = Param(parent='undefined', name='stringIndexType', doc='ServiceParam: Specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets')
subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
text = Param(parent='undefined', name='text', doc='ServiceParam: the text in the request body')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
url = Param(parent='undefined', name='url', doc='Url of the service')

Module contents

SynapseML is an ecosystem of tools aimed at expanding the distributed computing framework Apache Spark in several new directions. SynapseML adds many deep learning and data science tools to the Spark ecosystem, including seamless integration of Spark Machine Learning pipelines with Microsoft Cognitive Toolkit (CNTK), LightGBM and OpenCV. These tools enable powerful and highly scalable predictive and analytical models for a variety of data sources.

SynapseML also brings new networking capabilities to the Spark ecosystem. With the HTTP on Spark project, users can embed any web service into their SparkML models. In this vein, SynapseML provides easy-to-use SparkML transformers for a wide variety of Microsoft Cognitive Services. For production-grade deployment, the Spark Serving project enables high-throughput, sub-millisecond latency web services, backed by your Spark cluster.

SynapseML requires Scala 2.12, Spark 3.0+, and Python 3.6+.
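
A quick way to check an environment against these minimums is sketched below. The meets_minimum helper and its simple dotted-version parsing are assumptions for illustration. In practice the Spark version string would come from a running session rather than a literal.

```python
import sys
from itertools import takewhile


def meets_minimum(version, minimum):
    """Compare a dotted version string such as '3.2.1' against a tuple like (3, 0)."""
    parts = []
    for piece in version.split("."):
        # Keep only the leading digits, so a piece like '1-SNAPSHOT' counts as 1.
        digits = "".join(takewhile(str.isdigit, piece))
        parts.append(int(digits) if digits else 0)
    # Pad short versions ('3' becomes 3.0) before comparing component-wise.
    parts += [0] * (len(minimum) - len(parts))
    return tuple(parts[: len(minimum)]) >= tuple(minimum)


python_ok = sys.version_info[:2] >= (3, 6)
spark_ok = meets_minimum("3.2.1", (3, 0))  # example version string only
```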