synapse.ml.cognitive.language package
Submodules
synapse.ml.cognitive.language.AnalyzeText module
- class synapse.ml.cognitive.language.AnalyzeText.AnalyzeText(java_obj=None, AADToken=None, AADTokenCol=None, apiVersion=None, apiVersionCol=None, batchSize=10, concurrency=1, concurrentTimeout=None, countryHint=None, countryHintCol=None, domain=None, domainCol=None, errorCol='AnalyzeText_1ced658681ff_error', handler=None, kind=None, language=None, languageCol=None, loggingOptOut=None, loggingOptOutCol=None, modelVersion=None, modelVersionCol=None, opinionMining=None, opinionMiningCol=None, outputCol='AnalyzeText_1ced658681ff_output', piiCategories=None, piiCategoriesCol=None, showStats=None, showStatsCol=None, stringIndexType=None, stringIndexTypeCol=None, subscriptionKey=None, subscriptionKeyCol=None, text=None, textCol=None, timeout=60.0, url=None)
Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer
- Parameters
concurrentTimeout (float) – maximum number of seconds to wait on futures if concurrency >= 1
countryHint (object) – the country hint for language detection
domain (object) – if specified, will set the PII domain to include only a subset of the entity categories. Possible values include: ‘PHI’, ‘none’.
handler (object) – which strategy to use when handling requests
language (object) – the language code of the text (optional for some services)
opinionMining (object) – opinionMining option for SentimentAnalysisTask
piiCategories (object) – describes the PII categories to return
showStats (object) – whether to include detailed statistics in the response
stringIndexType (object) – specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets
timeout (float) – number of seconds to wait before closing the connection
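Most service parameters in the signature above appear twice: once as a literal value (for example language) and once with a Col suffix (languageCol). The following is a minimal sketch of assembling constructor keyword arguments, assuming the two forms are alternatives for the same setting. The helper build_analyze_text_kwargs is hypothetical and not part of SynapseML, and the kind value and column names are placeholders chosen for illustration.

```python
def build_analyze_text_kwargs(values=None, columns=None, **fixed):
    """Merge literal values and column bindings into constructor kwargs.

    values:  {"language": "en"}       becomes language="en"
    columns: {"text": "review_body"}  becomes textCol="review_body"
    fixed:   passed through unchanged (outputCol, timeout, ...)
    """
    values = dict(values or {})
    columns = dict(columns or {})

    # Assumption: a parameter is supplied as a value or as a column, not both.
    clash = sorted(set(values) & set(columns))
    if clash:
        raise ValueError(f"set either the value or the column, not both: {clash}")

    kwargs = dict(fixed)
    kwargs.update(values)
    kwargs.update({f"{name}Col": col for name, col in columns.items()})
    return kwargs


kwargs = build_analyze_text_kwargs(
    values={"language": "en", "showStats": True},
    columns={"text": "review_body"},   # hypothetical input column name
    kind="SentimentAnalysis",          # placeholder; check the service docs
    outputCol="sentiment",
    errorCol="sentiment_error",
    timeout=60.0,
)
print(kwargs)
```

With SynapseML installed, a dictionary like this could be passed as AnalyzeText(**kwargs) together with subscriptionKey and url. Naming outputCol and errorCol explicitly avoids the generated defaults shown in the signature.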
- AADToken = Param(parent='undefined', name='AADToken', doc='ServiceParam: AAD Token used for authentication')
- apiVersion = Param(parent='undefined', name='apiVersion', doc='ServiceParam: version of the api')
- batchSize = Param(parent='undefined', name='batchSize', doc='The max size of the buffer')
- concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
- concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
- countryHint = Param(parent='undefined', name='countryHint', doc='ServiceParam: the countryHint for language detection')
- domain = Param(parent='undefined', name='domain', doc="ServiceParam: if specified, will set the PII domain to include only a subset of the entity categories. Possible values include: 'PHI', 'none'.")
- errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
- getConcurrentTimeout()
- Returns
maximum number of seconds to wait on futures if concurrency >= 1
- Return type
concurrentTimeout
- getDomain()
- Returns
if specified, will set the PII domain to include only a subset of the entity categories. Possible values include: ‘PHI’, ‘none’.
- Return type
domain
- getLanguage()
- Returns
the language code of the text (optional for some services)
- Return type
language
- getOpinionMining()
- Returns
opinionMining option for SentimentAnalysisTask
- Return type
opinionMining
- getPiiCategories()
- Returns
describes the PII categories to return
- Return type
piiCategories
- getShowStats()
- Returns
whether to include detailed statistics in the response
- Return type
showStats
- getStringIndexType()
- Returns
specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets
- Return type
stringIndexType
- getTimeout()
- Returns
number of seconds to wait before closing the connection
- Return type
timeout
- handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
- kind = Param(parent='undefined', name='kind', doc='Enumeration of supported Text Analysis tasks')
- language = Param(parent='undefined', name='language', doc='ServiceParam: the language code of the text (optional for some services)')
- loggingOptOut = Param(parent='undefined', name='loggingOptOut', doc='ServiceParam: loggingOptOut for task')
- modelVersion = Param(parent='undefined', name='modelVersion', doc='ServiceParam: Version of the model')
- opinionMining = Param(parent='undefined', name='opinionMining', doc='ServiceParam: opinionMining option for SentimentAnalysisTask')
- outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
- piiCategories = Param(parent='undefined', name='piiCategories', doc='ServiceParam: describes the PII categories to return')
- setConcurrentTimeout(value)
- Parameters
concurrentTimeout – maximum number of seconds to wait on futures if concurrency >= 1
- setDomain(value)
- Parameters
domain – if specified, will set the PII domain to include only a subset of the entity categories. Possible values include: ‘PHI’, ‘none’.
- setDomainCol(value)
- Parameters
domain – if specified, will set the PII domain to include only a subset of the entity categories. Possible values include: ‘PHI’, ‘none’.
- setLanguage(value)
- Parameters
language – the language code of the text (optional for some services)
- setLanguageCol(value)
- Parameters
language – the language code of the text (optional for some services)
- setOpinionMining(value)
- Parameters
opinionMining – opinionMining option for SentimentAnalysisTask
- setOpinionMiningCol(value)
- Parameters
opinionMining – opinionMining option for SentimentAnalysisTask
- setParams(AADToken=None, AADTokenCol=None, apiVersion=None, apiVersionCol=None, batchSize=10, concurrency=1, concurrentTimeout=None, countryHint=None, countryHintCol=None, domain=None, domainCol=None, errorCol='AnalyzeText_1ced658681ff_error', handler=None, kind=None, language=None, languageCol=None, loggingOptOut=None, loggingOptOutCol=None, modelVersion=None, modelVersionCol=None, opinionMining=None, opinionMiningCol=None, outputCol='AnalyzeText_1ced658681ff_output', piiCategories=None, piiCategoriesCol=None, showStats=None, showStatsCol=None, stringIndexType=None, stringIndexTypeCol=None, subscriptionKey=None, subscriptionKeyCol=None, text=None, textCol=None, timeout=60.0, url=None)
Set the (keyword only) parameters
- setPiiCategoriesCol(value)
- Parameters
piiCategories – describes the PII categories to return
- setShowStats(value)
- Parameters
showStats – whether to include detailed statistics in the response
- setShowStatsCol(value)
- Parameters
showStats – whether to include detailed statistics in the response
- setStringIndexType(value)
- Parameters
stringIndexType – specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets
- setStringIndexTypeCol(value)
- Parameters
stringIndexType – specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets
- setTimeout(value)
- Parameters
timeout – number of seconds to wait before closing the connection
- showStats = Param(parent='undefined', name='showStats', doc='ServiceParam: Whether to include detailed statistics in the response')
- stringIndexType = Param(parent='undefined', name='stringIndexType', doc='ServiceParam: Specifies the method used to interpret string offsets. Defaults to Text Elements (Graphemes) according to Unicode v8.0.0. For additional information see https://aka.ms/text-analytics-offsets')
- subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
- text = Param(parent='undefined', name='text', doc='ServiceParam: the text in the request body')
- timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
- url = Param(parent='undefined', name='url', doc='Url of the service')
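The stringIndexType parameter matters because the offset of the same substring depends on the unit being counted. The sketch below compares two such units, Unicode code points and UTF-16 code units, using only the standard library. The function names are made up for illustration, and grapheme-based counting (the documented default) is left out because it needs a third-party library.

```python
def offset_in_code_points(text, needle):
    """Offset of `needle` counted in Unicode code points (Python's native str index)."""
    return text.index(needle)


def offset_in_utf16_units(text, needle):
    """Offset of `needle` counted in UTF-16 code units."""
    prefix = text[: text.index(needle)]
    # Each UTF-16 code unit is two bytes; characters outside the BMP take two units.
    return len(prefix.encode("utf-16-le")) // 2


sample = "I \U0001F600 Spark"   # contains one emoji outside the Basic Multilingual Plane

print(offset_in_code_points(sample, "Spark"))   # 4
print(offset_in_utf16_units(sample, "Spark"))   # 5
```

The two offsets differ by one because the emoji is a single code point but a surrogate pair in UTF-16. When slicing results with Python strings, an index type that matches code points is the more convenient one to request, assuming the service offers it.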
Module contents
SynapseML is an ecosystem of tools aimed at expanding the distributed computing framework Apache Spark in several new directions. SynapseML adds many deep learning and data science tools to the Spark ecosystem, including seamless integration of Spark Machine Learning pipelines with Microsoft Cognitive Toolkit (CNTK), LightGBM, and OpenCV. These tools enable powerful and highly scalable predictive and analytical models for a variety of data sources.
SynapseML also brings new networking capabilities to the Spark ecosystem. With the HTTP on Spark project, users can embed any web service into their SparkML models. In this vein, SynapseML provides easy-to-use SparkML transformers for a wide variety of Microsoft Cognitive Services. For production-grade deployment, the Spark Serving project enables high-throughput, sub-millisecond-latency web services backed by your Spark cluster.
SynapseML requires Scala 2.12, Spark 3.0+, and Python 3.6+.