synapse.ml.cognitive.search package

Submodules

synapse.ml.cognitive.search.AddDocuments module

class synapse.ml.cognitive.search.AddDocuments.AddDocuments(java_obj=None, AADToken=None, AADTokenCol=None, actionCol='@search.action', batchSize=100, concurrency=1, concurrentTimeout=None, errorCol='AddDocuments_f1c73ff1cc0d_error', handler=None, indexName=None, outputCol='AddDocuments_f1c73ff1cc0d_output', serviceName=None, subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None)

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters

AADToken (object) – AAD Token used for authentication
AADTokenCol (str) – Name of the column containing the AAD Token used for authentication
actionCol (str) – Name of the column that holds the action to perform on each document (default '@search.action'). Actions can be combined in the same batch, for example an upload and a delete:
  upload: Similar to an ‘upsert’: the document is inserted if it is new and updated/replaced if it exists. All fields are replaced in the update case.
  merge: Updates an existing document with the specified fields. If the document doesn’t exist, the merge fails. Any field you specify in a merge replaces the existing field in the document, including fields of type Collection(Edm.String). For example, if the document contains a field ‘tags’ with value [‘budget’] and you merge the value [‘economy’, ‘pool’] into ‘tags’, the final value of ‘tags’ is [‘economy’, ‘pool’], not [‘budget’, ‘economy’, ‘pool’].
  mergeOrUpload: Behaves like merge if a document with the given key already exists in the index; otherwise behaves like upload with a new document.
  delete: Removes the specified document from the index. Any field you specify in a delete operation, other than the key field, is ignored. To remove an individual field from a document, use merge instead and set the field explicitly to null.
batchSize (int) – Maximum size of the buffer (the number of documents batched into one request)
concurrency (int) – Maximum number of concurrent calls
concurrentTimeout (float) – Maximum number of seconds to wait on futures if concurrency >= 1
errorCol (str) – Column to hold HTTP errors
handler (object) – Which strategy to use when handling requests
indexName (str) – Name of the search index to write documents to
outputCol (str) – The name of the output column
serviceName (str) – Name of the Azure Cognitive Search service
subscriptionKey (object) – The API key to use
subscriptionKeyCol (str) – Name of the column containing the API key to use
timeout (float) – Number of seconds to wait before closing the connection
url (str) – URL of the service
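The four actionCol values are easiest to compare side by side. The sketch below is a hypothetical in-memory model, not the service's implementation: it treats the index as a Python dict and assumes the key field is named `id`. It reproduces the documented ‘tags’ example, the failing merge on a missing document, and the key-only delete.

```python
# Hypothetical in-memory model of the four index actions described above.
# It mirrors the documented semantics only; it is NOT how Azure Cognitive
# Search stores documents.

KEY = "id"  # assumption: the index key field is called "id"


def apply_action(index, doc):
    """Apply one document (a dict carrying '@search.action') to `index`.

    `index` maps key -> document dict. A merge on a missing key raises
    KeyError, mirroring the documented failure.
    """
    action = doc["@search.action"]
    fields = {k: v for k, v in doc.items() if k != "@search.action"}
    key = fields[KEY]

    if action == "upload":
        # Upsert: insert if new, otherwise replace the whole document.
        index[key] = fields
    elif action == "merge":
        if key not in index:
            raise KeyError(f"merge failed: document {key!r} does not exist")
        # Specified fields replace existing ones; collections are not appended.
        index[key].update(fields)
    elif action == "mergeOrUpload":
        if key in index:
            index[key].update(fields)
        else:
            index[key] = fields
    elif action == "delete":
        # Only the key matters; other fields are ignored.
        # Assumption: deleting a missing key is a no-op in this model.
        index.pop(key, None)
    else:
        raise ValueError(f"unknown action: {action}")
    return index


index = {}
apply_action(index, {"@search.action": "upload", "id": "1", "tags": ["budget"]})
apply_action(index, {"@search.action": "merge", "id": "1", "tags": ["economy", "pool"]})
print(index["1"]["tags"])  # ['economy', 'pool']
```

Setting a field to `None` in a merge document plays the role of the explicit null mentioned for removing a single field.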

AADToken = Param(parent='undefined', name='AADToken', doc='ServiceParam: AAD Token used for authentication')

actionCol = Param(parent='undefined', name='actionCol', doc=" You can combine actions, such as an upload and a delete, in the same batch. upload: An upload action is similar to an 'upsert' where the document will be inserted if it is new and updated/replaced if it exists. Note that all fields are replaced in the update case. merge: Merge updates an existing document with the specified fields. If the document doesn't exist, the merge will fail. Any field you specify in a merge will replace the existing field in the document. This includes fields of type Collection(Edm.String). For example, if the document contains a field 'tags' with value ['budget'] and you execute a merge with value ['economy', 'pool'] for 'tags', the final value of the 'tags' field will be ['economy', 'pool']. It will not be ['budget', 'economy', 'pool']. mergeOrUpload: This action behaves like merge if a document with the given key already exists in the index. If the document does not exist, it behaves like upload with a new document. delete: Delete removes the specified document from the index. Note that any field you specify in a delete operation, other than the key field, will be ignored. If you want to remove an individual field from a document, use merge instead and simply set the field explicitly to null. ")

batchSize = Param(parent='undefined', name='batchSize', doc='The max size of the buffer')
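batchSize bounds how many rows travel together in one request. As a sketch, assuming each request body follows the Azure Cognitive Search "index documents" REST shape `{"value": [...]}` with the action stored under `@search.action`, the hypothetical helper `build_index_requests` below chunks rows and maps a custom action column onto that key. Whether AddDocuments renames the column exactly this way internally is an assumption.

```python
import json


def build_index_requests(rows, batch_size=100, action_col="@search.action"):
    """Group rows into JSON request bodies of at most `batch_size` documents.

    Hypothetical helper: each body has the shape {"value": [doc, ...]}, and the
    value of `action_col` is moved to the "@search.action" key of each doc.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    bodies = []
    for start in range(0, len(rows), batch_size):
        docs = []
        for row in rows[start:start + batch_size]:
            doc = {k: v for k, v in row.items() if k != action_col}
            doc["@search.action"] = row[action_col]
            docs.append(doc)
        bodies.append(json.dumps({"value": docs}))
    return bodies


rows = [{"id": str(i), "searchAction": "upload"} for i in range(250)]
bodies = build_index_requests(rows, batch_size=100, action_col="searchAction")
print(len(bodies))  # 3
```

With the default batchSize of 100, 250 rows become three requests of 100, 100, and 50 documents.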

concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')

concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')

errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')

getAADToken()

Returns: AAD Token used for authentication
Return type: object

getActionCol()

Returns: Name of the column that holds the action (upload, merge, mergeOrUpload, or delete) for each document
Return type: str

getBatchSize()

Returns: Maximum size of the buffer
Return type: int

getConcurrency()

Returns: Maximum number of concurrent calls
Return type: int

getConcurrentTimeout()

Returns: Maximum number of seconds to wait on futures if concurrency >= 1
Return type: float

getErrorCol()

Returns: Column to hold HTTP errors
Return type: str

getHandler()

Returns: Which strategy to use when handling requests
Return type: object

getIndexName()

Returns: Name of the search index
Return type: str

static getJavaPackage(): Returns the package name string.

getOutputCol()

Returns: The name of the output column
Return type: str

getServiceName()

Returns: Name of the Azure Cognitive Search service
Return type: str

getSubscriptionKey()

Returns: The API key to use
Return type: object

getTimeout()

Returns: Number of seconds to wait before closing the connection
Return type: float

getUrl()

Returns: URL of the service
Return type: str

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')

indexName = Param(parent='undefined', name='indexName', doc='')

outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
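If the output column carries the service's JSON response (an assumption; check the actual schema of outputCol in your SynapseML version), per-document failures can be picked out of it. The Azure Cognitive Search REST API reports one entry per document with `key`, `status`, `errorMessage`, and `statusCode`; the hypothetical `failed_keys` helper keeps the keys whose `status` is false. The sample body is made up for illustration.

```python
import json


def failed_keys(response_body):
    """Return the keys of documents the service did not index successfully.

    Hypothetical helper; expects the REST shape {"value": [{"key": ..., "status": ...}]}.
    """
    payload = json.loads(response_body)
    return [item["key"] for item in payload.get("value", []) if not item.get("status")]


# Made-up sample response with one success and one failure.
sample = json.dumps({"value": [
    {"key": "1", "status": True, "errorMessage": None, "statusCode": 201},
    {"key": "2", "status": False, "errorMessage": "Document not found.", "statusCode": 404},
]})
print(failed_keys(sample))  # ['2']
```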

classmethod read(): Returns an MLReader instance for this class.

serviceName = Param(parent='undefined', name='serviceName', doc='')

setAADToken(value)

Parameters: AADToken – AAD Token used for authentication

setAADTokenCol(value)

Parameters: AADTokenCol – Name of the column containing the AAD Token used for authentication

setActionCol(value)

Parameters: actionCol – Name of the column that holds the action (upload, merge, mergeOrUpload, or delete) for each document

setBatchSize(value)

Parameters: batchSize – Maximum size of the buffer

setConcurrency(value)

Parameters: concurrency – Maximum number of concurrent calls

setConcurrentTimeout(value)

Parameters: concurrentTimeout – Maximum number of seconds to wait on futures if concurrency >= 1

setCustomServiceName(value)

setDefaultInternalEndpoint(value)

setEndpoint(value)

setErrorCol(value)

Parameters: errorCol – Column to hold HTTP errors

setHandler(value)

Parameters: handler – Which strategy to use when handling requests

setIndexName(value)

Parameters: indexName – Name of the search index to write documents to

setOutputCol(value)

Parameters: outputCol – The name of the output column

setParams(AADToken=None, AADTokenCol=None, actionCol='@search.action', batchSize=100, concurrency=1, concurrentTimeout=None, errorCol='AddDocuments_f1c73ff1cc0d_error', handler=None, indexName=None, outputCol='AddDocuments_f1c73ff1cc0d_output', serviceName=None, subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url=None): Sets the keyword-only parameters.

setServiceName(value)

Parameters: serviceName – Name of the Azure Cognitive Search service
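serviceName and indexName together identify the endpoint that receives the documents. The sketch below assumes the standard Azure Cognitive Search "index documents" REST endpoint; whether AddDocuments derives its url exactly this way, and which api-version it uses, are assumptions, so the hypothetical `index_docs_url` helper takes the version as an argument.

```python
def index_docs_url(service_name: str, index_name: str, api_version: str) -> str:
    """Build the "index documents" endpoint of an Azure Cognitive Search index.

    Hypothetical helper; the caller supplies the api-version string.
    """
    if not service_name or not index_name:
        raise ValueError("service_name and index_name are required")
    return (
        f"https://{service_name}.search.windows.net"
        f"/indexes/{index_name}/docs/index?api-version={api_version}"
    )


print(index_docs_url("mysearch", "hotels", "2020-06-30"))
# https://mysearch.search.windows.net/indexes/hotels/docs/index?api-version=2020-06-30
```

Passing a fully built URL with setUrl is the alternative when the service is not addressed by name.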

setSubscriptionKey(value)

Parameters: subscriptionKey – The API key to use

setSubscriptionKeyCol(value)

Parameters: subscriptionKeyCol – Name of the column containing the API key to use
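The paired setters (setSubscriptionKey/setSubscriptionKeyCol, setAADToken/setAADTokenCol) suggest that these service parameters hold either a constant or the name of a column carrying a per-row value. The sketch below models that idea with a hypothetical `ServiceParam` dataclass; the rule that setting one form clears the other is an assumption of this model, not confirmed behavior of the library.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ServiceParam:
    """Hypothetical model: holds either a constant value or a column name."""
    value: Any = None
    col: Optional[str] = None

    def set_value(self, v):
        # Like setSubscriptionKey: one key for every row.
        self.value, self.col = v, None
        return self

    def set_col(self, name):
        # Like setSubscriptionKeyCol: read the key from a column of each row.
        self.value, self.col = None, name
        return self

    def resolve(self, row):
        """Return the value to use for one row (a dict of column -> value)."""
        return row[self.col] if self.col is not None else self.value


key = ServiceParam().set_value("SHARED-KEY")
print(key.resolve({"key": "row-key"}))  # SHARED-KEY
key.set_col("key")
print(key.resolve({"key": "row-key"}))  # row-key
```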

setTimeout(value)

Parameters: timeout – Number of seconds to wait before closing the connection

setUrl(value)

Parameters: url – URL of the service

subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')

timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')

url = Param(parent='undefined', name='url', doc='Url of the service')

synapse.ml.cognitive.search.AzureSearchWriter module

synapse.ml.cognitive.search.AzureSearchWriter.streamToAzureSearch(df, **options)

synapse.ml.cognitive.search.AzureSearchWriter.writeToAzureSearch(df, **options)

Module contents

SynapseML is an ecosystem of tools aimed at expanding the distributed computing framework Apache Spark in several new directions. SynapseML adds many deep learning and data science tools to the Spark ecosystem, including seamless integration of Spark Machine Learning pipelines with Microsoft Cognitive Toolkit (CNTK), LightGBM and OpenCV. These tools enable powerful, highly scalable predictive and analytical models for a variety of data sources.

SynapseML also brings new networking capabilities to the Spark ecosystem. With the HTTP on Spark project, users can embed any web service into their SparkML models. In this vein, SynapseML provides easy-to-use SparkML transformers for a wide variety of Microsoft Cognitive Services. For production-grade deployment, the Spark Serving project enables high-throughput, sub-millisecond-latency web services backed by your Spark cluster.

SynapseML requires Scala 2.12, Spark 3.0+, and Python 3.6+.