synapse.ml.io.http package

Submodules

synapse.ml.io.http.CustomInputParser module

class synapse.ml.io.http.CustomInputParser.CustomInputParser(java_obj=None, inputCol=None, outputCol=None, udfPython=None, udfScala=None)

Bases: pyspark.ml.util.MLReadable[pyspark.ml.util.RL]

Parameters

inputCol (str) – The name of the input column
outputCol (str) – The name of the output column
udfPython (object) – User-defined Python function to apply to the DataFrame input column
udfScala (object) – User-defined Scala function to apply to the DataFrame input column

getInputCol()

Returns: The name of the input column
Return type: str

static getJavaPackage(): Returns the package name string.

getOutputCol()

Returns: The name of the output column
Return type: str

getUdfPython()

Returns: User-defined Python function to apply to the DataFrame input column
Return type: object

getUdfScala()

Returns: User-defined Scala function to apply to the DataFrame input column
Return type: object

inputCol = Param(parent='undefined', name='inputCol', doc='The name of the input column')

outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')

classmethod read(): Returns an MLReader instance for this class.

setInputCol(value)

Parameters: value – The name of the input column

setOutputCol(value)

Parameters: value – The name of the output column

setParams(inputCol=None, outputCol=None, udfPython=None, udfScala=None): Sets the (keyword-only) parameters.

setUdfPython(value)

Parameters: value – User-defined Python function to apply to the DataFrame input column

setUdfScala(value)

Parameters: value – User-defined Scala function to apply to the DataFrame input column

udfPython = Param(parent='undefined', name='udfPython', doc='User Defined Python Function to be applied to the DF input col')

udfScala = Param(parent='undefined', name='udfScala', doc='User Defined Function to be applied to the DF input col')
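As its name suggests, CustomInputParser turns each value of inputCol into an HTTP request using the function supplied in udfPython or udfScala. The dependency-free sketch below shows the kind of per-value function one might wrap as udfPython. It is a minimal illustration under stated assumptions: the request is modeled as a plain dict whose field names (requestLine, headers, entity) only loosely follow SynapseML's request struct, SERVICE_URL is a hypothetical endpoint, and in a real pipeline the function would be wrapped as a PySpark UDF rather than called directly.

```python
import json
from typing import Any, Dict

# Hypothetical endpoint, used only for illustration.
SERVICE_URL = "https://example.com/score"


def to_request(value: Any) -> Dict[str, Any]:
    """Turn one input-column value into a request-like dict.

    The layout (request line, header list, entity body) is an assumption
    that loosely mirrors an HTTP request struct, not SynapseML's exact schema.
    """
    # Serialize the input value as a JSON body.
    body = json.dumps({"input": value}).encode("utf-8")
    return {
        "requestLine": {"method": "POST", "uri": SERVICE_URL},
        "headers": [{"name": "Content-Type", "value": "application/json"}],
        "entity": {"content": body},
    }


# Apply the parser to each value of a toy "input column".
column = ["hello", "world"]
requests = [to_request(v) for v in column]
print(requests[0]["entity"]["content"])  # b'{"input": "hello"}'
```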

synapse.ml.io.http.CustomOutputParser module

class synapse.ml.io.http.CustomOutputParser.CustomOutputParser(java_obj=None, inputCol=None, outputCol=None, udfPython=None, udfScala=None)

Bases: pyspark.ml.util.MLReadable[pyspark.ml.util.RL]

Parameters

inputCol (str) – The name of the input column
outputCol (str) – The name of the output column
udfPython (object) – User-defined Python function to apply to the DataFrame input column
udfScala (object) – User-defined Scala function to apply to the DataFrame input column

getInputCol()

Returns: The name of the input column
Return type: str

static getJavaPackage(): Returns the package name string.

getOutputCol()

Returns: The name of the output column
Return type: str

getUdfPython()

Returns: User-defined Python function to apply to the DataFrame input column
Return type: object

getUdfScala()

Returns: User-defined Scala function to apply to the DataFrame input column
Return type: object

inputCol = Param(parent='undefined', name='inputCol', doc='The name of the input column')

outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')

classmethod read(): Returns an MLReader instance for this class.

setInputCol(value)

Parameters: value – The name of the input column

setOutputCol(value)

Parameters: value – The name of the output column

setParams(inputCol=None, outputCol=None, udfPython=None, udfScala=None): Sets the (keyword-only) parameters.

setUdfPython(value)

Parameters: value – User-defined Python function to apply to the DataFrame input column

setUdfScala(value)

Parameters: value – User-defined Scala function to apply to the DataFrame input column

udfPython = Param(parent='undefined', name='udfPython', doc='User Defined Python Function to be applied to the DF input col')

udfScala = Param(parent='undefined', name='udfScala', doc='User Defined Function to be applied to the DF input col')
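CustomOutputParser is the mirror image: its UDF maps each HTTP response in inputCol to a value in outputCol. The sketch below is a hypothetical response-parsing function of that kind. The response is modeled as a dict with statusLine and entity fields (an assumption, not the exact SynapseML schema), and treating non-2xx or empty responses as None is a design choice of this example.

```python
import json
from typing import Any, Dict, Optional


def parse_response(response: Optional[Dict[str, Any]]) -> Optional[Any]:
    """Extract a Python value from a response-like dict.

    Returns None when there is no response, the status is not 2xx,
    or the body is empty. The dict layout is an assumption.
    """
    if response is None:
        return None
    status = response.get("statusLine", {}).get("statusCode", 0)
    if not 200 <= status < 300:
        return None
    entity = response.get("entity")
    if not entity or not entity.get("content"):
        return None
    # Decode the body bytes and parse them as JSON.
    return json.loads(entity["content"].decode("utf-8"))


ok = {"statusLine": {"statusCode": 200}, "entity": {"content": b'{"score": 0.9}'}}
print(parse_response(ok))  # {'score': 0.9}
```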

synapse.ml.io.http.HTTPFunctions module

synapse.ml.io.http.HTTPFunctions.http_udf(func)

synapse.ml.io.http.HTTPFunctions.requests_to_spark(p)
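The listing gives only signatures for these two functions. Judging by their names, requests_to_spark(p) converts a request object p into a Spark-compatible record, and http_udf(func) wraps a request-building function as a UDF. The sketch below illustrates only the conversion step, under the assumption that p exposes a method, URL, headers, and body. It uses the standard library's urllib.request.Request as a stand-in for whatever request type the real function accepts, and the helper name request_to_record and its output field names are hypothetical.

```python
import urllib.request
from typing import Any, Dict


def request_to_record(req: urllib.request.Request) -> Dict[str, Any]:
    """Convert a request object into a nested dict suitable for a struct column.

    Hypothetical helper: the field names below are illustrative only.
    """
    return {
        "requestLine": {"method": req.get_method(), "uri": req.full_url},
        # header_items() yields (name, value) pairs; urllib normalizes
        # names with str.capitalize(), e.g. "Content-Type" -> "Content-type".
        "headers": [{"name": k, "value": v} for k, v in req.header_items()],
        # No body means no entity.
        "entity": None if req.data is None else {"content": req.data},
    }


# Constructing a Request does not send anything over the network.
req = urllib.request.Request(
    "https://example.com/api",
    data=b'{"x": 1}',
    headers={"Content-Type": "application/json"},
    method="POST",
)
record = request_to_record(req)
```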

synapse.ml.io.http.HTTPTransformer module

class synapse.ml.io.http.HTTPTransformer.HTTPTransformer(java_obj=None, concurrency=1, concurrentTimeout=None, handler=None, inputCol=None, outputCol=None, timeout=60.0)

Bases: pyspark.ml.util.MLReadable[pyspark.ml.util.RL]

Parameters

concurrency (int) – Maximum number of concurrent calls
concurrentTimeout (float) – Maximum number of seconds to wait on futures if concurrency >= 1
handler (object) – Strategy to use when handling requests
inputCol (str) – The name of the input column
outputCol (str) – The name of the output column
timeout (float) – Number of seconds to wait before closing the connection

concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')

concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')

getConcurrency()

Returns: Maximum number of concurrent calls
Return type: int

getConcurrentTimeout()

Returns: Maximum number of seconds to wait on futures if concurrency >= 1
Return type: float

getHandler()

Returns: Strategy to use when handling requests
Return type: object

getInputCol()

Returns: The name of the input column
Return type: str

static getJavaPackage(): Returns the package name string.

getOutputCol()

Returns: The name of the output column
Return type: str

getTimeout()

Returns: Number of seconds to wait before closing the connection
Return type: float

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')

inputCol = Param(parent='undefined', name='inputCol', doc='The name of the input column')

outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')

classmethod read(): Returns an MLReader instance for this class.

setConcurrency(value)

Parameters: value – Maximum number of concurrent calls

setConcurrentTimeout(value)

Parameters: value – Maximum number of seconds to wait on futures if concurrency >= 1

setHandler(value)

Parameters: value – Strategy to use when handling requests

setInputCol(value)

Parameters: value – The name of the input column

setOutputCol(value)

Parameters: value – The name of the output column

setParams(concurrency=1, concurrentTimeout=None, handler=None, inputCol=None, outputCol=None, timeout=60.0): Sets the (keyword-only) parameters.

setTimeout(value)

Parameters: value – Number of seconds to wait before closing the connection

timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
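The concurrency, concurrentTimeout, and timeout parameters describe a bounded-parallelism pattern. The sketch below illustrates that pattern with a thread pool, mapping concurrency to the pool size and concurrentTimeout to the wait on each future. This mapping is an assumption made for illustration: the actual transformer executes on the JVM, and its handler parameter selects a request-handling strategy that is not modeled here. The names send_all and fake_handler are hypothetical, and fake_handler stands in for a real HTTP call.

```python
import concurrent.futures
from typing import Callable, List, Optional, Sequence, TypeVar

Req = TypeVar("Req")
Resp = TypeVar("Resp")


def send_all(
    requests: Sequence[Req],
    handler: Callable[[Req], Resp],
    concurrency: int = 1,
    concurrent_timeout: Optional[float] = None,
) -> List[Resp]:
    """Send requests through `handler` with at most `concurrency` in flight.

    Results keep the input order. `concurrent_timeout` bounds how long we
    wait for each individual future; if it is exceeded,
    concurrent.futures.TimeoutError propagates to the caller.
    """
    with concurrent.futures.ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(handler, r) for r in requests]
        return [f.result(timeout=concurrent_timeout) for f in futures]


# Stand-in handler; a real one would perform an HTTP call with its own
# per-connection timeout (compare the `timeout` parameter).
def fake_handler(x: int) -> int:
    return x * 2


print(send_all([1, 2, 3], fake_handler, concurrency=2))  # [2, 4, 6]
```

Collecting results in submission order, rather than with as_completed, keeps each output aligned with its input row.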

synapse.ml.io.http.JSONInputParser module

class synapse.ml.io.http.JSONInputParser.JSONInputParser(java_obj=None, headers={}, inputCol=None, method='POST', outputCol=None, url=None)

Bases: pyspark.ml.util.MLReadable[pyspark.ml.util.RL]

Parameters

headers (dict) – Headers of the request
inputCol (str) – The name of the input column
method (str) – HTTP method to use for the request (PUT, POST, or PATCH)
outputCol (str) – The name of the output column
url (str) – URL of the service

getHeaders()

Returns: Headers of the request
Return type: dict

getInputCol()

Returns: The name of the input column
Return type: str

static getJavaPackage(): Returns the package name string.

getMethod()

Returns: HTTP method to use for the request (PUT, POST, or PATCH)
Return type: str

getOutputCol()

Returns: The name of the output column
Return type: str

getUrl()

Returns: URL of the service
Return type: str

headers = Param(parent='undefined', name='headers', doc='headers of the request')

inputCol = Param(parent='undefined', name='inputCol', doc='The name of the input column')

method = Param(parent='undefined', name='method', doc='method to use for request, (PUT, POST, PATCH)')

outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')

classmethod read(): Returns an MLReader instance for this class.

setHeaders(value)

Parameters: value – Headers of the request

setInputCol(value)

Parameters: value – The name of the input column

setMethod(value)

Parameters: value – HTTP method to use for the request (PUT, POST, or PATCH)

setOutputCol(value)

Parameters: value – The name of the output column

setParams(headers={}, inputCol=None, method='POST', outputCol=None, url=None): Sets the (keyword-only) parameters.

setUrl(value)

Parameters: value – URL of the service

url = Param(parent='undefined', name='url', doc='Url of the service')
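JSONInputParser combines url, method, and headers with a JSON-serialized input value to form a request. The sketch below builds such a request with the standard library's urllib.request.Request, without sending it. The helper name build_json_request and the default Content-Type: application/json header are assumptions of this example; the method check reflects the PUT, POST, and PATCH options listed above.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

ALLOWED_METHODS = {"PUT", "POST", "PATCH"}


def build_json_request(
    value: Any,
    url: str,
    method: str = "POST",
    headers: Optional[Dict[str, str]] = None,
) -> urllib.request.Request:
    """Serialize `value` as a JSON body and wrap it in a request object.

    Mirrors the url/method/headers parameters documented above; the
    Content-Type default is an assumption. Nothing is sent.
    """
    if method not in ALLOWED_METHODS:
        raise ValueError(f"unsupported method {method!r}; expected one of {sorted(ALLOWED_METHODS)}")
    # Caller-supplied headers override the default Content-Type.
    all_headers = {"Content-Type": "application/json", **(headers or {})}
    body = json.dumps(value).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=all_headers, method=method)


req = build_json_request({"text": "hi"}, "https://example.com/api")
```

Note that urllib stores header names via str.capitalize(), so the header above is retrieved with req.get_header("Content-type").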

synapse.ml.io.http.JSONOutputParser module

class synapse.ml.io.http.JSONOutputParser.JSONOutputParser(java_obj=None, dataType=None, inputCol=None, outputCol=None, postProcessor=None)

Bases: pyspark.ml.util.MLReadable[pyspark.ml.util.RL]

getDataType()

Returns: dataType – Format to parse the column to

setDataType(value)

Parameters: value – Format to parse the column to
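Only the dataType accessors are listed for JSONOutputParser, with dataType described as the format to parse the column to. Presumably the parser decodes a JSON response body into that type. The sketch below illustrates the idea in plain Python, using a dict of field names to Python types as a rough, hypothetical stand-in for a Spark dataType; unknown fields are dropped and missing ones become None. The helper name parse_json_body is illustrative only.

```python
import json
from typing import Any, Dict, Optional


def parse_json_body(content: Optional[bytes], schema: Dict[str, type]) -> Optional[Dict[str, Any]]:
    """Parse a JSON response body and keep only the fields in `schema`.

    Assumes the body is a JSON object. Missing fields become None and
    present ones are converted with the given Python type, a rough
    stand-in for parsing into a Spark dataType.
    """
    if not content:
        return None
    raw = json.loads(content.decode("utf-8"))
    return {
        name: (None if raw.get(name) is None else cast(raw[name]))
        for name, cast in schema.items()
    }


schema = {"label": str, "score": float}
print(parse_json_body(b'{"label": "positive", "score": 1, "extra": true}', schema))
# {'label': 'positive', 'score': 1.0}
```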

synapse.ml.io.http.ServingFunctions module

synapse.ml.io.http.ServingFunctions.request_to_string(c)

synapse.ml.io.http.ServingFunctions.string_to_response(c)
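No descriptions are given for request_to_string and string_to_response. Judging by their names, they convert a serving request into a string and a string into a reply. The sketch below models that round trip with plain dicts; the helper names request_body_to_string and string_to_reply and the field layout are hypothetical, and the real functions presumably operate on Spark columns (c) rather than on Python dicts.

```python
from typing import Any, Dict


def request_body_to_string(request: Dict[str, Any], encoding: str = "utf-8") -> str:
    """Return the request body as text (empty string if there is no body)."""
    entity = request.get("entity") or {}
    return (entity.get("content") or b"").decode(encoding)


def string_to_reply(text: str, status_code: int = 200) -> Dict[str, Any]:
    """Wrap a string in a minimal response-like dict."""
    return {
        "statusLine": {"statusCode": status_code},
        "entity": {"content": text.encode("utf-8")},
    }


# An "echo" service: read the request body and send it straight back.
reply = string_to_reply(request_body_to_string({"entity": {"content": b"ping"}}))
```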

synapse.ml.io.http.SimpleHTTPTransformer module

class synapse.ml.io.http.SimpleHTTPTransformer.SimpleHTTPTransformer(java_obj=None, concurrency=1, concurrentTimeout=None, errorCol='SimpleHTTPTransformer_7da31ce14d3e_errors', flattenOutputBatches=None, handler=None, inputCol=None, inputParser=None, miniBatcher=None, outputCol=None, outputParser=None, timeout=60.0)

Bases: pyspark.ml.util.MLReadable[pyspark.ml.util.RL]

setUrl(value)
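Judging by its parameters, SimpleHTTPTransformer chains an input parser, an optional mini-batcher, an HTTP handler, and an output parser, and records failures in errorCol. The sketch below models that flow in plain Python. It is an assumption about how the pieces fit together, not the library's implementation: the send callable stands in for the handler, batch_size stands in for miniBatcher together with flattenOutputBatches, and the output and error keys stand in for outputCol and errorCol. The function name simple_http_transform is hypothetical.

```python
from typing import Any, Callable, Dict, List, Optional, Sequence


def simple_http_transform(
    rows: Sequence[Any],
    input_parser: Callable[[Any], Any],
    send: Callable[[Any], Any],
    output_parser: Callable[[Any], Any],
    batch_size: Optional[int] = None,
) -> List[Dict[str, Any]]:
    """Parse -> send -> parse, recording failures instead of raising.

    With batch_size set, rows are grouped into lists first (a stand-in for
    a mini-batcher) and each batch's outputs are flattened back to one
    record per row; output_parser must then return one value per row.
    """
    if batch_size:
        units = [list(rows[i:i + batch_size]) for i in range(0, len(rows), batch_size)]
    else:
        units = list(rows)
    out: List[Dict[str, Any]] = []
    for unit in units:
        try:
            result, error = output_parser(send(input_parser(unit))), None
        except Exception as exc:  # keep the row and record the error
            result, error = None, repr(exc)
        if batch_size:
            # Flatten: one record per original row in the batch.
            results = result if result is not None else [None] * len(unit)
            out.extend({"output": r, "error": error} for r in results)
        else:
            out.append({"output": result, "error": error})
    return out


# Unbatched: the second row fails (division by zero) but is kept.
rows_out = simple_http_transform([1, 0], lambda x: x, lambda r: 10 // r, lambda r: r)
```

Recording errors per row, instead of failing the whole job, matches the role the errorCol parameter suggests.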

synapse.ml.io.http.StringOutputParser module

class synapse.ml.io.http.StringOutputParser.StringOutputParser(java_obj=None, inputCol=None, outputCol=None)

Bases: pyspark.ml.util.MLReadable[pyspark.ml.util.RL]

Parameters

inputCol (str) – The name of the input column
outputCol (str) – The name of the output column

getInputCol()

Returns: The name of the input column
Return type: str

static getJavaPackage(): Returns the package name string.

getOutputCol()

Returns: The name of the output column
Return type: str

inputCol = Param(parent='undefined', name='inputCol', doc='The name of the input column')

outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')

classmethod read(): Returns an MLReader instance for this class.

setInputCol(value)

Parameters: value – The name of the input column

setOutputCol(value)

Parameters: value – The name of the output column

setParams(inputCol=None, outputCol=None): Sets the (keyword-only) parameters.
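StringOutputParser presumably extracts the body of each HTTP response in inputCol as a string. The sketch below decodes a response-like dict. Honoring a charset in the Content-Type header, and falling back to UTF-8 otherwise, is an assumption of this example rather than documented behavior; the dict layout and helper names are hypothetical.

```python
from typing import Any, Dict, List, Optional


def charset_from_headers(headers: List[Dict[str, str]], default: str = "utf-8") -> str:
    """Pick the charset from a Content-Type header, e.g. 'text/plain; charset=latin-1'."""
    for h in headers:
        if h["name"].lower() == "content-type":
            # Parameters follow the media type, separated by ';'.
            for part in h["value"].split(";")[1:]:
                key, _, val = part.strip().partition("=")
                if key.lower() == "charset" and val:
                    return val.strip('"')
    return default


def response_to_string(response: Optional[Dict[str, Any]]) -> Optional[str]:
    """Decode a response-like dict's body to text; None if there is no body."""
    if not response or not response.get("entity"):
        return None
    content = response["entity"].get("content")
    if content is None:
        return None
    return content.decode(charset_from_headers(response.get("headers", [])))
```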

Module contents

SynapseML is an ecosystem of tools aimed at expanding the distributed computing framework Apache Spark in several new directions. SynapseML adds many deep learning and data science tools to the Spark ecosystem, including seamless integration of Spark Machine Learning pipelines with Microsoft Cognitive Toolkit (CNTK), LightGBM, and OpenCV. These tools enable powerful and highly scalable predictive and analytical models for a variety of data sources.

SynapseML also brings new networking capabilities to the Spark ecosystem. With the HTTP on Spark project, users can embed any web service into their SparkML models. In this vein, SynapseML provides easy-to-use SparkML transformers for a wide variety of Microsoft Cognitive Services. For production-grade deployment, the Spark Serving project enables high-throughput, sub-millisecond-latency web services backed by your Spark cluster.

SynapseML requires Scala 2.12, Spark 3.0+, and Python 3.6+.