synapse.ml.io.http package

Submodules

synapse.ml.io.http.CustomInputParser module

class synapse.ml.io.http.CustomInputParser.CustomInputParser(java_obj=None, inputCol=None, outputCol=None, udfPython=None, udfScala=None)[source]

Bases: pyspark.ml.util.MLReadable[pyspark.ml.util.RL]

Parameters
  • inputCol (str) – The name of the input column

  • outputCol (str) – The name of the output column

  • udfPython (object) – User Defined Python Function to be applied to the DF input col

  • udfScala (object) – User Defined Function to be applied to the DF input col

getInputCol()[source]
Returns

The name of the input column

Return type

inputCol

static getJavaPackage()[source]

Returns package name String.

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getUdfPython()[source]
Returns

User Defined Python Function to be applied to the DF input col

Return type

udfPython

getUdfScala()[source]
Returns

User Defined Function to be applied to the DF input col

Return type

udfScala

inputCol = Param(parent='undefined', name='inputCol', doc='The name of the input column')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setInputCol(value)[source]
Parameters

inputCol – The name of the input column

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(inputCol=None, outputCol=None, udfPython=None, udfScala=None)[source]

Set the (keyword only) parameters

setUdfPython(value)[source]
Parameters

udfPython – User Defined Python Function to be applied to the DF input col

setUdfScala(value)[source]
Parameters

udfScala – User Defined Function to be applied to the DF input col

udfPython = Param(parent='undefined', name='udfPython', doc='User Defined Python Function to be applied to the DF input col')
udfScala = Param(parent='undefined', name='udfScala', doc='User Defined Function to be applied to the DF input col')
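
Example (a sketch, not the library's canonical usage): the udfPython parameter can be illustrated by wrapping a plain Python function and attaching it to a CustomInputParser. The helper names build_query_url and make_input_parser, the example column names, and the base URL passed by the caller are hypothetical. It is also an assumption that setUdfPython accepts the UDF produced by http_udf from the HTTPFunctions module; check the installed version before relying on this.

```python
from urllib.parse import urlencode


def build_query_url(base_url, term):
    """Pure-Python helper: append a URL-encoded search term to base_url."""
    return "{}?{}".format(base_url, urlencode({"q": term}))


def make_input_parser(base_url, input_col="term", output_col="request"):
    """Attach a Python UDF to CustomInputParser (needs Spark, SynapseML, requests)."""
    # Imported lazily so build_query_url can be tested without a Spark install.
    from requests import Request
    from synapse.ml.io.http.CustomInputParser import CustomInputParser
    from synapse.ml.io.http.HTTPFunctions import http_udf

    def to_request(term):
        # One HTTP GET request per input value.
        return Request("GET", build_query_url(base_url, term))

    parser = CustomInputParser(inputCol=input_col, outputCol=output_col)
    # Assumption: udfPython expects a UDF that yields an HTTP request,
    # which is what http_udf is used for here.
    parser.setUdfPython(http_udf(to_request))
    return parser
```

Keeping the URL construction in a separate function means the request logic can be unit-tested locally, while make_input_parser is only called where a Spark session is available.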

synapse.ml.io.http.CustomOutputParser module

class synapse.ml.io.http.CustomOutputParser.CustomOutputParser(java_obj=None, inputCol=None, outputCol=None, udfPython=None, udfScala=None)[source]

Bases: pyspark.ml.util.MLReadable[pyspark.ml.util.RL]

Parameters
  • inputCol (str) – The name of the input column

  • outputCol (str) – The name of the output column

  • udfPython (object) – User Defined Python Function to be applied to the DF input col

  • udfScala (object) – User Defined Function to be applied to the DF input col

getInputCol()[source]
Returns

The name of the input column

Return type

inputCol

static getJavaPackage()[source]

Returns package name String.

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getUdfPython()[source]
Returns

User Defined Python Function to be applied to the DF input col

Return type

udfPython

getUdfScala()[source]
Returns

User Defined Function to be applied to the DF input col

Return type

udfScala

inputCol = Param(parent='undefined', name='inputCol', doc='The name of the input column')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setInputCol(value)[source]
Parameters

inputCol – The name of the input column

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(inputCol=None, outputCol=None, udfPython=None, udfScala=None)[source]

Set the (keyword only) parameters

setUdfPython(value)[source]
Parameters

udfPython – User Defined Python Function to be applied to the DF input col

setUdfScala(value)[source]
Parameters

udfScala – User Defined Function to be applied to the DF input col

udfPython = Param(parent='undefined', name='udfPython', doc='User Defined Python Function to be applied to the DF input col')
udfScala = Param(parent='undefined', name='udfScala', doc='User Defined Function to be applied to the DF input col')
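
Example (a sketch under stated assumptions): a Python function for udfPython that pulls one field out of a JSON response body. The names extract_message and make_output_parser and the column names are hypothetical. The layout of the response column (a struct with an entity.content field holding the raw body bytes) is an assumption and is not stated in this reference.

```python
import json


def extract_message(body_bytes, key="message"):
    """Pure-Python helper: decode a UTF-8 JSON body and return one string field."""
    if body_bytes is None:
        return None
    try:
        payload = json.loads(bytes(body_bytes).decode("utf-8"))
    except ValueError:  # covers both bad UTF-8 and malformed JSON
        return None
    value = payload.get(key) if isinstance(payload, dict) else None
    return value if isinstance(value, str) else None


def make_output_parser(input_col="response", output_col="message"):
    """Attach the helper to CustomOutputParser (needs Spark and SynapseML)."""
    # Imported lazily so extract_message can be tested without a Spark install.
    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType
    from synapse.ml.io.http.CustomOutputParser import CustomOutputParser

    def from_response(response):
        # Assumption: the response struct exposes the body at entity.content.
        entity = response["entity"] if response is not None else None
        return extract_message(entity["content"] if entity is not None else None)

    parser = CustomOutputParser(inputCol=input_col, outputCol=output_col)
    parser.setUdfPython(udf(from_response, StringType()))
    return parser
```

Returning None for missing or malformed bodies keeps a single bad response from failing the whole Spark task.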

synapse.ml.io.http.HTTPFunctions module

synapse.ml.io.http.HTTPFunctions.http_udf(func)[source]
synapse.ml.io.http.HTTPFunctions.requests_to_spark(p)[source]
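
Example (a sketch, not a definitive usage): http_udf can be read as a wrapper that turns a Python function into a Spark UDF producing an HTTP request column. The sketch assumes the wrapped function should return a requests.Request object; that expectation, the helper names score_payload and add_request_column, the payload fields, and the column names are all assumptions for illustration.

```python
import json


def score_payload(text, language="en"):
    """Pure-Python helper: serialize one row's fields as a JSON request body."""
    return json.dumps({"language": language, "text": text}, sort_keys=True)


def add_request_column(df, url, text_col="text", request_col="request"):
    """Add an HTTP request column to df (needs Spark, SynapseML, requests)."""
    # Imported lazily so score_payload can be tested without a Spark install.
    from pyspark.sql.functions import col
    from requests import Request
    from synapse.ml.io.http.HTTPFunctions import http_udf

    def to_request(text):
        # Assumption: http_udf expects the function to return a requests.Request.
        return Request(
            "POST",
            url,
            data=score_payload(text),
            headers={"Content-Type": "application/json"},
        )

    return df.withColumn(request_col, http_udf(to_request)(col(text_col)))
```

The resulting column could then serve as the inputCol of an HTTPTransformer.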

synapse.ml.io.http.HTTPTransformer module

class synapse.ml.io.http.HTTPTransformer.HTTPTransformer(java_obj=None, concurrency=1, concurrentTimeout=None, handler=None, inputCol=None, outputCol=None, timeout=60.0)[source]

Bases: pyspark.ml.util.MLReadable[pyspark.ml.util.RL]

Parameters
  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • handler (object) – Which strategy to use when handling requests

  • inputCol (str) – The name of the input column

  • outputCol (str) – The name of the output column

  • timeout (float) – number of seconds to wait before closing the connection

concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
getConcurrency()[source]
Returns

max number of concurrent calls

Return type

concurrency

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1

Return type

concurrentTimeout

getHandler()[source]
Returns

Which strategy to use when handling requests

Return type

handler

getInputCol()[source]
Returns

The name of the input column

Return type

inputCol

static getJavaPackage()[source]

Returns package name String.

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getTimeout()[source]
Returns

number of seconds to wait before closing the connection

Return type

timeout

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
inputCol = Param(parent='undefined', name='inputCol', doc='The name of the input column')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setConcurrency(value)[source]
Parameters

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setHandler(value)[source]
Parameters

handler – Which strategy to use when handling requests

setInputCol(value)[source]
Parameters

inputCol – The name of the input column

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(concurrency=1, concurrentTimeout=None, handler=None, inputCol=None, outputCol=None, timeout=60.0)[source]

Set the (keyword only) parameters

setTimeout(value)[source]
Parameters

timeout – number of seconds to wait before closing the connection

timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
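
Example (a minimal sketch): the constructor keywords listed above can be collected and sanity-checked before building an HTTPTransformer. The helper names transformer_settings and make_http_transformer, the default values chosen here, and the validation rules are hypothetical; the library may apply its own checks.

```python
def transformer_settings(concurrency=4, timeout=30.0, concurrent_timeout=None,
                         input_col="request", output_col="response"):
    """Validate and collect keyword arguments for HTTPTransformer."""
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    if timeout <= 0:
        raise ValueError("timeout must be a positive number of seconds")
    settings = {
        "concurrency": int(concurrency),
        "timeout": float(timeout),
        "inputCol": input_col,
        "outputCol": output_col,
    }
    # Only pass concurrentTimeout when the caller set it explicitly.
    if concurrent_timeout is not None:
        settings["concurrentTimeout"] = float(concurrent_timeout)
    return settings


def make_http_transformer(**kwargs):
    """Build the transformer (needs Spark and SynapseML)."""
    # Imported lazily so transformer_settings can be tested without Spark.
    from synapse.ml.io.http.HTTPTransformer import HTTPTransformer

    return HTTPTransformer(**transformer_settings(**kwargs))
```

A call such as make_http_transformer(concurrency=8, timeout=10) would then yield a transformer that reads the request column and writes the response column.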

synapse.ml.io.http.JSONInputParser module

class synapse.ml.io.http.JSONInputParser.JSONInputParser(java_obj=None, headers={}, inputCol=None, method='POST', outputCol=None, url=None)[source]

Bases: pyspark.ml.util.MLReadable[pyspark.ml.util.RL]

Parameters
  • headers (dict) – headers of the request

  • inputCol (str) – The name of the input column

  • method (str) – method to use for request, (PUT, POST, PATCH)

  • outputCol (str) – The name of the output column

  • url (str) – URL of the service

getHeaders()[source]
Returns

headers of the request

Return type

headers

getInputCol()[source]
Returns

The name of the input column

Return type

inputCol

static getJavaPackage()[source]

Returns package name String.

getMethod()[source]
Returns

method to use for request, (PUT, POST, PATCH)

Return type

method

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getUrl()[source]
Returns

URL of the service

Return type

url

headers = Param(parent='undefined', name='headers', doc='headers of the request')
inputCol = Param(parent='undefined', name='inputCol', doc='The name of the input column')
method = Param(parent='undefined', name='method', doc='method to use for request, (PUT, POST, PATCH)')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setHeaders(value)[source]
Parameters

headers – headers of the request

setInputCol(value)[source]
Parameters

inputCol – The name of the input column

setMethod(value)[source]
Parameters

method – method to use for request, (PUT, POST, PATCH)

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(headers={}, inputCol=None, method='POST', outputCol=None, url=None)[source]

Set the (keyword only) parameters

setUrl(value)[source]
Parameters

url – URL of the service

url = Param(parent='undefined', name='url', doc='Url of the service')
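
Example (a sketch under stated assumptions): building the keyword arguments for a JSONInputParser, restricted to the three methods named in the parameter description. The helper names json_parser_settings and make_json_input_parser, the X-Api-Key header, and the column names are hypothetical.

```python
ALLOWED_METHODS = ("PUT", "POST", "PATCH")  # methods named in the parameter description


def json_parser_settings(url, api_key=None, method="POST",
                         input_col="data", output_col="request"):
    """Validate and collect keyword arguments for JSONInputParser."""
    method = method.upper()
    if method not in ALLOWED_METHODS:
        raise ValueError("method must be one of {}".format(", ".join(ALLOWED_METHODS)))
    headers = {}
    if api_key is not None:
        # Hypothetical header name; use whatever the target service expects.
        headers["X-Api-Key"] = api_key
    return {
        "url": url,
        "method": method,
        "headers": headers,
        "inputCol": input_col,
        "outputCol": output_col,
    }


def make_json_input_parser(url, **kwargs):
    """Build the parser (needs Spark and SynapseML)."""
    # Imported lazily so json_parser_settings can be tested without Spark.
    from synapse.ml.io.http.JSONInputParser import JSONInputParser

    return JSONInputParser(**json_parser_settings(url, **kwargs))
```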

synapse.ml.io.http.JSONOutputParser module

class synapse.ml.io.http.JSONOutputParser.JSONOutputParser(java_obj=None, dataType=None, inputCol=None, outputCol=None, postProcessor=None)[source]

Bases: pyspark.ml.util.MLReadable[pyspark.ml.util.RL]

getDataType()[source]
Returns

format to parse the column to

Return type

dataType

setDataType(value)[source]
Parameters

dataType – format to parse the column to
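
Example (a sketch, assuming setDataType accepts a PySpark data type such as a StructType): describing the expected JSON shape as a small list of field specifications and converting it to a schema. The helper names check_fields and make_json_output_parser, the two supported type names, and the column names are hypothetical.

```python
SUPPORTED_TYPES = ("string", "double")  # kept deliberately small for the sketch


def check_fields(fields):
    """Return the field names after validating each (name, type_name) pair."""
    names = []
    for name, type_name in fields:
        if type_name not in SUPPORTED_TYPES:
            raise ValueError("unsupported type for {}: {}".format(name, type_name))
        names.append(name)
    if len(set(names)) != len(names):
        raise ValueError("duplicate field names")
    return names


def make_json_output_parser(fields, input_col="response", output_col="parsed"):
    """Build a JSONOutputParser for the given fields (needs Spark and SynapseML)."""
    # Imported lazily so check_fields can be tested without a Spark install.
    from pyspark.sql.types import DoubleType, StringType, StructField, StructType
    from synapse.ml.io.http.JSONOutputParser import JSONOutputParser

    check_fields(fields)
    type_map = {"string": StringType(), "double": DoubleType()}
    schema = StructType([StructField(n, type_map[t], True) for n, t in fields])

    parser = JSONOutputParser(inputCol=input_col, outputCol=output_col)
    # Assumption: dataType is given as a PySpark DataType instance.
    parser.setDataType(schema)
    return parser
```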

synapse.ml.io.http.ServingFunctions module

synapse.ml.io.http.ServingFunctions.request_to_string(c)[source]
synapse.ml.io.http.ServingFunctions.string_to_response(c)[source]

synapse.ml.io.http.SimpleHTTPTransformer module

class synapse.ml.io.http.SimpleHTTPTransformer.SimpleHTTPTransformer(java_obj=None, concurrency=1, concurrentTimeout=None, errorCol='SimpleHTTPTransformer_7da31ce14d3e_errors', flattenOutputBatches=None, handler=None, inputCol=None, inputParser=None, miniBatcher=None, outputCol=None, outputParser=None, timeout=60.0)[source]

Bases: pyspark.ml.util.MLReadable[pyspark.ml.util.RL]

setUrl(value)[source]
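
Example (a minimal sketch): pointing a SimpleHTTPTransformer at a service endpoint with setUrl. The helper names endpoint_url and make_simple_client, the column names, and the concurrency value are hypothetical. The sketch sets only the URL and columns; an output parser would likely also need to be supplied through the outputParser argument before the transformer is usable.

```python
def endpoint_url(base, path):
    """Pure-Python helper: join a base URL and a path with exactly one slash."""
    return base.rstrip("/") + "/" + path.lstrip("/")


def make_simple_client(base, path, input_col="data", output_col="results"):
    """Build a SimpleHTTPTransformer for one endpoint (needs Spark and SynapseML)."""
    # Imported lazily so endpoint_url can be tested without a Spark install.
    from synapse.ml.io.http.SimpleHTTPTransformer import SimpleHTTPTransformer

    client = SimpleHTTPTransformer(
        inputCol=input_col,
        outputCol=output_col,
        concurrency=2,  # illustrative value, not a recommendation
    )
    client.setUrl(endpoint_url(base, path))
    return client
```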

synapse.ml.io.http.StringOutputParser module

class synapse.ml.io.http.StringOutputParser.StringOutputParser(java_obj=None, inputCol=None, outputCol=None)[source]

Bases: pyspark.ml.util.MLReadable[pyspark.ml.util.RL]

Parameters
  • inputCol (str) – The name of the input column

  • outputCol (str) – The name of the output column

getInputCol()[source]
Returns

The name of the input column

Return type

inputCol

static getJavaPackage()[source]

Returns package name String.

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

inputCol = Param(parent='undefined', name='inputCol', doc='The name of the input column')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setInputCol(value)[source]
Parameters

inputCol – The name of the input column

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(inputCol=None, outputCol=None)[source]

Set the (keyword only) parameters

Module contents

SynapseML is an ecosystem of tools aimed at expanding the distributed computing framework Apache Spark in several new directions. SynapseML adds many deep learning and data science tools to the Spark ecosystem, including seamless integration of Spark Machine Learning pipelines with Microsoft Cognitive Toolkit (CNTK), LightGBM and OpenCV. These tools enable powerful and highly scalable predictive and analytical models for a variety of data sources.

SynapseML also brings new networking capabilities to the Spark ecosystem. With the HTTP on Spark project, users can embed any web service into their SparkML models. In this vein, SynapseML provides easy-to-use SparkML transformers for a wide variety of Microsoft Cognitive Services. For production-grade deployment, the Spark Serving project enables high-throughput, sub-millisecond latency web services, backed by your Spark cluster.

SynapseML requires Scala 2.12, Spark 3.0+, and Python 3.6+.