mmlspark.io.http package¶

Submodules¶

mmlspark.io.http.CustomInputParser module¶

class mmlspark.io.http.CustomInputParser.CustomInputParser(inputCol=None, outputCol=None, udfPython=None, udfScala=None)[source]¶

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters

inputCol (str) – The name of the input column
outputCol (str) – The name of the output column
udfPython (object) – User-defined Python function to apply to the DataFrame input column
udfScala (object) – User-defined function to apply to the DataFrame input column

getInputCol()[source]¶

Returns: The name of the input column
Return type: str

static getJavaPackage()[source]¶: Returns package name String.

getOutputCol()[source]¶

Returns: The name of the output column
Return type: str

getUdfPython()[source]¶

Returns: User-defined Python function to apply to the DataFrame input column
Return type: object

getUdfScala()[source]¶

Returns: User-defined function to apply to the DataFrame input column
Return type: object

classmethod read()[source]¶: Returns an MLReader instance for this class.

setInputCol(value)[source]¶

Parameters: inputCol (str) – The name of the input column

setOutputCol(value)[source]¶

Parameters: outputCol (str) – The name of the output column

setParams(inputCol=None, outputCol=None, udfPython=None, udfScala=None)[source]¶

Set the (keyword only) parameters

Parameters

inputCol (str) – The name of the input column
outputCol (str) – The name of the output column
udfPython (object) – User-defined Python function to apply to the DataFrame input column
udfScala (object) – User-defined function to apply to the DataFrame input column
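
To illustrate what "keyword only" means for setParams, here is a minimal pure-Python sketch. It is not the library's code: `ParserParams` is a hypothetical stand-in class, the column names are placeholders, and only the four parameter names are taken from the signature above.

```python
class ParserParams:
    """Hypothetical stand-in that mimics a keyword-only setParams."""

    def __init__(self):
        self._params = {}

    def setParams(self, *, inputCol=None, outputCol=None,
                  udfPython=None, udfScala=None):
        # The bare `*` rejects positional arguments, so every value must be named.
        supplied = {
            "inputCol": inputCol,
            "outputCol": outputCol,
            "udfPython": udfPython,
            "udfScala": udfScala,
        }
        # Keep only the parameters the caller actually provided.
        self._params.update({k: v for k, v in supplied.items() if v is not None})
        return self  # returning self allows chained calls


# Named arguments are accepted.
params = ParserParams().setParams(inputCol="text", outputCol="request")

# Positional arguments raise TypeError.
try:
    ParserParams().setParams("text", "request")
    positional_ok = True
except TypeError:
    positional_ok = False
```

With the real classes, the same rule would suggest calling `setParams(inputCol="text", outputCol="request")` rather than passing the values positionally.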

setUdfPython(value)[source]¶

Parameters: udfPython (object) – User-defined Python function to apply to the DataFrame input column

setUdfScala(value)[source]¶

Parameters: udfScala (object) – User-defined function to apply to the DataFrame input column

mmlspark.io.http.CustomOutputParser module¶

class mmlspark.io.http.CustomOutputParser.CustomOutputParser(inputCol=None, outputCol=None, udfPython=None, udfScala=None)[source]¶

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters

inputCol (str) – The name of the input column
outputCol (str) – The name of the output column
udfPython (object) – User-defined Python function to apply to the DataFrame input column
udfScala (object) – User-defined function to apply to the DataFrame input column

getInputCol()[source]¶

Returns: The name of the input column
Return type: str

static getJavaPackage()[source]¶: Returns package name String.

getOutputCol()[source]¶

Returns: The name of the output column
Return type: str

getUdfPython()[source]¶

Returns: User-defined Python function to apply to the DataFrame input column
Return type: object

getUdfScala()[source]¶

Returns: User-defined function to apply to the DataFrame input column
Return type: object

classmethod read()[source]¶: Returns an MLReader instance for this class.

setInputCol(value)[source]¶

Parameters: inputCol (str) – The name of the input column

setOutputCol(value)[source]¶

Parameters: outputCol (str) – The name of the output column

setParams(inputCol=None, outputCol=None, udfPython=None, udfScala=None)[source]¶

Set the (keyword only) parameters

Parameters

inputCol (str) – The name of the input column
outputCol (str) – The name of the output column
udfPython (object) – User-defined Python function to apply to the DataFrame input column
udfScala (object) – User-defined function to apply to the DataFrame input column

setUdfPython(value)[source]¶

Parameters: udfPython (object) – User-defined Python function to apply to the DataFrame input column

setUdfScala(value)[source]¶

Parameters: udfScala (object) – User-defined function to apply to the DataFrame input column

mmlspark.io.http.HTTPTransformer module¶

class mmlspark.io.http.HTTPTransformer.HTTPTransformer(concurrency=1, concurrentTimeout=100.0, handler=None, inputCol=None, outputCol=None, timeout=60.0)[source]¶

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters

concurrency (int) – max number of concurrent calls (default: 1)
concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)
handler (object) – Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))
inputCol (str) – The name of the input column
outputCol (str) – The name of the output column
timeout (double) – number of seconds to wait before closing the connection (default: 60.0)

getConcurrency()[source]¶

Returns: max number of concurrent calls (default: 1)
Return type: int

getConcurrentTimeout()[source]¶

Returns: max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)
Return type: double

getHandler()[source]¶

Returns: Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))
Return type: object

getInputCol()[source]¶

Returns: The name of the input column
Return type: str

static getJavaPackage()[source]¶: Returns package name String.

getOutputCol()[source]¶

Returns: The name of the output column
Return type: str

getTimeout()[source]¶

Returns: number of seconds to wait before closing the connection (default: 60.0)
Return type: double

classmethod read()[source]¶: Returns an MLReader instance for this class.

setConcurrency(value)[source]¶

Parameters: concurrency (int) – max number of concurrent calls (default: 1)

setConcurrentTimeout(value)[source]¶

Parameters: concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

setHandler(value)[source]¶

Parameters: handler (object) – Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

setInputCol(value)[source]¶

Parameters: inputCol (str) – The name of the input column

setOutputCol(value)[source]¶

Parameters: outputCol (str) – The name of the output column

setParams(concurrency=1, concurrentTimeout=100.0, handler=None, inputCol=None, outputCol=None, timeout=60.0)[source]¶

Set the (keyword only) parameters

Parameters

concurrency (int) – max number of concurrent calls (default: 1)
concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)
handler (object) – Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))
inputCol (str) – The name of the input column
outputCol (str) – The name of the output column
timeout (double) – number of seconds to wait before closing the connection (default: 60.0)

setTimeout(value)[source]¶

Parameters: timeout (double) – number of seconds to wait before closing the connection (default: 60.0)
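
As a rough illustration of how `concurrency` and `concurrentTimeout` interact, the sketch below caps the number of in-flight calls and bounds the wait on their futures using only the Python standard library. It is an analogy under stated assumptions, not the transformer's implementation: `run_with_limits` is a hypothetical helper, and the column names passed to `HTTPTransformer` are placeholders. The constructor call itself uses only the keyword arguments listed above and needs mmlspark plus an active Spark session, so it is wrapped in a function that is not invoked here.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def run_with_limits(handler, requests, concurrency=1, concurrent_timeout=100.0):
    """Apply handler to each request with at most `concurrency` calls in flight."""
    pool = ThreadPoolExecutor(max_workers=concurrency)
    try:
        futures = [pool.submit(handler, r) for r in requests]
        # Wait at most `concurrent_timeout` seconds for the futures to finish.
        done, _ = wait(futures, timeout=concurrent_timeout)
        # Results keep the input order; calls that did not finish become None.
        return [f.result() if f in done else None for f in futures]
    finally:
        # Do not block on calls that are still running.
        pool.shutdown(wait=False)


def make_http_transformer():
    # Imported lazily so this sketch loads without mmlspark installed.
    from mmlspark.io.http.HTTPTransformer import HTTPTransformer

    # "request" and "response" are hypothetical column names.
    return HTTPTransformer(concurrency=4, concurrentTimeout=30.0,
                           inputCol="request", outputCol="response",
                           timeout=60.0)


results = run_with_limits(lambda r: r.upper(), ["a", "b", "c"], concurrency=2)
```

Raising `concurrency` lets more requests overlap, while `concurrentTimeout` limits how long the caller waits for the whole batch of futures; `timeout` applies to each individual connection.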

mmlspark.io.http.JSONInputParser module¶

class mmlspark.io.http.JSONInputParser.JSONInputParser(headers={}, inputCol=None, method='POST', outputCol=None, url=None)[source]¶

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters

headers (dict) – headers of the request (default: Map())
inputCol (str) – The name of the input column
method (str) – HTTP method to use for the request (PUT, POST, or PATCH) (default: POST)
outputCol (str) – The name of the output column
url (str) – URL of the service

getHeaders()[source]¶

Returns: headers of the request (default: Map())
Return type: dict

getInputCol()[source]¶

Returns: The name of the input column
Return type: str

static getJavaPackage()[source]¶: Returns package name String.

getMethod()[source]¶

Returns: HTTP method to use for the request (PUT, POST, or PATCH) (default: POST)
Return type: str

getOutputCol()[source]¶

Returns: The name of the output column
Return type: str

getUrl()[source]¶

Returns: URL of the service
Return type: str

classmethod read()[source]¶: Returns an MLReader instance for this class.

setHeaders(value)[source]¶

Parameters: headers (dict) – headers of the request (default: Map())

setInputCol(value)[source]¶

Parameters: inputCol (str) – The name of the input column

setMethod(value)[source]¶

Parameters: method (str) – HTTP method to use for the request (PUT, POST, or PATCH) (default: POST)

setOutputCol(value)[source]¶

Parameters: outputCol (str) – The name of the output column

setParams(headers={}, inputCol=None, method='POST', outputCol=None, url=None)[source]¶

Set the (keyword only) parameters

Parameters

headers (dict) – headers of the request (default: Map())
inputCol (str) – The name of the input column
method (str) – HTTP method to use for the request (PUT, POST, or PATCH) (default: POST)
outputCol (str) – The name of the output column
url (str) – URL of the service

setUrl(value)[source]¶

Parameters: url (str) – URL of the service
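
The `headers`, `method`, and `url` parameters describe an HTTP request with a JSON body. The following standard-library sketch shows what such a request could look like for a single row. It is only an approximation for illustration: `build_json_request` is a hypothetical helper, the URL and API key are placeholders, and setting a `Content-Type` header is an assumption of this sketch rather than documented behavior of JSONInputParser.

```python
import json
from urllib.request import Request


def build_json_request(row, url, method="POST", headers=None):
    """Build (but do not send) a JSON request for one row of input data."""
    if method not in ("PUT", "POST", "PATCH"):
        raise ValueError("method must be PUT, POST, or PATCH")
    # Assumption: the body is JSON, so declare it; caller headers may override.
    all_headers = {"Content-Type": "application/json"}
    all_headers.update(headers or {})
    body = json.dumps(row).encode("utf-8")
    return Request(url, data=body, headers=all_headers, method=method)


# Placeholder endpoint and credentials, for illustration only.
req = build_json_request(
    {"text": "hello"},
    "http://localhost:8080/score",
    headers={"X-Api-Key": "placeholder"},
)
```

Constructing the `Request` object performs no network I/O, so the sketch can be inspected (`req.get_method()`, `req.data`, `req.full_url`) without a running service.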

mmlspark.io.http.JSONOutputParser module¶

class mmlspark.io.http.JSONOutputParser.JSONOutputParser(dataType=None, inputCol=None, outputCol=None, postProcessor=None)[source]¶

Bases: mmlspark.io.http._JSONOutputParser._JSONOutputParser

getDataType()[source]¶

Returns: format to parse the column to
Return type: object

setDataType(value)[source]¶

Parameters: dataType (object) – format to parse the column to
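
The `dataType` parameter is described only as the "format to parse the column to". As a hedged, pure-Python analogy of parsing a JSON string column into a fixed shape, the sketch below uses a plain dict of field names and cast functions in place of a Spark data type. `parse_json_column` and the field names are hypothetical, and the real parser's handling of missing or extra fields may differ.

```python
import json


def parse_json_column(values, schema):
    """Parse JSON strings, keeping only the schema's fields cast to its types."""
    parsed = []
    for raw in values:
        if raw is None:
            # Pass missing responses through unchanged.
            parsed.append(None)
            continue
        record = json.loads(raw)
        parsed.append({
            # Fields absent from the record (or null) become None.
            name: cast(record[name]) if record.get(name) is not None else None
            for name, cast in schema.items()
        })
    return parsed


rows = parse_json_column(
    ['{"label": "cat", "score": "0.9", "extra": 1}', None],
    {"label": str, "score": float},
)
```

In this sketch the first row becomes a dict with a string `label` and a float `score`, the unlisted `extra` field is dropped, and the missing response stays `None`.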

mmlspark.io.http.PartitionConsolidator module¶

class mmlspark.io.http.PartitionConsolidator.PartitionConsolidator(concurrency=1, concurrentTimeout=100.0, inputCol=None, outputCol=None, timeout=60.0)[source]¶

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters

concurrency (int) – max number of concurrent calls (default: 1)
concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)
inputCol (str) – The name of the input column
outputCol (str) – The name of the output column
timeout (double) – number of seconds to wait before closing the connection (default: 60.0)

getConcurrency()[source]¶

Returns: max number of concurrent calls (default: 1)
Return type: int

getConcurrentTimeout()[source]¶

Returns: max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)
Return type: double

getInputCol()[source]¶

Returns: The name of the input column
Return type: str

static getJavaPackage()[source]¶: Returns package name String.

getOutputCol()[source]¶

Returns: The name of the output column
Return type: str

getTimeout()[source]¶

Returns: number of seconds to wait before closing the connection (default: 60.0)
Return type: double

classmethod read()[source]¶: Returns an MLReader instance for this class.

setConcurrency(value)[source]¶

Parameters: concurrency (int) – max number of concurrent calls (default: 1)

setConcurrentTimeout(value)[source]¶

Parameters: concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

setInputCol(value)[source]¶

Parameters: inputCol (str) – The name of the input column

setOutputCol(value)[source]¶

Parameters: outputCol (str) – The name of the output column

setParams(concurrency=1, concurrentTimeout=100.0, inputCol=None, outputCol=None, timeout=60.0)[source]¶

Set the (keyword only) parameters

Parameters

concurrency (int) – max number of concurrent calls (default: 1)
concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)
inputCol (str) – The name of the input column
outputCol (str) – The name of the output column
timeout (double) – number of seconds to wait before closing the connection (default: 60.0)

setTimeout(value)[source]¶

Parameters: timeout (double) – number of seconds to wait before closing the connection (default: 60.0)

mmlspark.io.http.ServingFunctions module¶

mmlspark.io.http.ServingFunctions.request_to_string(c)[source]¶

mmlspark.io.http.ServingFunctions.string_to_response(c)[source]¶

mmlspark.io.http.SimpleHTTPTransformer module¶

class mmlspark.io.http.SimpleHTTPTransformer.SimpleHTTPTransformer(concurrency=1, concurrentTimeout=100.0, errorCol=None, flattenOutputBatches=None, handler=None, inputCol=None, inputParser=None, miniBatcher=None, outputCol=None, outputParser=None, timeout=60.0)[source]¶

Bases: mmlspark.io.http._SimpleHTTPTransformer._SimpleHTTPTransformer

setUrl(value)[source]¶
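
The constructor signature names an `inputParser`, a `handler`, an `outputParser`, and an `errorCol`, which suggests a parse, call, parse pipeline. The sketch below models that flow in plain Python with a fake handler. It is a conceptual illustration under assumptions, not the class's implementation: `simple_http_pipeline` and `fake_handler` are hypothetical, and routing failures into a separate `error` field is a guess at the role of `errorCol`.

```python
import json


def simple_http_pipeline(rows, input_parser, handler, output_parser):
    """Run each row through input parsing, the handler, and output parsing."""
    results = []
    for row in rows:
        try:
            request = input_parser(row)       # row -> request payload
            response = handler(request)       # request -> raw response
            results.append({"output": output_parser(response), "error": None})
        except Exception as exc:
            # Assumption: failures are recorded next to the row, not raised.
            results.append({"output": None, "error": str(exc)})
    return results


def fake_handler(request):
    # Stands in for the HTTP call: reports the length of the "text" field.
    payload = json.loads(request)
    return json.dumps({"length": len(payload["text"])})


out = simple_http_pipeline(
    [{"text": "hello"}, {"wrong_key": 1}],
    input_parser=json.dumps,
    handler=fake_handler,
    output_parser=json.loads,
)
```

Here the first row yields a parsed output and no error, while the second row, which lacks the expected field, yields no output and a recorded error message.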

mmlspark.io.http.StringOutputParser module¶

class mmlspark.io.http.StringOutputParser.StringOutputParser(inputCol=None, outputCol=None)[source]¶

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters

inputCol (str) – The name of the input column
outputCol (str) – The name of the output column

getInputCol()[source]¶

Returns: The name of the input column
Return type: str

static getJavaPackage()[source]¶: Returns package name String.

getOutputCol()[source]¶

Returns: The name of the output column
Return type: str

classmethod read()[source]¶: Returns an MLReader instance for this class.

setInputCol(value)[source]¶

Parameters: inputCol (str) – The name of the input column

setOutputCol(value)[source]¶

Parameters: outputCol (str) – The name of the output column

setParams(inputCol=None, outputCol=None)[source]¶

Set the (keyword only) parameters

Parameters

inputCol (str) – The name of the input column
outputCol (str) – The name of the output column

Module contents¶

MicrosoftML is a library of Python classes that interface with the Microsoft Scala APIs, which use Apache Spark to create distributed machine learning models.

MicrosoftML simplifies training and scoring classifiers and regressors, and facilitates the creation of models using the CNTK library, images, and text.