mmlspark.io.http package

Submodules

mmlspark.io.http.CustomInputParser module

class mmlspark.io.http.CustomInputParser.CustomInputParser(inputCol=None, outputCol=None, udfPython=None, udfScala=None)[source]

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • inputCol (str) – The name of the input column

  • outputCol (str) – The name of the output column

  • udfPython (object) – User-defined Python function applied to the DataFrame input column

  • udfScala (object) – User-defined Scala function applied to the DataFrame input column

getInputCol()[source]
Returns

The name of the input column

Return type

str

static getJavaPackage()[source]

Returns the package name string.

getOutputCol()[source]
Returns

The name of the output column

Return type

str

getUdfPython()[source]
Returns

User-defined Python function applied to the DataFrame input column

Return type

object

getUdfScala()[source]
Returns

User-defined Scala function applied to the DataFrame input column

Return type

object

classmethod read()[source]

Returns an MLReader instance for this class.

setInputCol(value)[source]
Parameters

inputCol (str) – The name of the input column

setOutputCol(value)[source]
Parameters

outputCol (str) – The name of the output column

setParams(inputCol=None, outputCol=None, udfPython=None, udfScala=None)[source]

Set the keyword-only parameters

Parameters
  • inputCol (str) – The name of the input column

  • outputCol (str) – The name of the output column

  • udfPython (object) – User-defined Python function applied to the DataFrame input column

  • udfScala (object) – User-defined Scala function applied to the DataFrame input column

setUdfPython(value)[source]
Parameters

udfPython (object) – User-defined Python function applied to the DataFrame input column

setUdfScala(value)[source]
Parameters

udfScala (object) – User-defined Scala function applied to the DataFrame input column
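As a sketch of how CustomInputParser might be used (the column names, and whether setUdfPython accepts a plain callable, are assumptions not confirmed by this reference), the UDF body is ordinary Python; the Spark wiring is shown in comments because it requires a live SparkSession with the mmlspark package on the classpath:

```python
def normalize(s):
    # Plain Python function used as the udfPython body:
    # trims and lower-cases the raw value of the input column.
    return s.strip().lower()

# Hypothetical wiring (requires a SparkSession + mmlspark on the classpath):
# from mmlspark.io.http import CustomInputParser
# parser = (CustomInputParser()
#           .setInputCol("text")        # assumed column name
#           .setOutputCol("request")
#           .setUdfPython(normalize))
# requests_df = parser.transform(df)    # df must contain a "text" column
```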

mmlspark.io.http.CustomOutputParser module

class mmlspark.io.http.CustomOutputParser.CustomOutputParser(inputCol=None, outputCol=None, udfPython=None, udfScala=None)[source]

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • inputCol (str) – The name of the input column

  • outputCol (str) – The name of the output column

  • udfPython (object) – User-defined Python function applied to the DataFrame input column

  • udfScala (object) – User-defined Scala function applied to the DataFrame input column

getInputCol()[source]
Returns

The name of the input column

Return type

str

static getJavaPackage()[source]

Returns the package name string.

getOutputCol()[source]
Returns

The name of the output column

Return type

str

getUdfPython()[source]
Returns

User-defined Python function applied to the DataFrame input column

Return type

object

getUdfScala()[source]
Returns

User-defined Scala function applied to the DataFrame input column

Return type

object

classmethod read()[source]

Returns an MLReader instance for this class.

setInputCol(value)[source]
Parameters

inputCol (str) – The name of the input column

setOutputCol(value)[source]
Parameters

outputCol (str) – The name of the output column

setParams(inputCol=None, outputCol=None, udfPython=None, udfScala=None)[source]

Set the keyword-only parameters

Parameters
  • inputCol (str) – The name of the input column

  • outputCol (str) – The name of the output column

  • udfPython (object) – User-defined Python function applied to the DataFrame input column

  • udfScala (object) – User-defined Scala function applied to the DataFrame input column

setUdfPython(value)[source]
Parameters

udfPython (object) – User-defined Python function applied to the DataFrame input column

setUdfScala(value)[source]
Parameters

udfScala (object) – User-defined Scala function applied to the DataFrame input column

mmlspark.io.http.HTTPTransformer module

class mmlspark.io.http.HTTPTransformer.HTTPTransformer(concurrency=1, concurrentTimeout=100.0, handler=None, inputCol=None, outputCol=None, timeout=60.0)[source]

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls (default: 1)

  • concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

  • handler (object) – Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

  • inputCol (str) – The name of the input column

  • outputCol (str) – The name of the output column

  • timeout (double) – number of seconds to wait before closing the connection (default: 60.0)

getConcurrency()[source]
Returns

max number of concurrent calls (default: 1)

Return type

int

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

Return type

double

getHandler()[source]
Returns

Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

Return type

object

getInputCol()[source]
Returns

The name of the input column

Return type

str

static getJavaPackage()[source]

Returns the package name string.

getOutputCol()[source]
Returns

The name of the output column

Return type

str

getTimeout()[source]
Returns

number of seconds to wait before closing the connection (default: 60.0)

Return type

double

classmethod read()[source]

Returns an MLReader instance for this class.

setConcurrency(value)[source]
Parameters

concurrency (int) – max number of concurrent calls (default: 1)

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

setHandler(value)[source]
Parameters

handler (object) – Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

setInputCol(value)[source]
Parameters

inputCol (str) – The name of the input column

setOutputCol(value)[source]
Parameters

outputCol (str) – The name of the output column

setParams(concurrency=1, concurrentTimeout=100.0, handler=None, inputCol=None, outputCol=None, timeout=60.0)[source]

Set the keyword-only parameters

Parameters
  • concurrency (int) – max number of concurrent calls (default: 1)

  • concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

  • handler (object) – Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

  • inputCol (str) – The name of the input column

  • outputCol (str) – The name of the output column

  • timeout (double) – number of seconds to wait before closing the connection (default: 60.0)

setTimeout(value)[source]
Parameters

timeout (double) – number of seconds to wait before closing the connection (default: 60.0)
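To make the concurrency and timeout parameters above concrete, here is a hedged sketch. The retry predicate is only a stand-in for the kind of policy a custom handler might implement (the default handler's behavior is not specified beyond its signature), and the mmlspark wiring is commented out because it requires a SparkSession:

```python
def should_retry(status_code):
    # Example retry policy a custom handler might apply:
    # retry on throttling (429) and transient server errors (5xx).
    return status_code == 429 or 500 <= status_code < 600

# Hypothetical wiring (requires a SparkSession + mmlspark):
# from mmlspark.io.http import HTTPTransformer
# ht = (HTTPTransformer()
#       .setConcurrency(5)            # up to 5 concurrent calls
#       .setConcurrentTimeout(30.0)   # wait at most 30 s on outstanding futures
#       .setInputCol("request")       # assumed column names
#       .setOutputCol("response")
#       .setTimeout(60.0))            # per-connection timeout in seconds
```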

mmlspark.io.http.JSONInputParser module

class mmlspark.io.http.JSONInputParser.JSONInputParser(headers={}, inputCol=None, method='POST', outputCol=None, url=None)[source]

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • headers (dict) – headers of the request (default: Map())

  • inputCol (str) – The name of the input column

  • method (str) – method to use for the request: PUT, POST, or PATCH (default: POST)

  • outputCol (str) – The name of the output column

  • url (str) – URL of the service

getHeaders()[source]
Returns

headers of the request (default: Map())

Return type

dict

getInputCol()[source]
Returns

The name of the input column

Return type

str

static getJavaPackage()[source]

Returns the package name string.

getMethod()[source]
Returns

method to use for the request: PUT, POST, or PATCH (default: POST)

Return type

str

getOutputCol()[source]
Returns

The name of the output column

Return type

str

getUrl()[source]
Returns

URL of the service

Return type

str

classmethod read()[source]

Returns an MLReader instance for this class.

setHeaders(value)[source]
Parameters

headers (dict) – headers of the request (default: Map())

setInputCol(value)[source]
Parameters

inputCol (str) – The name of the input column

setMethod(value)[source]
Parameters

method (str) – method to use for request, (PUT, POST, PATCH) (default: POST)

setOutputCol(value)[source]
Parameters

outputCol (str) – The name of the output column

setParams(headers={}, inputCol=None, method='POST', outputCol=None, url=None)[source]

Set the keyword-only parameters

Parameters
  • headers (dict) – headers of the request (default: Map())

  • inputCol (str) – The name of the input column

  • method (str) – method to use for the request: PUT, POST, or PATCH (default: POST)

  • outputCol (str) – The name of the output column

  • url (str) – URL of the service

setUrl(value)[source]
Parameters

url (str) – URL of the service
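A hedged sketch of what JSONInputParser assembles: each input row is serialized to a JSON body and combined with the configured method, URL, and headers. The endpoint URL and column names below are placeholders, and the Spark wiring is commented out because it requires a SparkSession:

```python
import json

# What the request body for one row would look like:
headers = {"Content-Type": "application/json"}
row = {"text": "hello world"}          # illustrative input row
body = json.dumps(row)

# Hypothetical wiring (requires a SparkSession + mmlspark):
# from mmlspark.io.http import JSONInputParser
# jip = (JSONInputParser()
#        .setUrl("http://localhost:8080/score")  # placeholder endpoint
#        .setMethod("POST")
#        .setHeaders(headers)
#        .setInputCol("text")
#        .setOutputCol("request"))
```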

mmlspark.io.http.JSONOutputParser module

class mmlspark.io.http.JSONOutputParser.JSONOutputParser(dataType=None, inputCol=None, outputCol=None, postProcessor=None)[source]

Bases: mmlspark.io.http._JSONOutputParser._JSONOutputParser

getDataType()[source]
Returns

format to parse the column to

Return type

object

setDataType(value)[source]
Parameters

dataType (object) – format to parse the column to

mmlspark.io.http.PartitionConsolidator module

class mmlspark.io.http.PartitionConsolidator.PartitionConsolidator(concurrency=1, concurrentTimeout=100.0, inputCol=None, outputCol=None, timeout=60.0)[source]

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls (default: 1)

  • concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

  • inputCol (str) – The name of the input column

  • outputCol (str) – The name of the output column

  • timeout (double) – number of seconds to wait before closing the connection (default: 60.0)

getConcurrency()[source]
Returns

max number of concurrent calls (default: 1)

Return type

int

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

Return type

double

getInputCol()[source]
Returns

The name of the input column

Return type

str

static getJavaPackage()[source]

Returns the package name string.

getOutputCol()[source]
Returns

The name of the output column

Return type

str

getTimeout()[source]
Returns

number of seconds to wait before closing the connection (default: 60.0)

Return type

double

classmethod read()[source]

Returns an MLReader instance for this class.

setConcurrency(value)[source]
Parameters

concurrency (int) – max number of concurrent calls (default: 1)

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

setInputCol(value)[source]
Parameters

inputCol (str) – The name of the input column

setOutputCol(value)[source]
Parameters

outputCol (str) – The name of the output column

setParams(concurrency=1, concurrentTimeout=100.0, inputCol=None, outputCol=None, timeout=60.0)[source]

Set the keyword-only parameters

Parameters
  • concurrency (int) – max number of concurrent calls (default: 1)

  • concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

  • inputCol (str) – The name of the input column

  • outputCol (str) – The name of the output column

  • timeout (double) – number of seconds to wait before closing the connection (default: 60.0)

setTimeout(value)[source]
Parameters

timeout (double) – number of seconds to wait before closing the connection (default: 60.0)

mmlspark.io.http.ServingFunctions module

mmlspark.io.http.ServingFunctions.request_to_string(c)[source]
mmlspark.io.http.ServingFunctions.string_to_response(c)[source]

mmlspark.io.http.SimpleHTTPTransformer module

class mmlspark.io.http.SimpleHTTPTransformer.SimpleHTTPTransformer(concurrency=1, concurrentTimeout=100.0, errorCol=None, flattenOutputBatches=None, handler=None, inputCol=None, inputParser=None, miniBatcher=None, outputCol=None, outputParser=None, timeout=60.0)[source]

Bases: mmlspark.io.http._SimpleHTTPTransformer._SimpleHTTPTransformer

setUrl(value)[source]
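SimpleHTTPTransformer bundles an input parser, the HTTP call, and an output parser into a single stage. The sketch below is hypothetical wiring (the endpoint, schema, setter names beyond setUrl, and column names are assumptions) and is commented out because it requires a SparkSession:

```python
host, port = "localhost", 8080        # placeholder service location
url = f"http://{host}:{port}/score"   # hypothetical scoring endpoint

# Hypothetical end-to-end wiring (requires a SparkSession + mmlspark):
# from pyspark.sql.types import StructType, StructField, DoubleType
# from mmlspark.io.http import SimpleHTTPTransformer, JSONOutputParser
# sht = (SimpleHTTPTransformer()
#        .setInputCol("text")          # assumed column names
#        .setOutputCol("prediction")
#        .setOutputParser(JSONOutputParser().setDataType(
#            StructType([StructField("score", DoubleType())])))
#        .setUrl(url))
```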

mmlspark.io.http.StringOutputParser module

class mmlspark.io.http.StringOutputParser.StringOutputParser(inputCol=None, outputCol=None)[source]

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • inputCol (str) – The name of the input column

  • outputCol (str) – The name of the output column

getInputCol()[source]
Returns

The name of the input column

Return type

str

static getJavaPackage()[source]

Returns the package name string.

getOutputCol()[source]
Returns

The name of the output column

Return type

str

classmethod read()[source]

Returns an MLReader instance for this class.

setInputCol(value)[source]
Parameters

inputCol (str) – The name of the input column

setOutputCol(value)[source]
Parameters

outputCol (str) – The name of the output column

setParams(inputCol=None, outputCol=None)[source]

Set the keyword-only parameters

Parameters
  • inputCol (str) – The name of the input column

  • outputCol (str) – The name of the output column

Module contents

MicrosoftML is a library of Python classes that interface with the Microsoft Scala APIs, using Apache Spark to create distributed machine learning models.

MicrosoftML simplifies training and scoring classifiers and regressors, and facilitates building models with the CNTK library, images, and text.