mmlspark.lime package¶
Submodules¶
mmlspark.lime.ImageLIME module¶
- class mmlspark.lime.ImageLIME.ImageLIME(java_obj=None, cellSize=16.0, inputCol=None, model=None, modifier=130.0, nSamples=900, outputCol=None, predictionCol='prediction', regularization=0.0, samplingFraction=0.3, superpixelCol='superpixels')[source]¶
Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer
- Parameters
cellSize (float) – Number that controls the size of the superpixels
inputCol (object) – The name of the input column
model (object) – Model to try to locally approximate
modifier (float) – Controls the trade-off between spatial and color distance
nSamples (int) – The number of samples to generate
outputCol (object) – The name of the output column
predictionCol (object) – prediction column name
regularization (float) – regularization param for the lasso
samplingFraction (float) – The fraction of superpixels to keep on
superpixelCol (object) – The column holding the superpixel decompositions
- cellSize = Param(parent='undefined', name='cellSize', doc='Number that controls the size of the superpixels')¶
- getCellSize()[source]¶
- Returns
Number that controls the size of the superpixels
- Return type
cellSize
- getModifier()[source]¶
- Returns
Controls the trade-off between spatial and color distance
- Return type
modifier
- getSamplingFraction()[source]¶
- Returns
The fraction of superpixels to keep on
- Return type
samplingFraction
- getSuperpixelCol()[source]¶
- Returns
The column holding the superpixel decompositions
- Return type
superpixelCol
- inputCol = Param(parent='undefined', name='inputCol', doc='The name of the input column')¶
- model = Param(parent='undefined', name='model', doc='Model to try to locally approximate')¶
- modifier = Param(parent='undefined', name='modifier', doc='Controls the trade-off between spatial and color distance')¶
- nSamples = Param(parent='undefined', name='nSamples', doc='The number of samples to generate')¶
- outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')¶
- predictionCol = Param(parent='undefined', name='predictionCol', doc='prediction column name')¶
- regularization = Param(parent='undefined', name='regularization', doc='regularization param for the lasso')¶
- samplingFraction = Param(parent='undefined', name='samplingFraction', doc='The fraction of superpixels to keep on')¶
- setParams(cellSize=16.0, inputCol=None, model=None, modifier=130.0, nSamples=900, outputCol=None, predictionCol='prediction', regularization=0.0, samplingFraction=0.3, superpixelCol='superpixels')[source]¶
Set the (keyword only) parameters
- setSamplingFraction(value)[source]¶
- Parameters
samplingFraction – The fraction of superpixels to keep on
- setSuperpixelCol(value)[source]¶
- Parameters
superpixelCol – The column holding the superpixel decompositions
- superpixelCol = Param(parent='undefined', name='superpixelCol', doc='The column holding the superpixel decompositions')¶
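The sampling parameters above (nSamples, samplingFraction) can be illustrated with a short sketch. The following stdlib-only Python is a conceptual illustration of LIME-style image sampling, not the library's implementation, and the function name is hypothetical: each of nSamples perturbations keeps roughly a samplingFraction of the superpixels switched on, and those on/off states become the features of the local linear model fit against the model's predictions.

```python
import random

def sample_superpixel_masks(n_superpixels, n_samples=900,
                            sampling_fraction=0.3, seed=0):
    """Generate binary masks over superpixels (conceptual LIME sketch).

    Each mask keeps roughly `sampling_fraction` of the superpixels "on";
    the masked image is scored by the model, and the on/off pattern is
    the feature vector of the local linear (lasso) model.
    """
    rng = random.Random(seed)
    return [
        [1 if rng.random() < sampling_fraction else 0
         for _ in range(n_superpixels)]
        for _ in range(n_samples)
    ]

# e.g. 5 perturbations of an image segmented into 20 superpixels
masks = sample_superpixel_masks(20, n_samples=5, sampling_fraction=0.3)
```

With the defaults above (nSamples=900, samplingFraction=0.3), ImageLIME would score 900 such masked images and fit a lasso with the given regularization over the resulting binary features.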
mmlspark.lime.SuperpixelTransformer module¶
- class mmlspark.lime.SuperpixelTransformer.SuperpixelTransformer(java_obj=None, cellSize=16.0, inputCol=None, modifier=130.0, outputCol='SuperpixelTransformer_9cc5024ff02d_output')[source]¶
Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer
- Parameters
cellSize (float) – Number that controls the size of the superpixels
inputCol (object) – The name of the input column
modifier (float) – Controls the trade-off between spatial and color distance
outputCol (object) – The name of the output column
- cellSize = Param(parent='undefined', name='cellSize', doc='Number that controls the size of the superpixels')¶
- getCellSize()[source]¶
- Returns
Number that controls the size of the superpixels
- Return type
cellSize
- getModifier()[source]¶
- Returns
Controls the trade-off between spatial and color distance
- Return type
modifier
- inputCol = Param(parent='undefined', name='inputCol', doc='The name of the input column')¶
- modifier = Param(parent='undefined', name='modifier', doc='Controls the trade-off between spatial and color distance')¶
- outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')¶
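The interplay of cellSize and modifier can be sketched assuming a SLIC-style superpixel clustering; the library's exact algorithm may differ, and this stdlib-only function is illustrative. Pixel-to-cluster distance mixes color distance with spatial distance, where cellSize normalizes the spatial term (superpixels are seeded on a cellSize-spaced grid) and a larger modifier weights spatial proximity more heavily relative to color similarity.

```python
import math

def slic_style_distance(color_a, color_b, pos_a, pos_b,
                        cell_size=16.0, modifier=130.0):
    """Combined pixel-to-cluster distance in a SLIC-style clustering (sketch).

    `cell_size` normalizes the spatial term so the trade-off is
    independent of image resolution; `modifier` scales how strongly
    spatial distance competes with color distance.
    """
    d_color = math.dist(color_a, color_b)   # distance in color space
    d_space = math.dist(pos_a, pos_b)       # distance in pixel space
    return math.hypot(d_color, (d_space / cell_size) * modifier)
```

For example, two identically colored pixels one cell apart contribute a distance equal to the modifier, so raising the modifier yields more compact, grid-like superpixels.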
mmlspark.lime.TabularLIME module¶
- class mmlspark.lime.TabularLIME.TabularLIME(java_obj=None, inputCol=None, model=None, nSamples=1000, outputCol=None, predictionCol='prediction', regularization=0.0, samplingFraction=0.3)[source]¶
Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaEstimator
- Parameters
inputCol (object) – The name of the input column
model (object) – Model to try to locally approximate
nSamples (int) – The number of samples to generate
outputCol (object) – The name of the output column
predictionCol (object) – prediction column name
regularization (float) – regularization param for the lasso
samplingFraction (float) – The fraction of superpixels to keep on
- getSamplingFraction()[source]¶
- Returns
The fraction of superpixels to keep on
- Return type
samplingFraction
- inputCol = Param(parent='undefined', name='inputCol', doc='The name of the input column')¶
- model = Param(parent='undefined', name='model', doc='Model to try to locally approximate')¶
- nSamples = Param(parent='undefined', name='nSamples', doc='The number of samples to generate')¶
- outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')¶
- predictionCol = Param(parent='undefined', name='predictionCol', doc='prediction column name')¶
- regularization = Param(parent='undefined', name='regularization', doc='regularization param for the lasso')¶
- samplingFraction = Param(parent='undefined', name='samplingFraction', doc='The fraction of superpixels to keep on')¶
mmlspark.lime.TabularLIMEModel module¶
- class mmlspark.lime.TabularLIMEModel.TabularLIMEModel(java_obj=None, columnSTDs=None, inputCol=None, model=None, nSamples=None, outputCol=None, predictionCol='prediction', regularization=None, samplingFraction=None)[source]¶
Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaModel
- Parameters
columnSTDs (list) – the standard deviations of each of the columns for perturbation
inputCol (object) – The name of the input column
model (object) – Model to try to locally approximate
nSamples (int) – The number of samples to generate
outputCol (object) – The name of the output column
predictionCol (object) – prediction column name
regularization (float) – regularization param for the lasso
samplingFraction (float) – The fraction of superpixels to keep on
- columnSTDs = Param(parent='undefined', name='columnSTDs', doc='the standard deviations of each of the columns for perturbation')¶
- getColumnSTDs()[source]¶
- Returns
the standard deviations of each of the columns for perturbation
- Return type
columnSTDs
- getSamplingFraction()[source]¶
- Returns
The fraction of superpixels to keep on
- Return type
samplingFraction
- inputCol = Param(parent='undefined', name='inputCol', doc='The name of the input column')¶
- model = Param(parent='undefined', name='model', doc='Model to try to locally approximate')¶
- nSamples = Param(parent='undefined', name='nSamples', doc='The number of samples to generate')¶
- outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')¶
- predictionCol = Param(parent='undefined', name='predictionCol', doc='prediction column name')¶
- regularization = Param(parent='undefined', name='regularization', doc='regularization param for the lasso')¶
- samplingFraction = Param(parent='undefined', name='samplingFraction', doc='The fraction of superpixels to keep on')¶
- setColumnSTDs(value)[source]¶
- Parameters
columnSTDs – the standard deviations of each of the columns for perturbation
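The role of columnSTDs can be understood from a small stdlib-only sketch; this is a conceptual illustration of tabular LIME perturbation, not the library's code, and the function name is hypothetical. Each column of the row being explained is jittered with Gaussian noise scaled by that column's standard deviation, so all features are perturbed on a comparable scale before the local lasso is fit on the model's outputs.

```python
import random

def perturb_row(row, column_stds, n_samples=1000, seed=0):
    """Per-column Gaussian perturbation of a tabular row (conceptual sketch).

    Each feature is jittered with noise proportional to its standard
    deviation (the `columnSTDs` parameter), producing `n_samples`
    neighbors of the row for the local linear model.
    """
    rng = random.Random(seed)
    return [
        [x + rng.gauss(0.0, s) for x, s in zip(row, column_stds)]
        for _ in range(n_samples)
    ]

# e.g. 3 perturbed copies of a two-feature row
samples = perturb_row([1.0, 10.0], [0.1, 2.0], n_samples=3)
```

A column with a standard deviation of zero is left unchanged, which is why columnSTDs are computed from the training data by the TabularLIME estimator.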
mmlspark.lime.TextLIME module¶
- class mmlspark.lime.TextLIME.TextLIME(java_obj=None, inputCol=None, model=None, nSamples=1000, outputCol=None, predictionCol='prediction', regularization=0.0, samplingFraction=0.3, tokenCol=None)[source]¶
Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaModel
- Parameters
inputCol (object) – The name of the input column
model (object) – Model to try to locally approximate
nSamples (int) – The number of samples to generate
outputCol (object) – The name of the output column
predictionCol (object) – prediction column name
regularization (float) – regularization param for the lasso
samplingFraction (float) – The fraction of superpixels to keep on
tokenCol (object) – The column holding the token
- getSamplingFraction()[source]¶
- Returns
The fraction of superpixels to keep on
- Return type
samplingFraction
- inputCol = Param(parent='undefined', name='inputCol', doc='The name of the input column')¶
- model = Param(parent='undefined', name='model', doc='Model to try to locally approximate')¶
- nSamples = Param(parent='undefined', name='nSamples', doc='The number of samples to generate')¶
- outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')¶
- predictionCol = Param(parent='undefined', name='predictionCol', doc='prediction column name')¶
- regularization = Param(parent='undefined', name='regularization', doc='regularization param for the lasso')¶
- samplingFraction = Param(parent='undefined', name='samplingFraction', doc='The fraction of superpixels to keep on')¶
- setParams(inputCol=None, model=None, nSamples=1000, outputCol=None, predictionCol='prediction', regularization=0.0, samplingFraction=0.3, tokenCol=None)[source]¶
Set the (keyword only) parameters
- setSamplingFraction(value)[source]¶
- Parameters
samplingFraction – The fraction of superpixels to keep on
- tokenCol = Param(parent='undefined', name='tokenCol', doc='The column holding the token')¶
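For text, the same sampling idea operates on the tokens in tokenCol. The following stdlib-only sketch is a conceptual illustration, not the library's implementation, and the function name is hypothetical: each perturbed sentence keeps a random subset of tokens, the model scores the perturbed text, and the kept/dropped pattern is the feature vector of the local linear model.

```python
import random

def sample_token_masks(tokens, n_samples=1000,
                       sampling_fraction=0.3, seed=0):
    """Token-level LIME sampling (conceptual sketch).

    Keeps roughly `sampling_fraction` of the tokens in each perturbed
    sentence; returns (kept_tokens, mask) pairs, where the mask is the
    binary feature vector for the local linear model.
    """
    rng = random.Random(seed)
    out = []
    for _ in range(n_samples):
        mask = [rng.random() < sampling_fraction for _ in tokens]
        kept = [t for t, keep in zip(tokens, mask) if keep]
        out.append((kept, mask))
    return out

perturbed = sample_token_masks(["the", "cat", "sat"], n_samples=4)
```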
Module contents¶
MMLSpark is an ecosystem of tools aimed at expanding the distributed computing framework Apache Spark in several new directions. MMLSpark adds many deep learning and data science tools to the Spark ecosystem, including seamless integration of Spark Machine Learning pipelines with the Microsoft Cognitive Toolkit (CNTK), LightGBM, and OpenCV. These tools enable powerful, highly scalable predictive and analytical models for a variety of data sources.
MMLSpark also brings new networking capabilities to the Spark ecosystem. With the HTTP on Spark project, users can embed any web service into their SparkML models. In this vein, MMLSpark provides easy-to-use SparkML transformers for a wide variety of Microsoft Cognitive Services. For production-grade deployment, the Spark Serving project enables high-throughput, sub-millisecond-latency web services backed by your Spark cluster.
MMLSpark requires Scala 2.11, Spark 2.4+, and Python 3.5+.