mmlspark.lime package¶
Submodules¶
mmlspark.lime.ImageLIME module¶
- class mmlspark.lime.ImageLIME.ImageLIME(java_obj=None, cellSize=16.0, inputCol=None, model=None, modifier=130.0, nSamples=900, outputCol=None, predictionCol='prediction', regularization=0.0, samplingFraction=0.3, superpixelCol='superpixels')[source]¶
Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer
- Parameters
cellSize (float) – Number that controls the size of the superpixels
inputCol (object) – The name of the input column
model (object) – Model to try to locally approximate
modifier (float) – Controls the trade-off between spatial and color distance
nSamples (int) – The number of samples to generate
outputCol (object) – The name of the output column
predictionCol (object) – prediction column name
regularization (float) – regularization param for the lasso
samplingFraction (float) – The fraction of superpixels to keep on
superpixelCol (object) – The column holding the superpixel decompositions
- cellSize = Param(parent='undefined', name='cellSize', doc='Number that controls the size of the superpixels')¶
- getCellSize()[source]¶
- Returns
Number that controls the size of the superpixels
- Return type
cellSize
- getModifier()[source]¶
- Returns
Controls the trade-off between spatial and color distance
- Return type
modifier
- getSamplingFraction()[source]¶
- Returns
The fraction of superpixels to keep on
- Return type
samplingFraction
- getSuperpixelCol()[source]¶
- Returns
The column holding the superpixel decompositions
- Return type
superpixelCol
- inputCol = Param(parent='undefined', name='inputCol', doc='The name of the input column')¶
- model = Param(parent='undefined', name='model', doc='Model to try to locally approximate')¶
- modifier = Param(parent='undefined', name='modifier', doc='Controls the trade-off between spatial and color distance')¶
- nSamples = Param(parent='undefined', name='nSamples', doc='The number of samples to generate')¶
- outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')¶
- predictionCol = Param(parent='undefined', name='predictionCol', doc='prediction column name')¶
- regularization = Param(parent='undefined', name='regularization', doc='regularization param for the lasso')¶
- samplingFraction = Param(parent='undefined', name='samplingFraction', doc='The fraction of superpixels to keep on')¶
- setParams(cellSize=16.0, inputCol=None, model=None, modifier=130.0, nSamples=900, outputCol=None, predictionCol='prediction', regularization=0.0, samplingFraction=0.3, superpixelCol='superpixels')[source]¶
Set the (keyword only) parameters
- setSamplingFraction(value)[source]¶
- Parameters
samplingFraction – The fraction of superpixels to keep on
- setSuperpixelCol(value)[source]¶
- Parameters
superpixelCol – The column holding the superpixel decompositions
- superpixelCol = Param(parent='undefined', name='superpixelCol', doc='The column holding the superpixel decompositions')¶
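The sampling parameters above (nSamples, samplingFraction) can be illustrated with a short sketch. The following stdlib-only Python is a conceptual illustration of LIME-style image sampling, not the library's implementation, and the function name is hypothetical: each of nSamples perturbations keeps roughly a samplingFraction of the superpixels switched on, and those on/off states become the features of the local linear model fit against the model's predictions.

```python
import random

def sample_superpixel_masks(n_superpixels, n_samples=900,
                            sampling_fraction=0.3, seed=0):
    """Generate binary masks over superpixels (conceptual LIME sketch).

    Each mask keeps roughly `sampling_fraction` of the superpixels "on";
    the masked image is scored by the model, and the on/off pattern is
    the feature vector of the local linear (lasso) model.
    """
    rng = random.Random(seed)
    return [
        [1 if rng.random() < sampling_fraction else 0
         for _ in range(n_superpixels)]
        for _ in range(n_samples)
    ]

# e.g. 5 perturbations of an image segmented into 20 superpixels
masks = sample_superpixel_masks(20, n_samples=5, sampling_fraction=0.3)
```

With the defaults above (nSamples=900, samplingFraction=0.3), ImageLIME would score 900 such masked images and fit a lasso with the given regularization over the resulting binary features.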
mmlspark.lime.SuperpixelTransformer module¶
- class mmlspark.lime.SuperpixelTransformer.SuperpixelTransformer(java_obj=None, cellSize=16.0, inputCol=None, modifier=130.0, outputCol='SuperpixelTransformer_9cc5024ff02d_output')[source]¶
Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer
- Parameters
cellSize (float) – Number that controls the size of the superpixels
inputCol (object) – The name of the input column
modifier (float) – Controls the trade-off between spatial and color distance
outputCol (object) – The name of the output column
- cellSize = Param(parent='undefined', name='cellSize', doc='Number that controls the size of the superpixels')¶
- getCellSize()[source]¶
- Returns
Number that controls the size of the superpixels
- Return type
cellSize
- getModifier()[source]¶
- Returns
Controls the trade-off between spatial and color distance
- Return type
modifier
- inputCol = Param(parent='undefined', name='inputCol', doc='The name of the input column')¶
- modifier = Param(parent='undefined', name='modifier', doc='Controls the trade-off between spatial and color distance')¶
- outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')¶
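The interplay of cellSize and modifier can be sketched assuming a SLIC-style superpixel clustering; the library's exact algorithm may differ, and this stdlib-only function is illustrative. Pixel-to-cluster distance mixes color distance with spatial distance, where cellSize normalizes the spatial term (superpixels are seeded on a cellSize-spaced grid) and a larger modifier weights spatial proximity more heavily relative to color similarity.

```python
import math

def slic_style_distance(color_a, color_b, pos_a, pos_b,
                        cell_size=16.0, modifier=130.0):
    """Combined pixel-to-cluster distance in a SLIC-style clustering (sketch).

    `cell_size` normalizes the spatial term so the trade-off is
    independent of image resolution; `modifier` scales how strongly
    spatial distance competes with color distance.
    """
    d_color = math.dist(color_a, color_b)   # distance in color space
    d_space = math.dist(pos_a, pos_b)       # distance in pixel space
    return math.hypot(d_color, (d_space / cell_size) * modifier)
```

For example, two identically colored pixels one cell apart contribute a distance equal to the modifier, so raising the modifier yields more compact, grid-like superpixels.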
mmlspark.lime.TabularLIME module¶
- class mmlspark.lime.TabularLIME.TabularLIME(java_obj=None, inputCol=None, model=None, nSamples=1000, outputCol=None, predictionCol='prediction', regularization=0.0, samplingFraction=0.3)[source]¶
Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaEstimator
- Parameters
inputCol (object) – The name of the input column
model (object) – Model to try to locally approximate
nSamples (int) – The number of samples to generate
outputCol (object) – The name of the output column
predictionCol (object) – prediction column name
regularization (float) – regularization param for the lasso
samplingFraction (float) – The fraction of superpixels to keep on
- getSamplingFraction()[source]¶
- Returns
The fraction of superpixels to keep on
- Return type
samplingFraction
- inputCol = Param(parent='undefined', name='inputCol', doc='The name of the input column')¶
- model = Param(parent='undefined', name='model', doc='Model to try to locally approximate')¶
- nSamples = Param(parent='undefined', name='nSamples', doc='The number of samples to generate')¶
- outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')¶
- predictionCol = Param(parent='undefined', name='predictionCol', doc='prediction column name')¶
- regularization = Param(parent='undefined', name='regularization', doc='regularization param for the lasso')¶
- samplingFraction = Param(parent='undefined', name='samplingFraction', doc='The fraction of superpixels to keep on')¶
mmlspark.lime.TabularLIMEModel module¶
- class mmlspark.lime.TabularLIMEModel.TabularLIMEModel(java_obj=None, columnSTDs=None, inputCol=None, model=None, nSamples=None, outputCol=None, predictionCol='prediction', regularization=None, samplingFraction=None)[source]¶
Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaModel
- Parameters
columnSTDs (list) – the standard deviations of each of the columns for perturbation
inputCol (object) – The name of the input column
model (object) – Model to try to locally approximate
nSamples (int) – The number of samples to generate
outputCol (object) – The name of the output column
predictionCol (object) – prediction column name
regularization (float) – regularization param for the lasso
samplingFraction (float) – The fraction of superpixels to keep on
- columnSTDs = Param(parent='undefined', name='columnSTDs', doc='the standard deviations of each of the columns for perturbation')¶
- getColumnSTDs()[source]¶
- Returns
the standard deviations of each of the columns for perturbation
- Return type
columnSTDs
- getSamplingFraction()[source]¶
- Returns
The fraction of superpixels to keep on
- Return type
samplingFraction
- inputCol = Param(parent='undefined', name='inputCol', doc='The name of the input column')¶
- model = Param(parent='undefined', name='model', doc='Model to try to locally approximate')¶
- nSamples = Param(parent='undefined', name='nSamples', doc='The number of samples to generate')¶
- outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')¶
- predictionCol = Param(parent='undefined', name='predictionCol', doc='prediction column name')¶
- regularization = Param(parent='undefined', name='regularization', doc='regularization param for the lasso')¶
- samplingFraction = Param(parent='undefined', name='samplingFraction', doc='The fraction of superpixels to keep on')¶
- setColumnSTDs(value)[source]¶
- Parameters
columnSTDs – the standard deviations of each of the columns for perturbation
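The role of columnSTDs can be understood from a small stdlib-only sketch; this is a conceptual illustration of tabular LIME perturbation, not the library's code, and the function name is hypothetical. Each column of the row being explained is jittered with Gaussian noise scaled by that column's standard deviation, so all features are perturbed on a comparable scale before the local lasso is fit on the model's outputs.

```python
import random

def perturb_row(row, column_stds, n_samples=1000, seed=0):
    """Per-column Gaussian perturbation of a tabular row (conceptual sketch).

    Each feature is jittered with noise proportional to its standard
    deviation (the `columnSTDs` parameter), producing `n_samples`
    neighbors of the row for the local linear model.
    """
    rng = random.Random(seed)
    return [
        [x + rng.gauss(0.0, s) for x, s in zip(row, column_stds)]
        for _ in range(n_samples)
    ]

# e.g. 3 perturbed copies of a two-feature row
samples = perturb_row([1.0, 10.0], [0.1, 2.0], n_samples=3)
```

A column with a standard deviation of zero is left unchanged, which is why columnSTDs are computed from the training data by the TabularLIME estimator.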
mmlspark.lime.TextLIME module¶
- class mmlspark.lime.TextLIME.TextLIME(java_obj=None, inputCol=None, model=None, nSamples=1000, outputCol=None, predictionCol='prediction', regularization=0.0, samplingFraction=0.3, tokenCol=None)[source]¶
Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaModel
- Parameters
inputCol (object) – The name of the input column
model (object) – Model to try to locally approximate
nSamples (int) – The number of samples to generate
outputCol (object) – The name of the output column
predictionCol (object) – prediction column name
regularization (float) – regularization param for the lasso
samplingFraction (float) – The fraction of superpixels to keep on
tokenCol (object) – The column holding the token
- getSamplingFraction()[source]¶
- Returns
The fraction of superpixels to keep on
- Return type
samplingFraction
- inputCol = Param(parent='undefined', name='inputCol', doc='The name of the input column')¶
- model = Param(parent='undefined', name='model', doc='Model to try to locally approximate')¶
- nSamples = Param(parent='undefined', name='nSamples', doc='The number of samples to generate')¶
- outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')¶
- predictionCol = Param(parent='undefined', name='predictionCol', doc='prediction column name')¶
- regularization = Param(parent='undefined', name='regularization', doc='regularization param for the lasso')¶
- samplingFraction = Param(parent='undefined', name='samplingFraction', doc='The fraction of superpixels to keep on')¶
- setParams(inputCol=None, model=None, nSamples=1000, outputCol=None, predictionCol='prediction', regularization=0.0, samplingFraction=0.3, tokenCol=None)[source]¶
Set the (keyword only) parameters
- setSamplingFraction(value)[source]¶
- Parameters
samplingFraction – The fraction of superpixels to keep on
- tokenCol = Param(parent='undefined', name='tokenCol', doc='The column holding the token')¶
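For text, the same sampling idea operates on the tokens in tokenCol. The following stdlib-only sketch is a conceptual illustration, not the library's implementation, and the function name is hypothetical: each perturbed sentence keeps a random subset of tokens, the model scores the perturbed text, and the kept/dropped pattern is the feature vector of the local linear model.

```python
import random

def sample_token_masks(tokens, n_samples=1000,
                       sampling_fraction=0.3, seed=0):
    """Token-level LIME sampling (conceptual sketch).

    Keeps roughly `sampling_fraction` of the tokens in each perturbed
    sentence; returns (kept_tokens, mask) pairs, where the mask is the
    binary feature vector for the local linear model.
    """
    rng = random.Random(seed)
    out = []
    for _ in range(n_samples):
        mask = [rng.random() < sampling_fraction for _ in tokens]
        kept = [t for t, keep in zip(tokens, mask) if keep]
        out.append((kept, mask))
    return out

perturbed = sample_token_masks(["the", "cat", "sat"], n_samples=4)
```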
Module contents¶
MMLSpark is an ecosystem of tools aimed at expanding the distributed computing framework Apache Spark in several new directions. MMLSpark adds many deep learning and data science tools to the Spark ecosystem, including seamless integration of Spark Machine Learning pipelines with the Microsoft Cognitive Toolkit (CNTK), LightGBM, and OpenCV. These tools enable powerful, highly scalable predictive and analytical models for a variety of data sources.
MMLSpark also brings new networking capabilities to the Spark ecosystem. With the HTTP on Spark project, users can embed any web service into their SparkML models. In this vein, MMLSpark provides easy-to-use SparkML transformers for a wide variety of Microsoft Cognitive Services. For production-grade deployment, the Spark Serving project enables high-throughput, sub-millisecond-latency web services backed by your Spark cluster.
MMLSpark requires Scala 2.11, Spark 2.4+, and Python 3.5+.