synapse.ml.explainers package

Submodules

synapse.ml.explainers.ICETransformer module

class synapse.ml.explainers.ICETransformer.ICETransformer(java_obj=None, categoricalFeatures=[], dependenceNameCol='pdpBasedDependence', featureNameCol='featureNames', kind='individual', model=None, numSamples=None, numericFeatures=[], targetClasses=[], targetClassesCol=None, targetCol='probability')[source]

Bases: pyspark.ml.util.MLReadable[pyspark.ml.util.RL]

setCategoricalFeatures(values: List[Union[str, Dict]])[source]

Args: values: The categorical features to explain, given as a list whose items are either feature names or dicts of per-feature parameters.

setNumericFeatures(values: List[Union[str, Dict]])[source]

Args: values: The numeric features to explain, given as a list whose items are either feature names or dicts of per-feature parameters.
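The two accepted item shapes (a plain name or a dict) can be brought to a common form as in the sketch below. This is an illustration only, not the library's implementation: the helper normalize_features is hypothetical, and the dict keys shown ("name", "numSplits") are assumptions chosen for the example.

```python
from typing import Dict, List, Union

FeatureSpec = Union[str, Dict]


def normalize_features(values: List[FeatureSpec]) -> List[Dict]:
    """Turn a mixed list of names and dicts into a list of dicts.

    Hypothetical helper: the "name" key is an assumption for illustration.
    """
    normalized = []
    for value in values:
        if isinstance(value, str):
            # A bare string is treated as a feature name with no extra parameters.
            normalized.append({"name": value})
        elif isinstance(value, dict):
            if "name" not in value:
                raise ValueError("feature dict needs a 'name' key")
            normalized.append(dict(value))  # copy so the caller's dict is untouched
        else:
            raise TypeError(f"unsupported feature spec: {value!r}")
    return normalized


numeric = normalize_features(["age", {"name": "income", "numSplits": 10}])
print(numeric)
```

Mixing both shapes in one list matches the List[Union[str, Dict]] type in the signatures above.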

synapse.ml.explainers.ImageLIME module

class synapse.ml.explainers.ImageLIME.ImageLIME(java_obj=None, cellSize=16.0, inputCol=None, kernelWidth=0.75, metricsCol='r2', model=None, modifier=130.0, numSamples=900, outputCol='ImageLIME_718a5d3f0dae__output', regularization=0.0, samplingFraction=0.7, superpixelCol='superpixels', targetClasses=[], targetClassesCol=None, targetCol='probability')[source]

Bases: pyspark.ml.util.MLReadable[pyspark.ml.util.RL]

Parameters
  • cellSize (float) – Number that controls the size of the superpixels

  • inputCol (str) – input column name

  • kernelWidth (float) – Kernel width. Default value: sqrt (number of features) * 0.75

  • metricsCol (str) – Column name for fitting metrics

  • model (object) – The model to be interpreted.

  • modifier (float) – Controls the trade-off between spatial and color distance

  • numSamples (int) – Number of samples to generate.

  • outputCol (str) – output column name

  • regularization (float) – Regularization param for the lasso. Default value: 0.

  • samplingFraction (float) – The fraction of superpixels (for image) or tokens (for text) to keep turned on

  • superpixelCol (str) – The column holding the superpixel decompositions

  • targetClasses (list) – The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.

  • targetClassesCol (str) – The name of the column that specifies the indices of the classes for multinomial classification models.

  • targetCol (str) – The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability
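How numSamples and samplingFraction interact can be pictured with the sketch below. It assumes each superpixel is kept on independently with probability samplingFraction; that sampling scheme, the helper sample_states, and the superpixel count of 20 are assumptions for illustration, not a description of the library's internals.

```python
import random
from typing import List


def sample_states(num_features: int, num_samples: int,
                  sampling_fraction: float, seed: int = 0) -> List[List[int]]:
    """Draw on/off vectors, one per sample.

    Assumption: every feature (superpixel) stays on independently with
    probability sampling_fraction.
    """
    rng = random.Random(seed)
    return [
        [1 if rng.random() < sampling_fraction else 0 for _ in range(num_features)]
        for _ in range(num_samples)
    ]


# Values mirror the signature defaults: numSamples=900, samplingFraction=0.7.
states = sample_states(num_features=20, num_samples=900, sampling_fraction=0.7)
on_rate = sum(map(sum, states)) / (20 * 900)
print(len(states), len(states[0]), round(on_rate, 2))
```

Each row is one perturbed version of the image; a 1 keeps the superpixel, a 0 masks it.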

cellSize = Param(parent='undefined', name='cellSize', doc='Number that controls the size of the superpixels')
getCellSize()[source]
Returns

Number that controls the size of the superpixels

Return type

cellSize

getInputCol()[source]
Returns

input column name

Return type

inputCol

static getJavaPackage()[source]

Returns package name String.

getKernelWidth()[source]
Returns

Kernel width. Default value: sqrt (number of features) * 0.75

Return type

kernelWidth

getMetricsCol()[source]
Returns

Column name for fitting metrics

Return type

metricsCol

getModel()[source]
Returns

The model to be interpreted.

Return type

model

getModifier()[source]
Returns

Controls the trade-off between spatial and color distance

Return type

modifier

getNumSamples()[source]
Returns

Number of samples to generate.

Return type

numSamples

getOutputCol()[source]
Returns

output column name

Return type

outputCol

getRegularization()[source]
Returns

Regularization param for the lasso. Default value: 0.

Return type

regularization

getSamplingFraction()[source]
Returns

The fraction of superpixels (for image) or tokens (for text) to keep turned on

Return type

samplingFraction

getSuperpixelCol()[source]
Returns

The column holding the superpixel decompositions

Return type

superpixelCol

getTargetClasses()[source]
Returns

The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.

Return type

targetClasses

getTargetClassesCol()[source]
Returns

The name of the column that specifies the indices of the classes for multinomial classification models.

Return type

targetClassesCol

getTargetCol()[source]
Returns

The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability

Return type

targetCol

inputCol = Param(parent='undefined', name='inputCol', doc='input column name')
kernelWidth = Param(parent='undefined', name='kernelWidth', doc='Kernel width. Default value: sqrt (number of features) * 0.75')
metricsCol = Param(parent='undefined', name='metricsCol', doc='Column name for fitting metrics')
model = Param(parent='undefined', name='model', doc='The model to be interpreted.')
modifier = Param(parent='undefined', name='modifier', doc='Controls the trade-off spatial and color distance')
numSamples = Param(parent='undefined', name='numSamples', doc='Number of samples to generate.')
outputCol = Param(parent='undefined', name='outputCol', doc='output column name')
classmethod read()[source]

Returns an MLReader instance for this class.

regularization = Param(parent='undefined', name='regularization', doc='Regularization param for the lasso. Default value: 0.')
samplingFraction = Param(parent='undefined', name='samplingFraction', doc='The fraction of superpixels (for image) or tokens (for text) to keep on')
setCellSize(value)[source]
Parameters

cellSize – Number that controls the size of the superpixels

setInputCol(value)[source]
Parameters

inputCol – input column name

setKernelWidth(value)[source]
Parameters

kernelWidth – Kernel width. Default value: sqrt (number of features) * 0.75

setMetricsCol(value)[source]
Parameters

metricsCol – Column name for fitting metrics

setModel(value)[source]
Parameters

model – The model to be interpreted.

setModifier(value)[source]
Parameters

modifier – Controls the trade-off between spatial and color distance

setNumSamples(value)[source]
Parameters

numSamples – Number of samples to generate.

setOutputCol(value)[source]
Parameters

outputCol – output column name

setParams(cellSize=16.0, inputCol=None, kernelWidth=0.75, metricsCol='r2', model=None, modifier=130.0, numSamples=900, outputCol='ImageLIME_718a5d3f0dae__output', regularization=0.0, samplingFraction=0.7, superpixelCol='superpixels', targetClasses=[], targetClassesCol=None, targetCol='probability')[source]

Set the (keyword only) parameters

setRegularization(value)[source]
Parameters

regularization – Regularization param for the lasso. Default value: 0.

setSamplingFraction(value)[source]
Parameters

samplingFraction – The fraction of superpixels (for image) or tokens (for text) to keep turned on

setSuperpixelCol(value)[source]
Parameters

superpixelCol – The column holding the superpixel decompositions

setTargetClasses(value)[source]
Parameters

targetClasses – The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.

setTargetClassesCol(value)[source]
Parameters

targetClassesCol – The name of the column that specifies the indices of the classes for multinomial classification models.

setTargetCol(value)[source]
Parameters

targetCol – The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability

superpixelCol = Param(parent='undefined', name='superpixelCol', doc='The column holding the superpixel decompositions')
targetClasses = Param(parent='undefined', name='targetClasses', doc='The indices of the classes for multinomial classification models. Default: 0.For regression models this parameter is ignored.')
targetClassesCol = Param(parent='undefined', name='targetClassesCol', doc='The name of the column that specifies the indices of the classes for multinomial classification models.')
targetCol = Param(parent='undefined', name='targetCol', doc='The column name of the prediction target to explain (i.e. the response variable). This is usually set to "prediction" for regression models and "probability" for probabilistic classification models. Default value: probability')

synapse.ml.explainers.ImageSHAP module

class synapse.ml.explainers.ImageSHAP.ImageSHAP(java_obj=None, cellSize=16.0, infWeight=100000000.0, inputCol=None, metricsCol='r2', model=None, modifier=130.0, numSamples=None, outputCol='ImageSHAP_0616a2d19ce1__output', superpixelCol='superpixels', targetClasses=[], targetClassesCol=None, targetCol='probability')[source]

Bases: pyspark.ml.util.MLReadable[pyspark.ml.util.RL]

Parameters
  • cellSize (float) – Number that controls the size of the superpixels

  • infWeight (float) – The double value to represent infinite weight. Default: 1E8.

  • inputCol (str) – input column name

  • metricsCol (str) – Column name for fitting metrics

  • model (object) – The model to be interpreted.

  • modifier (float) – Controls the trade-off between spatial and color distance

  • numSamples (int) – Number of samples to generate.

  • outputCol (str) – output column name

  • superpixelCol (str) – The column holding the superpixel decompositions

  • targetClasses (list) – The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.

  • targetClassesCol (str) – The name of the column that specifies the indices of the classes for multinomial classification models.

  • targetCol (str) – The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability
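The role of infWeight can be illustrated with the standard Kernel SHAP weighting, where the empty and the full coalition have an infinite weight in closed form. The sketch below substitutes a large finite value in those two cases. That this is exactly how the library applies infWeight is an assumption; the function name shapley_kernel_weight is hypothetical.

```python
from math import comb


def shapley_kernel_weight(num_features: int, coalition_size: int,
                          inf_weight: float = 1e8) -> float:
    """Shapley kernel weight for a coalition of the given size.

    For size 0 or size num_features the closed form divides by zero,
    so a large finite stand-in (inf_weight) is returned instead.
    """
    m, k = num_features, coalition_size
    if k == 0 or k == m:
        return inf_weight
    return (m - 1) / (comb(m, k) * k * (m - k))


# Weights for every coalition size when there are 4 features (superpixels).
weights = [shapley_kernel_weight(4, k) for k in range(5)]
print(weights)
```

The large end weights force the local surrogate to match the model output when no feature or every feature is present.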

cellSize = Param(parent='undefined', name='cellSize', doc='Number that controls the size of the superpixels')
getCellSize()[source]
Returns

Number that controls the size of the superpixels

Return type

cellSize

getInfWeight()[source]
Returns

The double value to represent infinite weight. Default: 1E8.

Return type

infWeight

getInputCol()[source]
Returns

input column name

Return type

inputCol

static getJavaPackage()[source]

Returns package name String.

getMetricsCol()[source]
Returns

Column name for fitting metrics

Return type

metricsCol

getModel()[source]
Returns

The model to be interpreted.

Return type

model

getModifier()[source]
Returns

Controls the trade-off between spatial and color distance

Return type

modifier

getNumSamples()[source]
Returns

Number of samples to generate.

Return type

numSamples

getOutputCol()[source]
Returns

output column name

Return type

outputCol

getSuperpixelCol()[source]
Returns

The column holding the superpixel decompositions

Return type

superpixelCol

getTargetClasses()[source]
Returns

The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.

Return type

targetClasses

getTargetClassesCol()[source]
Returns

The name of the column that specifies the indices of the classes for multinomial classification models.

Return type

targetClassesCol

getTargetCol()[source]
Returns

The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability

Return type

targetCol

infWeight = Param(parent='undefined', name='infWeight', doc='The double value to represent infinite weight. Default: 1E8.')
inputCol = Param(parent='undefined', name='inputCol', doc='input column name')
metricsCol = Param(parent='undefined', name='metricsCol', doc='Column name for fitting metrics')
model = Param(parent='undefined', name='model', doc='The model to be interpreted.')
modifier = Param(parent='undefined', name='modifier', doc='Controls the trade-off spatial and color distance')
numSamples = Param(parent='undefined', name='numSamples', doc='Number of samples to generate.')
outputCol = Param(parent='undefined', name='outputCol', doc='output column name')
classmethod read()[source]

Returns an MLReader instance for this class.

setCellSize(value)[source]
Parameters

cellSize – Number that controls the size of the superpixels

setInfWeight(value)[source]
Parameters

infWeight – The double value to represent infinite weight. Default: 1E8.

setInputCol(value)[source]
Parameters

inputCol – input column name

setMetricsCol(value)[source]
Parameters

metricsCol – Column name for fitting metrics

setModel(value)[source]
Parameters

model – The model to be interpreted.

setModifier(value)[source]
Parameters

modifier – Controls the trade-off between spatial and color distance

setNumSamples(value)[source]
Parameters

numSamples – Number of samples to generate.

setOutputCol(value)[source]
Parameters

outputCol – output column name

setParams(cellSize=16.0, infWeight=100000000.0, inputCol=None, metricsCol='r2', model=None, modifier=130.0, numSamples=None, outputCol='ImageSHAP_0616a2d19ce1__output', superpixelCol='superpixels', targetClasses=[], targetClassesCol=None, targetCol='probability')[source]

Set the (keyword only) parameters

setSuperpixelCol(value)[source]
Parameters

superpixelCol – The column holding the superpixel decompositions

setTargetClasses(value)[source]
Parameters

targetClasses – The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.

setTargetClassesCol(value)[source]
Parameters

targetClassesCol – The name of the column that specifies the indices of the classes for multinomial classification models.

setTargetCol(value)[source]
Parameters

targetCol – The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability

superpixelCol = Param(parent='undefined', name='superpixelCol', doc='The column holding the superpixel decompositions')
targetClasses = Param(parent='undefined', name='targetClasses', doc='The indices of the classes for multinomial classification models. Default: 0.For regression models this parameter is ignored.')
targetClassesCol = Param(parent='undefined', name='targetClassesCol', doc='The name of the column that specifies the indices of the classes for multinomial classification models.')
targetCol = Param(parent='undefined', name='targetCol', doc='The column name of the prediction target to explain (i.e. the response variable). This is usually set to "prediction" for regression models and "probability" for probabilistic classification models. Default value: probability')

synapse.ml.explainers.TabularLIME module

class synapse.ml.explainers.TabularLIME.TabularLIME(java_obj=None, backgroundData=None, categoricalFeatures=[], inputCols=None, kernelWidth=0.75, metricsCol='r2', model=None, numSamples=1000, outputCol='TabularLIME_6550df7dc453__output', regularization=0.0, targetClasses=[], targetClassesCol=None, targetCol='probability')[source]

Bases: pyspark.ml.util.MLReadable[pyspark.ml.util.RL]

Parameters
  • backgroundData (object) – A dataframe containing background data

  • categoricalFeatures (list) – Name of features that should be treated as categorical variables.

  • inputCols (list) – input column names

  • kernelWidth (float) – Kernel width. Default value: sqrt (number of features) * 0.75

  • metricsCol (str) – Column name for fitting metrics

  • model (object) – The model to be interpreted.

  • numSamples (int) – Number of samples to generate.

  • outputCol (str) – output column name

  • regularization (float) – Regularization param for the lasso. Default value: 0.

  • targetClasses (list) – The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.

  • targetClassesCol (str) – The name of the column that specifies the indices of the classes for multinomial classification models.

  • targetCol (str) – The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability
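The documented kernelWidth default, sqrt(number of features) * 0.75, can be worked through as below. The weighting function is the exponential kernel from the reference LIME implementation; whether this library uses the same kernel form is an assumption, and both function names are hypothetical.

```python
import math


def default_kernel_width(num_features: int) -> float:
    """Default stated in the parameter docs: sqrt(number of features) * 0.75."""
    return math.sqrt(num_features) * 0.75


def kernel_weight(distance: float, kernel_width: float) -> float:
    """Exponential kernel (assumed form): sqrt(exp(-d^2 / width^2))."""
    return math.sqrt(math.exp(-(distance ** 2) / kernel_width ** 2))


width = default_kernel_width(16)  # sqrt(16) * 0.75 = 3.0
near = kernel_weight(0.0, width)  # a sample identical to the instance
far = kernel_weight(3.0, width)   # a sample one kernel width away
print(width, near, far)
```

A larger kernelWidth lets samples farther from the explained row keep more influence on the local fit.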

backgroundData = Param(parent='undefined', name='backgroundData', doc='A dataframe containing background data')
categoricalFeatures = Param(parent='undefined', name='categoricalFeatures', doc='Name of features that should be treated as categorical variables.')
getBackgroundData()[source]
Returns

A dataframe containing background data

Return type

backgroundData

getCategoricalFeatures()[source]
Returns

Name of features that should be treated as categorical variables.

Return type

categoricalFeatures

getInputCols()[source]
Returns

input column names

Return type

inputCols

static getJavaPackage()[source]

Returns package name String.

getKernelWidth()[source]
Returns

Kernel width. Default value: sqrt (number of features) * 0.75

Return type

kernelWidth

getMetricsCol()[source]
Returns

Column name for fitting metrics

Return type

metricsCol

getModel()[source]
Returns

The model to be interpreted.

Return type

model

getNumSamples()[source]
Returns

Number of samples to generate.

Return type

numSamples

getOutputCol()[source]
Returns

output column name

Return type

outputCol

getRegularization()[source]
Returns

Regularization param for the lasso. Default value: 0.

Return type

regularization

getTargetClasses()[source]
Returns

The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.

Return type

targetClasses

getTargetClassesCol()[source]
Returns

The name of the column that specifies the indices of the classes for multinomial classification models.

Return type

targetClassesCol

getTargetCol()[source]
Returns

The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability

Return type

targetCol

inputCols = Param(parent='undefined', name='inputCols', doc='input column names')
kernelWidth = Param(parent='undefined', name='kernelWidth', doc='Kernel width. Default value: sqrt (number of features) * 0.75')
metricsCol = Param(parent='undefined', name='metricsCol', doc='Column name for fitting metrics')
model = Param(parent='undefined', name='model', doc='The model to be interpreted.')
numSamples = Param(parent='undefined', name='numSamples', doc='Number of samples to generate.')
outputCol = Param(parent='undefined', name='outputCol', doc='output column name')
classmethod read()[source]

Returns an MLReader instance for this class.

regularization = Param(parent='undefined', name='regularization', doc='Regularization param for the lasso. Default value: 0.')
setBackgroundData(value)[source]
Parameters

backgroundData – A dataframe containing background data

setCategoricalFeatures(value)[source]
Parameters

categoricalFeatures – Name of features that should be treated as categorical variables.

setInputCols(value)[source]
Parameters

inputCols – input column names

setKernelWidth(value)[source]
Parameters

kernelWidth – Kernel width. Default value: sqrt (number of features) * 0.75

setMetricsCol(value)[source]
Parameters

metricsCol – Column name for fitting metrics

setModel(value)[source]
Parameters

model – The model to be interpreted.

setNumSamples(value)[source]
Parameters

numSamples – Number of samples to generate.

setOutputCol(value)[source]
Parameters

outputCol – output column name

setParams(backgroundData=None, categoricalFeatures=[], inputCols=None, kernelWidth=0.75, metricsCol='r2', model=None, numSamples=1000, outputCol='TabularLIME_6550df7dc453__output', regularization=0.0, targetClasses=[], targetClassesCol=None, targetCol='probability')[source]

Set the (keyword only) parameters

setRegularization(value)[source]
Parameters

regularization – Regularization param for the lasso. Default value: 0.

setTargetClasses(value)[source]
Parameters

targetClasses – The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.

setTargetClassesCol(value)[source]
Parameters

targetClassesCol – The name of the column that specifies the indices of the classes for multinomial classification models.

setTargetCol(value)[source]
Parameters

targetCol – The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability

targetClasses = Param(parent='undefined', name='targetClasses', doc='The indices of the classes for multinomial classification models. Default: 0.For regression models this parameter is ignored.')
targetClassesCol = Param(parent='undefined', name='targetClassesCol', doc='The name of the column that specifies the indices of the classes for multinomial classification models.')
targetCol = Param(parent='undefined', name='targetCol', doc='The column name of the prediction target to explain (i.e. the response variable). This is usually set to "prediction" for regression models and "probability" for probabilistic classification models. Default value: probability')

synapse.ml.explainers.TabularSHAP module

class synapse.ml.explainers.TabularSHAP.TabularSHAP(java_obj=None, backgroundData=None, infWeight=100000000.0, inputCols=None, metricsCol='r2', model=None, numSamples=None, outputCol='TabularSHAP_ec97671ef1af__output', targetClasses=[], targetClassesCol=None, targetCol='probability')[source]

Bases: pyspark.ml.util.MLReadable[pyspark.ml.util.RL]

Parameters
  • backgroundData (object) – A dataframe containing background data

  • infWeight (float) – The double value to represent infinite weight. Default: 1E8.

  • inputCols (list) – input column names

  • metricsCol (str) – Column name for fitting metrics

  • model (object) – The model to be interpreted.

  • numSamples (int) – Number of samples to generate.

  • outputCol (str) – output column name

  • targetClasses (list) – The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.

  • targetClassesCol (str) – The name of the column that specifies the indices of the classes for multinomial classification models.

  • targetCol (str) – The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability
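The relationship between targetCol and targetClasses can be pictured as indexing into the model's output vector. The sketch below is a plain-Python illustration under that assumption; select_targets is a hypothetical helper and the probability values are made up.

```python
from typing import List, Sequence


def select_targets(output: Sequence[float],
                   target_classes: Sequence[int]) -> List[float]:
    """Pick the entries of a model output vector that should be explained."""
    return [output[i] for i in target_classes]


# Made-up contents of a "probability" column for one row of a 3-class model.
probability = [0.10, 0.65, 0.25]

single = select_targets(probability, [1])      # explain class 1 only
several = select_targets(probability, [0, 2])  # explain classes 0 and 2
print(single, several)
```

For a regression model the target column holds a single value, which is consistent with targetClasses being ignored in that case.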

backgroundData = Param(parent='undefined', name='backgroundData', doc='A dataframe containing background data')
getBackgroundData()[source]
Returns

A dataframe containing background data

Return type

backgroundData

getInfWeight()[source]
Returns

The double value to represent infinite weight. Default: 1E8.

Return type

infWeight

getInputCols()[source]
Returns

input column names

Return type

inputCols

static getJavaPackage()[source]

Returns package name String.

getMetricsCol()[source]
Returns

Column name for fitting metrics

Return type

metricsCol

getModel()[source]
Returns

The model to be interpreted.

Return type

model

getNumSamples()[source]
Returns

Number of samples to generate.

Return type

numSamples

getOutputCol()[source]
Returns

output column name

Return type

outputCol

getTargetClasses()[source]
Returns

The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.

Return type

targetClasses

getTargetClassesCol()[source]
Returns

The name of the column that specifies the indices of the classes for multinomial classification models.

Return type

targetClassesCol

getTargetCol()[source]
Returns

The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability

Return type

targetCol

infWeight = Param(parent='undefined', name='infWeight', doc='The double value to represent infinite weight. Default: 1E8.')
inputCols = Param(parent='undefined', name='inputCols', doc='input column names')
metricsCol = Param(parent='undefined', name='metricsCol', doc='Column name for fitting metrics')
model = Param(parent='undefined', name='model', doc='The model to be interpreted.')
numSamples = Param(parent='undefined', name='numSamples', doc='Number of samples to generate.')
outputCol = Param(parent='undefined', name='outputCol', doc='output column name')
classmethod read()[source]

Returns an MLReader instance for this class.

setBackgroundData(value)[source]
Parameters

backgroundData – A dataframe containing background data

setInfWeight(value)[source]
Parameters

infWeight – The double value to represent infinite weight. Default: 1E8.

setInputCols(value)[source]
Parameters

inputCols – input column names

setMetricsCol(value)[source]
Parameters

metricsCol – Column name for fitting metrics

setModel(value)[source]
Parameters

model – The model to be interpreted.

setNumSamples(value)[source]
Parameters

numSamples – Number of samples to generate.

setOutputCol(value)[source]
Parameters

outputCol – output column name

setParams(backgroundData=None, infWeight=100000000.0, inputCols=None, metricsCol='r2', model=None, numSamples=None, outputCol='TabularSHAP_ec97671ef1af__output', targetClasses=[], targetClassesCol=None, targetCol='probability')[source]

Set the (keyword only) parameters

setTargetClasses(value)[source]
Parameters

targetClasses – The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.

setTargetClassesCol(value)[source]
Parameters

targetClassesCol – The name of the column that specifies the indices of the classes for multinomial classification models.

setTargetCol(value)[source]
Parameters

targetCol – The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability

targetClasses = Param(parent='undefined', name='targetClasses', doc='The indices of the classes for multinomial classification models. Default: 0.For regression models this parameter is ignored.')
targetClassesCol = Param(parent='undefined', name='targetClassesCol', doc='The name of the column that specifies the indices of the classes for multinomial classification models.')
targetCol = Param(parent='undefined', name='targetCol', doc='The column name of the prediction target to explain (i.e. the response variable). This is usually set to "prediction" for regression models and "probability" for probabilistic classification models. Default value: probability')

synapse.ml.explainers.TextLIME module

class synapse.ml.explainers.TextLIME.TextLIME(java_obj=None, inputCol=None, kernelWidth=0.75, metricsCol='r2', model=None, numSamples=1000, outputCol='TextLIME_ddd07fca26c6__output', regularization=0.0, samplingFraction=0.7, targetClasses=[], targetClassesCol=None, targetCol='probability', tokensCol='tokens')[source]

Bases: pyspark.ml.util.MLReadable[pyspark.ml.util.RL]

Parameters
  • inputCol (str) – input column name

  • kernelWidth (float) – Kernel width. Default value: sqrt (number of features) * 0.75

  • metricsCol (str) – Column name for fitting metrics

  • model (object) – The model to be interpreted.

  • numSamples (int) – Number of samples to generate.

  • outputCol (str) – output column name

  • regularization (float) – Regularization param for the lasso. Default value: 0.

  • samplingFraction (float) – The fraction of superpixels (for image) or tokens (for text) to keep turned on

  • targetClasses (list) – The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.

  • targetClassesCol (str) – The name of the column that specifies the indices of the classes for multinomial classification models.

  • targetCol (str) – The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability

  • tokensCol (str) – The column holding the tokens
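For text, a perturbed sample can be thought of as the original token list with some tokens switched off. The sketch below applies one on/off vector to a token list. Dropping the masked tokens (rather than replacing them with a placeholder) is an assumption, and mask_tokens and the example sentence are hypothetical.

```python
from typing import List, Sequence


def mask_tokens(tokens: Sequence[str], state: Sequence[int]) -> List[str]:
    """Keep only the tokens whose state is on (1).

    Assumption: masked tokens are dropped, not replaced.
    """
    if len(tokens) != len(state):
        raise ValueError("tokens and state must have the same length")
    return [token for token, on in zip(tokens, state) if on]


tokens = ["the", "service", "was", "not", "good"]
state = [1, 1, 0, 1, 1]  # one sampled on/off vector; "was" is switched off
perturbed = mask_tokens(tokens, state)
print(" ".join(perturbed))
```

The model is then scored on each perturbed text, and the on/off vectors become the inputs of the local surrogate.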

getInputCol()[source]
Returns

input column name

Return type

inputCol

static getJavaPackage()[source]

Returns package name String.

getKernelWidth()[source]
Returns

Kernel width. Default value: sqrt (number of features) * 0.75

Return type

kernelWidth

getMetricsCol()[source]
Returns

Column name for fitting metrics

Return type

metricsCol

getModel()[source]
Returns

The model to be interpreted.

Return type

model

getNumSamples()[source]
Returns

Number of samples to generate.

Return type

numSamples

getOutputCol()[source]
Returns

output column name

Return type

outputCol

getRegularization()[source]
Returns

Regularization param for the lasso. Default value: 0.

Return type

regularization

getSamplingFraction()[source]
Returns

The fraction of superpixels (for image) or tokens (for text) to keep turned on

Return type

samplingFraction

getTargetClasses()[source]
Returns

The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.

Return type

targetClasses

getTargetClassesCol()[source]
Returns

The name of the column that specifies the indices of the classes for multinomial classification models.

Return type

targetClassesCol

getTargetCol()[source]
Returns

The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability

Return type

targetCol

getTokensCol()[source]
Returns

The column holding the tokens

Return type

tokensCol

inputCol = Param(parent='undefined', name='inputCol', doc='input column name')
kernelWidth = Param(parent='undefined', name='kernelWidth', doc='Kernel width. Default value: sqrt (number of features) * 0.75')
metricsCol = Param(parent='undefined', name='metricsCol', doc='Column name for fitting metrics')
model = Param(parent='undefined', name='model', doc='The model to be interpreted.')
numSamples = Param(parent='undefined', name='numSamples', doc='Number of samples to generate.')
outputCol = Param(parent='undefined', name='outputCol', doc='output column name')
classmethod read()[source]

Returns an MLReader instance for this class.

regularization = Param(parent='undefined', name='regularization', doc='Regularization param for the lasso. Default value: 0.')
samplingFraction = Param(parent='undefined', name='samplingFraction', doc='The fraction of superpixels (for image) or tokens (for text) to keep on')
setInputCol(value)[source]
Parameters

inputCol – input column name

setKernelWidth(value)[source]
Parameters

kernelWidth – Kernel width. Default value: sqrt (number of features) * 0.75

setMetricsCol(value)[source]
Parameters

metricsCol – Column name for fitting metrics

setModel(value)[source]
Parameters

model – The model to be interpreted.

setNumSamples(value)[source]
Parameters

numSamples – Number of samples to generate.

setOutputCol(value)[source]
Parameters

outputCol – output column name

setParams(inputCol=None, kernelWidth=0.75, metricsCol='r2', model=None, numSamples=1000, outputCol='TextLIME_ddd07fca26c6__output', regularization=0.0, samplingFraction=0.7, targetClasses=[], targetClassesCol=None, targetCol='probability', tokensCol='tokens')[source]

Set the (keyword only) parameters

setRegularization(value)[source]
Parameters

regularization – Regularization param for the lasso. Default value: 0.

setSamplingFraction(value)[source]
Parameters

samplingFraction – The fraction of superpixels (for image) or tokens (for text) to keep on (that is, left unmasked) in each generated sample
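
To make the role of samplingFraction concrete, here is a minimal sketch in plain Python. It does not call SynapseML and is not its implementation. It assumes each token is kept on independently with probability sampling_fraction. The helper names sample_token_masks and apply_mask are hypothetical.

```python
import random


def sample_token_masks(tokens, num_samples, sampling_fraction, seed=0):
    """Draw binary on/off masks over the tokens.

    Assumption: each token stays on independently with probability
    sampling_fraction. The actual sampling scheme may differ.
    """
    rng = random.Random(seed)
    return [
        [1 if rng.random() < sampling_fraction else 0 for _ in tokens]
        for _ in range(num_samples)
    ]


def apply_mask(tokens, mask):
    """Return only the tokens whose mask entry is 1."""
    return [token for token, keep in zip(tokens, mask) if keep == 1]


tokens = ["the", "movie", "was", "great"]
masks = sample_token_masks(tokens, num_samples=5, sampling_fraction=0.7)
perturbed = [apply_mask(tokens, mask) for mask in masks]
```

Each perturbed token list would then be scored by the model, and the on/off masks serve as the features of the local surrogate model.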

setTargetClasses(value)[source]
Parameters

targetClasses – The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.

setTargetClassesCol(value)[source]
Parameters

targetClassesCol – The name of the column that specifies the indices of the classes for multinomial classification models.

setTargetCol(value)[source]
Parameters

targetCol – The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability

setTokensCol(value)[source]
Parameters

tokensCol – The column holding the tokens

targetClasses = Param(parent='undefined', name='targetClasses', doc='The indices of the classes for multinomial classification models. Default: 0.For regression models this parameter is ignored.')
targetClassesCol = Param(parent='undefined', name='targetClassesCol', doc='The name of the column that specifies the indices of the classes for multinomial classification models.')
targetCol = Param(parent='undefined', name='targetCol', doc='The column name of the prediction target to explain (i.e. the response variable). This is usually set to "prediction" for regression models and "probability" for probabilistic classification models. Default value: probability')
tokensCol = Param(parent='undefined', name='tokensCol', doc='The column holding the tokens')

synapse.ml.explainers.TextSHAP module

class synapse.ml.explainers.TextSHAP.TextSHAP(java_obj=None, infWeight=100000000.0, inputCol=None, metricsCol='r2', model=None, numSamples=None, outputCol='TextSHAP_d273705d110f__output', targetClasses=[], targetClassesCol=None, targetCol='probability', tokensCol='tokens')[source]

Bases: pyspark.ml.util.MLReadable[pyspark.ml.util.RL]

Parameters
  • infWeight (float) – The double value to represent infinite weight. Default: 1E8.

  • inputCol (str) – input column name

  • metricsCol (str) – Column name for fitting metrics

  • model (object) – The model to be interpreted.

  • numSamples (int) – Number of samples to generate.

  • outputCol (str) – output column name

  • targetClasses (list) – The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.

  • targetClassesCol (str) – The name of the column that specifies the indices of the classes for multinomial classification models.

  • targetCol (str) – The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability

  • tokensCol (str) – The column holding the tokens
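
The infWeight parameter is easier to read next to the Kernel SHAP weighting formula. The sketch below is plain Python and not SynapseML's code. It computes the standard Shapley kernel weight for a coalition of k out of M features. That weight is infinite for the empty and the full coalition, so a large finite value is substituted there. Using inf_weight in exactly this way is an assumption based on the parameter description, and the function name shapley_kernel_weight is hypothetical. math.comb requires Python 3.8 or later.

```python
from math import comb


def shapley_kernel_weight(num_features, coalition_size, inf_weight=1e8):
    """Shapley kernel weight for a coalition of the given size.

    The closed form (M - 1) / (C(M, k) * k * (M - k)) divides by zero
    when k == 0 or k == M, so inf_weight stands in for "infinite".
    """
    m, k = num_features, coalition_size
    if k == 0 or k == m:
        return inf_weight
    return (m - 1) / (comb(m, k) * k * (m - k))


# Weights for every coalition size of a 4-token input.
weights = [shapley_kernel_weight(4, k) for k in range(5)]
```

The very large weight on the empty and full coalitions pushes the weighted regression to match the model output on those two samples almost exactly.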

getInfWeight()[source]
Returns

The double value to represent infinite weight. Default: 1E8.

Return type

infWeight

getInputCol()[source]
Returns

input column name

Return type

inputCol

static getJavaPackage()[source]

Returns package name String.

getMetricsCol()[source]
Returns

Column name for fitting metrics

Return type

metricsCol

getModel()[source]
Returns

The model to be interpreted.

Return type

model

getNumSamples()[source]
Returns

Number of samples to generate.

Return type

numSamples

getOutputCol()[source]
Returns

output column name

Return type

outputCol

getTargetClasses()[source]
Returns

The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.

Return type

targetClasses

getTargetClassesCol()[source]
Returns

The name of the column that specifies the indices of the classes for multinomial classification models.

Return type

targetClassesCol

getTargetCol()[source]
Returns

The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability

Return type

targetCol

getTokensCol()[source]
Returns

The column holding the tokens

Return type

tokensCol

infWeight = Param(parent='undefined', name='infWeight', doc='The double value to represent infinite weight. Default: 1E8.')
inputCol = Param(parent='undefined', name='inputCol', doc='input column name')
metricsCol = Param(parent='undefined', name='metricsCol', doc='Column name for fitting metrics')
model = Param(parent='undefined', name='model', doc='The model to be interpreted.')
numSamples = Param(parent='undefined', name='numSamples', doc='Number of samples to generate.')
outputCol = Param(parent='undefined', name='outputCol', doc='output column name')
classmethod read()[source]

Returns an MLReader instance for this class.

setInfWeight(value)[source]
Parameters

infWeight – The double value to represent infinite weight. Default: 1E8.

setInputCol(value)[source]
Parameters

inputCol – input column name

setMetricsCol(value)[source]
Parameters

metricsCol – Column name for fitting metrics

setModel(value)[source]
Parameters

model – The model to be interpreted.

setNumSamples(value)[source]
Parameters

numSamples – Number of samples to generate.

setOutputCol(value)[source]
Parameters

outputCol – output column name

setParams(infWeight=100000000.0, inputCol=None, metricsCol='r2', model=None, numSamples=None, outputCol='TextSHAP_d273705d110f__output', targetClasses=[], targetClassesCol=None, targetCol='probability', tokensCol='tokens')[source]

Set the (keyword only) parameters

setTargetClasses(value)[source]
Parameters

targetClasses – The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.
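
As an illustration of how targetClasses relates to the column named by targetCol, the following plain-Python sketch picks the entries of a model output that would be explained. It is an interpretation of the parameter description, not SynapseML's implementation. The function name select_target_outputs is hypothetical, and treating an empty list as class index 0 is an assumption drawn from the "Default: 0" note.

```python
def select_target_outputs(model_output, target_classes=None):
    """Pick the model outputs to explain.

    model_output is either a scalar (regression prediction) or a
    sequence of per-class probabilities (classification).
    """
    if isinstance(model_output, (int, float)):
        # Regression: the target classes are ignored.
        return [float(model_output)]
    # Assumption: no classes given means "explain class index 0".
    indices = target_classes if target_classes else [0]
    return [model_output[i] for i in indices]


probability = [0.1, 0.7, 0.2]
explained = select_target_outputs(probability, target_classes=[1, 2])
```

One explanation would be produced per selected class, so the order of the indices determines the order of the explanations.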

setTargetClassesCol(value)[source]
Parameters

targetClassesCol – The name of the column that specifies the indices of the classes for multinomial classification models.

setTargetCol(value)[source]
Parameters

targetCol – The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability

setTokensCol(value)[source]
Parameters

tokensCol – The column holding the tokens

targetClasses = Param(parent='undefined', name='targetClasses', doc='The indices of the classes for multinomial classification models. Default: 0.For regression models this parameter is ignored.')
targetClassesCol = Param(parent='undefined', name='targetClassesCol', doc='The name of the column that specifies the indices of the classes for multinomial classification models.')
targetCol = Param(parent='undefined', name='targetCol', doc='The column name of the prediction target to explain (i.e. the response variable). This is usually set to "prediction" for regression models and "probability" for probabilistic classification models. Default value: probability')
tokensCol = Param(parent='undefined', name='tokensCol', doc='The column holding the tokens')

synapse.ml.explainers.VectorLIME module

class synapse.ml.explainers.VectorLIME.VectorLIME(java_obj=None, backgroundData=None, inputCol=None, kernelWidth=0.75, metricsCol='r2', model=None, numSamples=1000, outputCol='VectorLIME_0305a2eedfb9__output', regularization=0.0, targetClasses=[], targetClassesCol=None, targetCol='probability')[source]

Bases: pyspark.ml.util.MLReadable[pyspark.ml.util.RL]

Parameters
  • backgroundData (object) – A dataframe containing background data

  • inputCol (str) – input column name

  • kernelWidth (float) – Kernel width. Default value: sqrt (number of features) * 0.75

  • metricsCol (str) – Column name for fitting metrics

  • model (object) – The model to be interpreted.

  • numSamples (int) – Number of samples to generate.

  • outputCol (str) – output column name

  • regularization (float) – Regularization param for the lasso. Default value: 0.

  • targetClasses (list) – The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.

  • targetClassesCol (str) – The name of the column that specifies the indices of the classes for multinomial classification models.

  • targetCol (str) – The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability
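
The kernelWidth description gives a formula, while the constructor signature shows the value 0.75. The sketch below only evaluates the formula from the description and shows how a kernel width typically turns a distance into a sample weight. The exponential kernel is the form used by the reference LIME Python package. Whether SynapseML applies exactly this kernel is an assumption, and both function names are hypothetical.

```python
import math


def default_kernel_width(num_features, scale=0.75):
    """Evaluate sqrt(number of features) * 0.75 from the description."""
    return math.sqrt(num_features) * scale


def lime_sample_weight(distance, kernel_width):
    """Exponential kernel: samples closer to the instance weigh more."""
    return math.sqrt(math.exp(-(distance ** 2) / kernel_width ** 2))


width = default_kernel_width(16)
near = lime_sample_weight(0.5, width)
far = lime_sample_weight(4.0, width)
```

A larger kernel width flattens the weighting, so distant samples influence the local surrogate model more.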

backgroundData = Param(parent='undefined', name='backgroundData', doc='A dataframe containing background data')
getBackgroundData()[source]
Returns

A dataframe containing background data

Return type

backgroundData

getInputCol()[source]
Returns

input column name

Return type

inputCol

static getJavaPackage()[source]

Returns package name String.

getKernelWidth()[source]
Returns

Kernel width. Default value: sqrt (number of features) * 0.75

Return type

kernelWidth

getMetricsCol()[source]
Returns

Column name for fitting metrics

Return type

metricsCol

getModel()[source]
Returns

The model to be interpreted.

Return type

model

getNumSamples()[source]
Returns

Number of samples to generate.

Return type

numSamples

getOutputCol()[source]
Returns

output column name

Return type

outputCol

getRegularization()[source]
Returns

Regularization param for the lasso. Default value: 0.

Return type

regularization

getTargetClasses()[source]
Returns

The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.

Return type

targetClasses

getTargetClassesCol()[source]
Returns

The name of the column that specifies the indices of the classes for multinomial classification models.

Return type

targetClassesCol

getTargetCol()[source]
Returns

The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability

Return type

targetCol

inputCol = Param(parent='undefined', name='inputCol', doc='input column name')
kernelWidth = Param(parent='undefined', name='kernelWidth', doc='Kernel width. Default value: sqrt (number of features) * 0.75')
metricsCol = Param(parent='undefined', name='metricsCol', doc='Column name for fitting metrics')
model = Param(parent='undefined', name='model', doc='The model to be interpreted.')
numSamples = Param(parent='undefined', name='numSamples', doc='Number of samples to generate.')
outputCol = Param(parent='undefined', name='outputCol', doc='output column name')
classmethod read()[source]

Returns an MLReader instance for this class.

regularization = Param(parent='undefined', name='regularization', doc='Regularization param for the lasso. Default value: 0.')
setBackgroundData(value)[source]
Parameters

backgroundData – A dataframe containing background data

setInputCol(value)[source]
Parameters

inputCol – input column name

setKernelWidth(value)[source]
Parameters

kernelWidth – Kernel width. Default value: sqrt (number of features) * 0.75

setMetricsCol(value)[source]
Parameters

metricsCol – Column name for fitting metrics

setModel(value)[source]
Parameters

model – The model to be interpreted.

setNumSamples(value)[source]
Parameters

numSamples – Number of samples to generate.

setOutputCol(value)[source]
Parameters

outputCol – output column name

setParams(backgroundData=None, inputCol=None, kernelWidth=0.75, metricsCol='r2', model=None, numSamples=1000, outputCol='VectorLIME_0305a2eedfb9__output', regularization=0.0, targetClasses=[], targetClassesCol=None, targetCol='probability')[source]

Set the (keyword only) parameters

setRegularization(value)[source]
Parameters

regularization – Regularization param for the lasso. Default value: 0.

setTargetClasses(value)[source]
Parameters

targetClasses – The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.

setTargetClassesCol(value)[source]
Parameters

targetClassesCol – The name of the column that specifies the indices of the classes for multinomial classification models.

setTargetCol(value)[source]
Parameters

targetCol – The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability

targetClasses = Param(parent='undefined', name='targetClasses', doc='The indices of the classes for multinomial classification models. Default: 0.For regression models this parameter is ignored.')
targetClassesCol = Param(parent='undefined', name='targetClassesCol', doc='The name of the column that specifies the indices of the classes for multinomial classification models.')
targetCol = Param(parent='undefined', name='targetCol', doc='The column name of the prediction target to explain (i.e. the response variable). This is usually set to "prediction" for regression models and "probability" for probabilistic classification models. Default value: probability')

synapse.ml.explainers.VectorSHAP module

class synapse.ml.explainers.VectorSHAP.VectorSHAP(java_obj=None, backgroundData=None, infWeight=100000000.0, inputCol=None, metricsCol='r2', model=None, numSamples=None, outputCol='VectorSHAP_d4f4a0dba4fc__output', targetClasses=[], targetClassesCol=None, targetCol='probability')[source]

Bases: pyspark.ml.util.MLReadable[pyspark.ml.util.RL]

Parameters
  • backgroundData (object) – A dataframe containing background data

  • infWeight (float) – The double value to represent infinite weight. Default: 1E8.

  • inputCol (str) – input column name

  • metricsCol (str) – Column name for fitting metrics

  • model (object) – The model to be interpreted.

  • numSamples (int) – Number of samples to generate.

  • outputCol (str) – output column name

  • targetClasses (list) – The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.

  • targetClassesCol (str) – The name of the column that specifies the indices of the classes for multinomial classification models.

  • targetCol (str) – The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability
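
The role of backgroundData can be sketched in plain Python. In Kernel SHAP, features that are switched off in a coalition are commonly replaced by values from background rows, and the model output is averaged over those rows. The code below illustrates that general idea only. It is not SynapseML's implementation, and coalition_value and linear_model are hypothetical names.

```python
def coalition_value(model, instance, background, coalition):
    """Average model output for one coalition.

    Features with a 1 in `coalition` take the instance's value.
    Features with a 0 take the value from each background row in turn.
    """
    total = 0.0
    for row in background:
        mixed = [x if on else b for x, b, on in zip(instance, row, coalition)]
        total += model(mixed)
    return total / len(background)


def linear_model(features):
    """Toy stand-in for the model to be interpreted."""
    return 2.0 * features[0] + 1.0 * features[1]


instance = [3.0, 4.0]
background = [[0.0, 0.0], [2.0, 2.0]]

full = coalition_value(linear_model, instance, background, [1, 1])
empty = coalition_value(linear_model, instance, background, [0, 0])
first_only = coalition_value(linear_model, instance, background, [1, 0])
```

The empty coalition yields the average prediction over the background rows, which acts as the baseline that the SHAP values are measured against.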

backgroundData = Param(parent='undefined', name='backgroundData', doc='A dataframe containing background data')
getBackgroundData()[source]
Returns

A dataframe containing background data

Return type

backgroundData

getInfWeight()[source]
Returns

The double value to represent infinite weight. Default: 1E8.

Return type

infWeight

getInputCol()[source]
Returns

input column name

Return type

inputCol

static getJavaPackage()[source]

Returns package name String.

getMetricsCol()[source]
Returns

Column name for fitting metrics

Return type

metricsCol

getModel()[source]
Returns

The model to be interpreted.

Return type

model

getNumSamples()[source]
Returns

Number of samples to generate.

Return type

numSamples

getOutputCol()[source]
Returns

output column name

Return type

outputCol

getTargetClasses()[source]
Returns

The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.

Return type

targetClasses

getTargetClassesCol()[source]
Returns

The name of the column that specifies the indices of the classes for multinomial classification models.

Return type

targetClassesCol

getTargetCol()[source]
Returns

The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability

Return type

targetCol

infWeight = Param(parent='undefined', name='infWeight', doc='The double value to represent infinite weight. Default: 1E8.')
inputCol = Param(parent='undefined', name='inputCol', doc='input column name')
metricsCol = Param(parent='undefined', name='metricsCol', doc='Column name for fitting metrics')
model = Param(parent='undefined', name='model', doc='The model to be interpreted.')
numSamples = Param(parent='undefined', name='numSamples', doc='Number of samples to generate.')
outputCol = Param(parent='undefined', name='outputCol', doc='output column name')
classmethod read()[source]

Returns an MLReader instance for this class.

setBackgroundData(value)[source]
Parameters

backgroundData – A dataframe containing background data

setInfWeight(value)[source]
Parameters

infWeight – The double value to represent infinite weight. Default: 1E8.

setInputCol(value)[source]
Parameters

inputCol – input column name

setMetricsCol(value)[source]
Parameters

metricsCol – Column name for fitting metrics

setModel(value)[source]
Parameters

model – The model to be interpreted.

setNumSamples(value)[source]
Parameters

numSamples – Number of samples to generate.

setOutputCol(value)[source]
Parameters

outputCol – output column name

setParams(backgroundData=None, infWeight=100000000.0, inputCol=None, metricsCol='r2', model=None, numSamples=None, outputCol='VectorSHAP_d4f4a0dba4fc__output', targetClasses=[], targetClassesCol=None, targetCol='probability')[source]

Set the (keyword only) parameters

setTargetClasses(value)[source]
Parameters

targetClasses – The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.

setTargetClassesCol(value)[source]
Parameters

targetClassesCol – The name of the column that specifies the indices of the classes for multinomial classification models.

setTargetCol(value)[source]
Parameters

targetCol – The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability

targetClasses = Param(parent='undefined', name='targetClasses', doc='The indices of the classes for multinomial classification models. Default: 0.For regression models this parameter is ignored.')
targetClassesCol = Param(parent='undefined', name='targetClassesCol', doc='The name of the column that specifies the indices of the classes for multinomial classification models.')
targetCol = Param(parent='undefined', name='targetCol', doc='The column name of the prediction target to explain (i.e. the response variable). This is usually set to "prediction" for regression models and "probability" for probabilistic classification models. Default value: probability')

Module contents

SynapseML is an ecosystem of tools aimed at expanding the distributed computing framework Apache Spark in several new directions. SynapseML adds many deep learning and data science tools to the Spark ecosystem, including seamless integration of Spark Machine Learning pipelines with Microsoft Cognitive Toolkit (CNTK), LightGBM and OpenCV. These tools enable powerful and highly scalable predictive and analytical models for a variety of data sources.

SynapseML also brings new networking capabilities to the Spark ecosystem. With the HTTP on Spark project, users can embed any web service into their SparkML models. In this vein, SynapseML provides easy-to-use SparkML transformers for a wide variety of Microsoft Cognitive Services. For production-grade deployment, the Spark Serving project enables high-throughput, sub-millisecond-latency web services, backed by your Spark cluster.

SynapseML requires Scala 2.12, Spark 3.0+, and Python 3.6+.