synapse.ml.explainers package

Submodules

synapse.ml.explainers.ICETransformer module

class synapse.ml.explainers.ICETransformer.ICETransformer(java_obj=None, categoricalFeatures=[], dependenceNameCol='pdpBasedDependence', featureNameCol='featureNames', kind='individual', model=None, numSamples=None, numericFeatures=[], targetClasses=[], targetClassesCol=None, targetCol='probability')

Bases: _ICETransformer

setCategoricalFeatures(values: List[Union[str, Dict]])

Args: values: The categorical features to explain, given either as a list of feature names or as a list of dicts with per-feature parameters.

setNumericFeatures(values: List[Union[str, Dict]])

Args: values: The numeric features to explain, given either as a list of feature names or as a list of dicts with per-feature parameters.
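The signature gives `kind='individual'` as the default. As a rough illustration of what an individual conditional expectation (ICE) curve is, and how averaging ICE curves yields a partial dependence curve, here is a minimal pure-Python sketch. It is not the SynapseML implementation: `ice_curves`, `partial_dependence`, `toy_model`, the dict-based rows, and the explicit `grid` of feature values are all hypothetical stand-ins.

```python
from typing import Callable, Dict, List, Sequence

Row = Dict[str, float]


def ice_curves(
    predict: Callable[[Row], float],
    rows: Sequence[Row],
    feature: str,
    grid: Sequence[float],
) -> List[List[float]]:
    """One ICE curve per row: vary `feature` over `grid`, keep the rest fixed."""
    curves = []
    for row in rows:
        curve = []
        for value in grid:
            modified = dict(row)  # copy so the input row is left untouched
            modified[feature] = value
            curve.append(predict(modified))
        curves.append(curve)
    return curves


def partial_dependence(curves: List[List[float]]) -> List[float]:
    """Average the ICE curves point by point (the averaged, PDP-style view)."""
    return [sum(points) / len(points) for points in zip(*curves)]


# Hypothetical toy model with an age x hours interaction.
def toy_model(row: Row) -> float:
    return 0.5 * row["age"] + row["age"] * row["hours"]


rows = [{"age": 30.0, "hours": 1.0}, {"age": 50.0, "hours": 3.0}]
curves = ice_curves(toy_model, rows, "age", grid=[20.0, 40.0])
print(curves)                      # [[30.0, 60.0], [70.0, 140.0]]
print(partial_dependence(curves))  # [50.0, 100.0]
```

Because of the interaction term, the two rows' curves rise with different slopes (1.5 and 3.5 per unit of age); the averaged curve hides that heterogeneity, which is the usual reason to inspect individual curves.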

synapse.ml.explainers.ImageLIME module

class synapse.ml.explainers.ImageLIME.ImageLIME(java_obj=None, cellSize=16.0, inputCol=None, kernelWidth=0.75, metricsCol='r2', model=None, modifier=130.0, numSamples=900, outputCol='ImageLIME_f9a2a24f33ab__output', regularization=0.0, samplingFraction=0.7, superpixelCol='superpixels', targetClasses=[], targetClassesCol=None, targetCol='probability')

Bases: ComplexParamsMixin, JavaMLReadable, JavaMLWritable, JavaTransformer

Parameters:
  • cellSize (float) – Number that controls the size of the superpixels

  • inputCol (str) – input column name

  • kernelWidth (float) – Kernel width. Default value: sqrt (number of features) * 0.75

  • metricsCol (str) – Column name for fitting metrics

  • model (object) – The model to be interpreted.

  • modifier (float) – Controls the trade-off between spatial and color distance

  • numSamples (int) – Number of samples to generate.

  • outputCol (str) – output column name

  • regularization (float) – Regularization param for the lasso. Default value: 0.

  • samplingFraction (float) – The fraction of superpixels (for images) or tokens (for text) to keep on (unmasked) in each perturbed sample

  • superpixelCol (str) – The column holding the superpixel decompositions

  • targetClasses (list) – The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.

  • targetClassesCol (str) – The name of the column that specifies the indices of the classes for multinomial classification models.

  • targetCol (str) – The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability
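LIME explains one prediction by sampling perturbed copies of the input, weighting each by its proximity to the original, and fitting a weighted linear surrogate. The pure-Python sketch below shows how `numSamples`, `samplingFraction`, `kernelWidth`, and `regularization` plausibly interact; it is not SynapseML's implementation. Assumptions: an exponential kernel on Euclidean distance from the unperturbed (all-on) state, an effective width of sqrt(number of features) * kernelWidth (one reading of the documented default), and `regularization=0`, so the lasso reduces to ordinary weighted least squares. `lime_explain`, `solve`, and `toy_model` are hypothetical helpers; the defaults mirror the ImageLIME signature.

```python
import math
import random
from typing import Callable, List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the small dense system a @ x = b by Gaussian elimination."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))  # partial pivoting
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def lime_explain(
    predict: Callable[[Sequence[int]], float],
    num_features: int,
    num_samples: int = 900,
    sampling_fraction: float = 0.7,
    kernel_width: float = 0.75,
    seed: int = 0,
) -> List[float]:
    """Weighted linear surrogate over random on/off masks (regularization = 0)."""
    rng = random.Random(seed)
    width = math.sqrt(num_features) * kernel_width  # assumed effective kernel width
    dim = num_features + 1  # intercept + one column per superpixel/token
    xtwx = [[0.0] * dim for _ in range(dim)]
    xtwy = [0.0] * dim
    for _ in range(num_samples):
        # Each superpixel/token stays "on" with probability sampling_fraction.
        mask = [1 if rng.random() < sampling_fraction else 0 for _ in range(num_features)]
        # Exponential kernel on the distance from the unperturbed all-ones state.
        dist_sq = sum(1 - m for m in mask)  # squared Euclidean distance for 0/1 masks
        weight = math.exp(-dist_sq / width ** 2)
        x = [1.0] + [float(m) for m in mask]
        y = predict(mask)
        for i in range(dim):  # accumulate the weighted normal equations
            xtwy[i] += weight * x[i] * y
            for j in range(dim):
                xtwx[i][j] += weight * x[i] * x[j]
    return solve(xtwx, xtwy)[1:]  # drop the intercept


# Hypothetical model whose output is exactly linear in the mask.
def toy_model(mask: Sequence[int]) -> float:
    return 2.0 + 3.0 * mask[0] - 1.0 * mask[1] + 0.5 * mask[2]


print([round(c, 6) for c in lime_explain(toy_model, num_features=3)])
```

Because `toy_model` is exactly linear in the mask, the surrogate recovers its coefficients (about 3.0, -1.0 and 0.5). A positive `regularization` would need a lasso solver in place of `solve`, which this sketch does not include.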

cellSize = Param(parent='undefined', name='cellSize', doc='Number that controls the size of the superpixels')
getCellSize()
Returns:

Number that controls the size of the superpixels

Return type:

float

getInputCol()
Returns:

input column name

Return type:

str

static getJavaPackage()

Returns the package name as a string.

getKernelWidth()
Returns:

Kernel width. Default value: sqrt (number of features) * 0.75

Return type:

float

getMetricsCol()
Returns:

Column name for fitting metrics

Return type:

str

getModel()
Returns:

The model to be interpreted.

Return type:

object

getModifier()
Returns:

Controls the trade-off between spatial and color distance

Return type:

float
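The `cellSize` and `modifier` parameters resemble the grid spacing S and compactness factor m of the SLIC superpixel algorithm. Assuming the superpixel step uses a SLIC-style combined distance (an assumption, not confirmed by this reference), the trade-off can be sketched as below. `slic_distance` and the `(c1, c2, c3, x, y)` point layout are hypothetical; standard SLIC measures color in CIELAB rather than raw channels.

```python
import math
from typing import Sequence


def slic_distance(
    pixel: Sequence[float],
    center: Sequence[float],
    cell_size: float = 16.0,
    modifier: float = 130.0,
) -> float:
    """SLIC-style distance between a pixel and a superpixel center.

    Both points are (c1, c2, c3, x, y): three color channels, then position.
    Spatial distance is normalized by the grid spacing (cell_size) and scaled
    by the compactness factor (modifier) before being combined with color distance.
    """
    color = math.dist(pixel[:3], center[:3])
    spatial = math.dist(pixel[3:], center[3:])
    return math.sqrt(color ** 2 + (spatial / cell_size) ** 2 * modifier ** 2)


same_color_far = slic_distance((10, 0, 0, 0, 0), (10, 0, 0, 3, 4))
diff_color_near = slic_distance((0, 0, 0, 0, 0), (3, 4, 0, 0, 0))
print(same_color_far, diff_color_near)  # 40.625 5.0
```

With the defaults, a 5-pixel spatial offset costs as much as a color difference of 40.625 (5 / 16 * 130), so lowering `modifier` lets color dominate and tends to give less compact, more color-coherent superpixels.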

getNumSamples()
Returns:

Number of samples to generate.

Return type:

int

getOutputCol()
Returns:

output column name

Return type:

str

getRegularization()
Returns:

Regularization param for the lasso. Default value: 0.

Return type:

float

getSamplingFraction()
Returns:

The fraction of superpixels (for images) or tokens (for text) to keep on (unmasked) in each perturbed sample

Return type:

float

getSuperpixelCol()
Returns:

The column holding the superpixel decompositions

Return type:

str

getTargetClasses()
Returns:

The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.

Return type:

list

getTargetClassesCol()
Returns:

The name of the column that specifies the indices of the classes for multinomial classification models.

Return type:

str

getTargetCol()
Returns:

The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability

Return type:

str

inputCol = Param(parent='undefined', name='inputCol', doc='input column name')
kernelWidth = Param(parent='undefined', name='kernelWidth', doc='Kernel width. Default value: sqrt (number of features) * 0.75')
metricsCol = Param(parent='undefined', name='metricsCol', doc='Column name for fitting metrics')
model = Param(parent='undefined', name='model', doc='The model to be interpreted.')
modifier = Param(parent='undefined', name='modifier', doc='Controls the trade-off between spatial and color distance')
numSamples = Param(parent='undefined', name='numSamples', doc='Number of samples to generate.')
outputCol = Param(parent='undefined', name='outputCol', doc='output column name')
classmethod read()

Returns an MLReader instance for this class.

regularization = Param(parent='undefined', name='regularization', doc='Regularization param for the lasso. Default value: 0.')
samplingFraction = Param(parent='undefined', name='samplingFraction', doc='The fraction of superpixels (for image) or tokens (for text) to keep on')
setCellSize(value)
Parameters:

cellSize – Number that controls the size of the superpixels

setInputCol(value)
Parameters:

inputCol – input column name

setKernelWidth(value)
Parameters:

kernelWidth – Kernel width. Default value: sqrt (number of features) * 0.75

setMetricsCol(value)
Parameters:

metricsCol – Column name for fitting metrics

setModel(value)
Parameters:

model – The model to be interpreted.

setModifier(value)
Parameters:

modifier – Controls the trade-off between spatial and color distance

setNumSamples(value)
Parameters:

numSamples – Number of samples to generate.

setOutputCol(value)
Parameters:

outputCol – output column name

setParams(cellSize=16.0, inputCol=None, kernelWidth=0.75, metricsCol='r2', model=None, modifier=130.0, numSamples=900, outputCol='ImageLIME_f9a2a24f33ab__output', regularization=0.0, samplingFraction=0.7, superpixelCol='superpixels', targetClasses=[], targetClassesCol=None, targetCol='probability')

Sets the keyword-only parameters.

setRegularization(value)
Parameters:

regularization – Regularization param for the lasso. Default value: 0.

setSamplingFraction(value)
Parameters:

samplingFraction – The fraction of superpixels (for images) or tokens (for text) to keep on (unmasked) in each perturbed sample

setSuperpixelCol(value)
Parameters:

superpixelCol – The column holding the superpixel decompositions

setTargetClasses(value)
Parameters:

targetClasses – The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.

setTargetClassesCol(value)
Parameters:

targetClassesCol – The name of the column that specifies the indices of the classes for multinomial classification models.

setTargetCol(value)
Parameters:

targetCol – The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability

superpixelCol = Param(parent='undefined', name='superpixelCol', doc='The column holding the superpixel decompositions')
targetClasses = Param(parent='undefined', name='targetClasses', doc='The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.')
targetClassesCol = Param(parent='undefined', name='targetClassesCol', doc='The name of the column that specifies the indices of the classes for multinomial classification models.')
targetCol = Param(parent='undefined', name='targetCol', doc='The column name of the prediction target to explain (i.e. the response variable). This is usually set to "prediction" for regression models and "probability" for probabilistic classification models. Default value: probability')

synapse.ml.explainers.ImageSHAP module

class synapse.ml.explainers.ImageSHAP.ImageSHAP(java_obj=None, cellSize=16.0, infWeight=100000000.0, inputCol=None, metricsCol='r2', model=None, modifier=130.0, numSamples=None, outputCol='ImageSHAP_ab53d215a6ac__output', superpixelCol='superpixels', targetClasses=[], targetClassesCol=None, targetCol='probability')

Bases: ComplexParamsMixin, JavaMLReadable, JavaMLWritable, JavaTransformer

Parameters:
  • cellSize (float) – Number that controls the size of the superpixels

  • infWeight (float) – The double value to represent infinite weight. Default: 1E8.

  • inputCol (str) – input column name

  • metricsCol (str) – Column name for fitting metrics

  • model (object) – The model to be interpreted.

  • modifier (float) – Controls the trade-off between spatial and color distance

  • numSamples (int) – Number of samples to generate.

  • outputCol (str) – output column name

  • superpixelCol (str) – The column holding the superpixel decompositions

  • targetClasses (list) – The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.

  • targetClassesCol (str) – The name of the column that specifies the indices of the classes for multinomial classification models.

  • targetCol (str) – The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability
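In Kernel SHAP, each coalition of M features (superpixels here) with |z| of them switched on is weighted by the Shapley kernel (M - 1) / (C(M, |z|) * |z| * (M - |z|)). That weight is infinite for the empty and full coalitions, which is presumably why a large finite `infWeight` (default 1E8) is exposed. A minimal sketch of the weight function, assuming the standard Kernel SHAP formulation; `shapley_kernel_weight` is a hypothetical name, not a SynapseML function.

```python
from math import comb


def shapley_kernel_weight(num_features: int, coalition_size: int, inf_weight: float = 1e8) -> float:
    """Kernel SHAP weight for a coalition with `coalition_size` features switched on.

    The empty and full coalitions get infinite weight in theory; a large finite
    stand-in (inf_weight) keeps the weighted regression numerically solvable.
    """
    m, s = num_features, coalition_size
    if s == 0 or s == m:
        return inf_weight
    return (m - 1) / (comb(m, s) * s * (m - s))


print([shapley_kernel_weight(4, s) for s in range(5)])
# [100000000.0, 0.25, 0.125, 0.25, 100000000.0]
```

The near-infinite weights effectively pin the fitted explanation to the model's outputs on the empty and full coalitions, while the remaining coalitions share the finite weights.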

cellSize = Param(parent='undefined', name='cellSize', doc='Number that controls the size of the superpixels')
getCellSize()
Returns:

Number that controls the size of the superpixels

Return type:

float

getInfWeight()
Returns:

The double value to represent infinite weight. Default: 1E8.

Return type:

float

getInputCol()
Returns:

input column name

Return type:

str

static getJavaPackage()

Returns the package name as a string.

getMetricsCol()
Returns:

Column name for fitting metrics

Return type:

str

getModel()
Returns:

The model to be interpreted.

Return type:

object

getModifier()
Returns:

Controls the trade-off between spatial and color distance

Return type:

float

getNumSamples()
Returns:

Number of samples to generate.

Return type:

int

getOutputCol()
Returns:

output column name

Return type:

str

getSuperpixelCol()
Returns:

The column holding the superpixel decompositions

Return type:

str

getTargetClasses()
Returns:

The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.

Return type:

list

getTargetClassesCol()
Returns:

The name of the column that specifies the indices of the classes for multinomial classification models.

Return type:

str

getTargetCol()
Returns:

The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability

Return type:

str

infWeight = Param(parent='undefined', name='infWeight', doc='The double value to represent infinite weight. Default: 1E8.')
inputCol = Param(parent='undefined', name='inputCol', doc='input column name')
metricsCol = Param(parent='undefined', name='metricsCol', doc='Column name for fitting metrics')
model = Param(parent='undefined', name='model', doc='The model to be interpreted.')
modifier = Param(parent='undefined', name='modifier', doc='Controls the trade-off between spatial and color distance')
numSamples = Param(parent='undefined', name='numSamples', doc='Number of samples to generate.')
outputCol = Param(parent='undefined', name='outputCol', doc='output column name')
classmethod read()

Returns an MLReader instance for this class.

setCellSize(value)
Parameters:

cellSize – Number that controls the size of the superpixels

setInfWeight(value)
Parameters:

infWeight – The double value to represent infinite weight. Default: 1E8.

setInputCol(value)
Parameters:

inputCol – input column name

setMetricsCol(value)
Parameters:

metricsCol – Column name for fitting metrics

setModel(value)
Parameters:

model – The model to be interpreted.

setModifier(value)
Parameters:

modifier – Controls the trade-off between spatial and color distance

setNumSamples(value)
Parameters:

numSamples – Number of samples to generate.

setOutputCol(value)
Parameters:

outputCol – output column name

setParams(cellSize=16.0, infWeight=100000000.0, inputCol=None, metricsCol='r2', model=None, modifier=130.0, numSamples=None, outputCol='ImageSHAP_ab53d215a6ac__output', superpixelCol='superpixels', targetClasses=[], targetClassesCol=None, targetCol='probability')

Sets the keyword-only parameters.

setSuperpixelCol(value)
Parameters:

superpixelCol – The column holding the superpixel decompositions

setTargetClasses(value)
Parameters:

targetClasses – The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.

setTargetClassesCol(value)
Parameters:

targetClassesCol – The name of the column that specifies the indices of the classes for multinomial classification models.

setTargetCol(value)
Parameters:

targetCol – The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability

superpixelCol = Param(parent='undefined', name='superpixelCol', doc='The column holding the superpixel decompositions')
targetClasses = Param(parent='undefined', name='targetClasses', doc='The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.')
targetClassesCol = Param(parent='undefined', name='targetClassesCol', doc='The name of the column that specifies the indices of the classes for multinomial classification models.')
targetCol = Param(parent='undefined', name='targetCol', doc='The column name of the prediction target to explain (i.e. the response variable). This is usually set to "prediction" for regression models and "probability" for probabilistic classification models. Default value: probability')

synapse.ml.explainers.TabularLIME module

class synapse.ml.explainers.TabularLIME.TabularLIME(java_obj=None, backgroundData=None, categoricalFeatures=[], inputCols=None, kernelWidth=0.75, metricsCol='r2', model=None, numSamples=1000, outputCol='TabularLIME_766052ca9bc3__output', regularization=0.0, targetClasses=[], targetClassesCol=None, targetCol='probability')

Bases: ComplexParamsMixin, JavaMLReadable, JavaMLWritable, JavaTransformer

Parameters:
  • backgroundData (object) – A dataframe containing background data

  • categoricalFeatures (list) – Names of features that should be treated as categorical variables.

  • inputCols (list) – input column names

  • kernelWidth (float) – Kernel width. Default value: sqrt (number of features) * 0.75

  • metricsCol (str) – Column name for fitting metrics

  • model (object) – The model to be interpreted.

  • numSamples (int) – Number of samples to generate.

  • outputCol (str) – output column name

  • regularization (float) – Regularization param for the lasso. Default value: 0.

  • targetClasses (list) – The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.

  • targetClassesCol (str) – The name of the column that specifies the indices of the classes for multinomial classification models.

  • targetCol (str) – The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability
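`categoricalFeatures` marks columns that should be perturbed as discrete values rather than as continuous numbers, and `backgroundData` supplies reference statistics. The sketch below shows one plausible perturbation scheme built on those two parameters: numeric features are jittered with Gaussian noise scaled by the background standard deviation, and categorical features are replaced by values drawn from the background rows. This scheme is an assumption for illustration, not SynapseML's documented sampler, and `perturb_rows` is hypothetical.

```python
import random
import statistics
from typing import Dict, List, Sequence

Row = Dict[str, object]


def perturb_rows(
    instance: Row,
    background: Sequence[Row],
    categorical_features: Sequence[str],
    num_samples: int = 1000,
    seed: int = 0,
) -> List[Row]:
    """Draw perturbed copies of `instance`, using `background` for reference statistics."""
    rng = random.Random(seed)
    # Spread of each numeric feature, estimated from the background rows.
    stdevs = {
        name: statistics.pstdev([float(r[name]) for r in background])
        for name in instance
        if name not in categorical_features
    }
    samples: List[Row] = []
    for _ in range(num_samples):
        sample: Row = {}
        for name, value in instance.items():
            if name in categorical_features:
                # Categorical: substitute a value observed in the background data.
                sample[name] = rng.choice(background)[name]
            else:
                # Numeric: jitter around the instance value with background-scaled noise.
                sample[name] = float(value) + rng.gauss(0.0, stdevs[name])
        samples.append(sample)
    return samples


background = [{"age": 30.0, "color": "red"}, {"age": 50.0, "color": "blue"}]
samples = perturb_rows({"age": 40.0, "color": "red"}, background, ["color"], num_samples=5)
for s in samples:
    print(s)
```

Treating a column as categorical keeps its samples inside the set of observed values; sending the same column through the numeric branch would produce meaningless in-between values.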

backgroundData = Param(parent='undefined', name='backgroundData', doc='A dataframe containing background data')
categoricalFeatures = Param(parent='undefined', name='categoricalFeatures', doc='Names of features that should be treated as categorical variables.')
getBackgroundData()
Returns:

A dataframe containing background data

Return type:

object

getCategoricalFeatures()
Returns:

Names of features that should be treated as categorical variables.

Return type:

list

getInputCols()
Returns:

input column names

Return type:

list

static getJavaPackage()

Returns the package name as a string.

getKernelWidth()
Returns:

Kernel width. Default value: sqrt (number of features) * 0.75

Return type:

float

getMetricsCol()
Returns:

Column name for fitting metrics

Return type:

str

getModel()
Returns:

The model to be interpreted.

Return type:

object

getNumSamples()
Returns:

Number of samples to generate.

Return type:

int

getOutputCol()
Returns:

output column name

Return type:

str

getRegularization()
Returns:

Regularization param for the lasso. Default value: 0.

Return type:

float

getTargetClasses()
Returns:

The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.

Return type:

list

getTargetClassesCol()
Returns:

The name of the column that specifies the indices of the classes for multinomial classification models.

Return type:

str

getTargetCol()
Returns:

The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability

Return type:

str

inputCols = Param(parent='undefined', name='inputCols', doc='input column names')
kernelWidth = Param(parent='undefined', name='kernelWidth', doc='Kernel width. Default value: sqrt (number of features) * 0.75')
metricsCol = Param(parent='undefined', name='metricsCol', doc='Column name for fitting metrics')
model = Param(parent='undefined', name='model', doc='The model to be interpreted.')
numSamples = Param(parent='undefined', name='numSamples', doc='Number of samples to generate.')
outputCol = Param(parent='undefined', name='outputCol', doc='output column name')
classmethod read()

Returns an MLReader instance for this class.

regularization = Param(parent='undefined', name='regularization', doc='Regularization param for the lasso. Default value: 0.')
setBackgroundData(value)
Parameters:

backgroundData – A dataframe containing background data

setCategoricalFeatures(value)
Parameters:

categoricalFeatures – Names of features that should be treated as categorical variables.

setInputCols(value)
Parameters:

inputCols – input column names

setKernelWidth(value)
Parameters:

kernelWidth – Kernel width. Default value: sqrt (number of features) * 0.75

setMetricsCol(value)
Parameters:

metricsCol – Column name for fitting metrics

setModel(value)
Parameters:

model – The model to be interpreted.

setNumSamples(value)
Parameters:

numSamples – Number of samples to generate.

setOutputCol(value)
Parameters:

outputCol – output column name

setParams(backgroundData=None, categoricalFeatures=[], inputCols=None, kernelWidth=0.75, metricsCol='r2', model=None, numSamples=1000, outputCol='TabularLIME_766052ca9bc3__output', regularization=0.0, targetClasses=[], targetClassesCol=None, targetCol='probability')

Sets the keyword-only parameters.

setRegularization(value)
Parameters:

regularization – Regularization param for the lasso. Default value: 0.

setTargetClasses(value)
Parameters:

targetClasses – The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.

setTargetClassesCol(value)
Parameters:

targetClassesCol – The name of the column that specifies the indices of the classes for multinomial classification models.

setTargetCol(value)
Parameters:

targetCol – The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability

targetClasses = Param(parent='undefined', name='targetClasses', doc='The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.')
targetClassesCol = Param(parent='undefined', name='targetClassesCol', doc='The name of the column that specifies the indices of the classes for multinomial classification models.')
targetCol = Param(parent='undefined', name='targetCol', doc='The column name of the prediction target to explain (i.e. the response variable). This is usually set to "prediction" for regression models and "probability" for probabilistic classification models. Default value: probability')

synapse.ml.explainers.TabularSHAP module

class synapse.ml.explainers.TabularSHAP.TabularSHAP(java_obj=None, backgroundData=None, infWeight=100000000.0, inputCols=None, metricsCol='r2', model=None, numSamples=None, outputCol='TabularSHAP_9f0397fd6d9d__output', targetClasses=[], targetClassesCol=None, targetCol='probability')

Bases: ComplexParamsMixin, JavaMLReadable, JavaMLWritable, JavaTransformer

Parameters:
  • backgroundData (object) – A dataframe containing background data

  • infWeight (float) – The double value to represent infinite weight. Default: 1E8.

  • inputCols (list) – input column names

  • metricsCol (str) – Column name for fitting metrics

  • model (object) – The model to be interpreted.

  • numSamples (int) – Number of samples to generate.

  • outputCol (str) – output column name

  • targetClasses (list) – The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.

  • targetClassesCol (str) – The name of the column that specifies the indices of the classes for multinomial classification models.

  • targetCol (str) – The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability
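In TabularSHAP, `backgroundData` supplies the reference values a feature takes when it is "switched off". Kernel SHAP estimates Shapley values by a weighted regression over sampled coalitions (with `infWeight` standing in for the infinite weight of the empty and full coalitions); for a handful of features the same quantities can be computed exactly by enumerating every coalition. The sketch below does that exact computation, assuming a single background row; it is not SynapseML's implementation, and `exact_shapley` and `toy_model` are hypothetical.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def exact_shapley(
    predict: Callable[[Sequence[float]], float],
    instance: Sequence[float],
    background: Sequence[float],
) -> List[float]:
    """Exact Shapley values of `predict` at `instance`, relative to one background row."""
    n = len(instance)

    def value(coalition: set) -> float:
        # Features in the coalition come from the instance, the rest from the background.
        row = [instance[i] if i in coalition else background[i] for i in range(n)]
        return predict(row)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis


# Hypothetical model with an interaction between features 0 and 2.
def toy_model(row: Sequence[float]) -> float:
    return 1.0 + 2.0 * row[0] - 3.0 * row[1] + row[0] * row[2]


phis = exact_shapley(toy_model, [1.0, 2.0, 3.0], [0.0, 0.0, 0.0])
print([round(p, 6) for p in phis])  # [3.5, -6.0, 1.5]
```

The interaction term contributes 3 and is split evenly between features 0 and 2, and the values sum to f(instance) - f(background) = -1, the efficiency property that a Kernel SHAP estimate is also expected to satisfy.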

backgroundData = Param(parent='undefined', name='backgroundData', doc='A dataframe containing background data')
getBackgroundData()
Returns:

A dataframe containing background data

Return type:

object

getInfWeight()
Returns:

The double value to represent infinite weight. Default: 1E8.

Return type:

float

getInputCols()
Returns:

input column names

Return type:

list

static getJavaPackage()

Returns the package name as a string.

getMetricsCol()
Returns:

Column name for fitting metrics

Return type:

str

getModel()
Returns:

The model to be interpreted.

Return type:

object

getNumSamples()
Returns:

Number of samples to generate.

Return type:

int

getOutputCol()
Returns:

output column name

Return type:

str

getTargetClasses()
Returns:

The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.

Return type:

list

getTargetClassesCol()
Returns:

The name of the column that specifies the indices of the classes for multinomial classification models.

Return type:

str

getTargetCol()
Returns:

The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability

Return type:

str

infWeight = Param(parent='undefined', name='infWeight', doc='The double value to represent infinite weight. Default: 1E8.')
inputCols = Param(parent='undefined', name='inputCols', doc='input column names')
metricsCol = Param(parent='undefined', name='metricsCol', doc='Column name for fitting metrics')
model = Param(parent='undefined', name='model', doc='The model to be interpreted.')
numSamples = Param(parent='undefined', name='numSamples', doc='Number of samples to generate.')
outputCol = Param(parent='undefined', name='outputCol', doc='output column name')
classmethod read()

Returns an MLReader instance for this class.

setBackgroundData(value)
Parameters:

backgroundData – A dataframe containing background data

setInfWeight(value)
Parameters:

infWeight – The double value to represent infinite weight. Default: 1E8.

setInputCols(value)
Parameters:

inputCols – input column names

setMetricsCol(value)
Parameters:

metricsCol – Column name for fitting metrics

setModel(value)
Parameters:

model – The model to be interpreted.

setNumSamples(value)
Parameters:

numSamples – Number of samples to generate.

setOutputCol(value)
Parameters:

outputCol – output column name

setParams(backgroundData=None, infWeight=100000000.0, inputCols=None, metricsCol='r2', model=None, numSamples=None, outputCol='TabularSHAP_9f0397fd6d9d__output', targetClasses=[], targetClassesCol=None, targetCol='probability')

Sets the keyword-only parameters.

setTargetClasses(value)
Parameters:

targetClasses – The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.

setTargetClassesCol(value)
Parameters:

targetClassesCol – The name of the column that specifies the indices of the classes for multinomial classification models.

setTargetCol(value)
Parameters:

targetCol – The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability

targetClasses = Param(parent='undefined', name='targetClasses', doc='The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.')
targetClassesCol = Param(parent='undefined', name='targetClassesCol', doc='The name of the column that specifies the indices of the classes for multinomial classification models.')
targetCol = Param(parent='undefined', name='targetCol', doc='The column name of the prediction target to explain (i.e. the response variable). This is usually set to "prediction" for regression models and "probability" for probabilistic classification models. Default value: probability')

synapse.ml.explainers.TextLIME module

class synapse.ml.explainers.TextLIME.TextLIME(java_obj=None, inputCol=None, kernelWidth=0.75, metricsCol='r2', model=None, numSamples=1000, outputCol='TextLIME_8a1533565787__output', regularization=0.0, samplingFraction=0.7, targetClasses=[], targetClassesCol=None, targetCol='probability', tokensCol='tokens')

Bases: ComplexParamsMixin, JavaMLReadable, JavaMLWritable, JavaTransformer

Parameters:
  • inputCol (str) – input column name

  • kernelWidth (float) – Kernel width. Default value: sqrt (number of features) * 0.75

  • metricsCol (str) – Column name for fitting metrics

  • model (object) – The model to be interpreted.

  • numSamples (int) – Number of samples to generate.

  • outputCol (str) – output column name

  • regularization (float) – Regularization param for the lasso. Default value: 0.

  • samplingFraction (float) – The fraction of superpixels (for images) or tokens (for text) to keep on (unmasked) in each perturbed sample

  • targetClasses (list) – The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.

  • targetClassesCol (str) – The name of the column that specifies the indices of the classes for multinomial classification models.

  • targetCol (str) – The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability

  • tokensCol (str) – The column holding the tokens
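TextLIME perturbs text by switching tokens from `tokensCol` off. Assuming each token is kept independently with probability `samplingFraction` (an assumption about the sampler, not something this reference states), the sampling step can be sketched as below; `sample_token_masks` is hypothetical, and the defaults mirror the TextLIME signature.

```python
import random
from typing import List, Sequence, Tuple


def sample_token_masks(
    tokens: Sequence[str],
    num_samples: int = 1000,
    sampling_fraction: float = 0.7,
    seed: int = 0,
) -> List[Tuple[List[int], str]]:
    """Return (mask, perturbed_text) pairs; mask[i] == 1 keeps tokens[i]."""
    rng = random.Random(seed)
    out = []
    for _ in range(num_samples):
        # Keep each token independently with probability sampling_fraction.
        mask = [1 if rng.random() < sampling_fraction else 0 for _ in tokens]
        text = " ".join(tok for tok, keep in zip(tokens, mask) if keep)
        out.append((mask, text))
    return out


for mask, text in sample_token_masks(["the", "movie", "was", "great"], num_samples=3):
    print(mask, repr(text))
```

The model would then be scored on each perturbed text, and the binary masks serve as the features of the weighted linear surrogate, so each fitted coefficient is read as the importance of one token.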

getInputCol()
Returns:

input column name

Return type:

str

static getJavaPackage()

Returns the package name as a string.

getKernelWidth()
Returns:

Kernel width. Default value: sqrt (number of features) * 0.75

Return type:

float

getMetricsCol()
Returns:

Column name for fitting metrics

Return type:

str

getModel()
Returns:

The model to be interpreted.

Return type:

object

getNumSamples()
Returns:

Number of samples to generate.

Return type:

int

getOutputCol()
Returns:

output column name

Return type:

str

getRegularization()
Returns:

Regularization param for the lasso. Default value: 0.

Return type:

float

getSamplingFraction()
Returns:

The fraction of superpixels (for images) or tokens (for text) to keep on (unmasked) in each perturbed sample

Return type:

float

getTargetClasses()
Returns:

The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.

Return type:

list

getTargetClassesCol()
Returns:

The name of the column that specifies the indices of the classes for multinomial classification models.

Return type:

str

getTargetCol()
Returns:

The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability

Return type:

str

getTokensCol()
Returns:

The column holding the tokens

Return type:

str

inputCol = Param(parent='undefined', name='inputCol', doc='input column name')
kernelWidth = Param(parent='undefined', name='kernelWidth', doc='Kernel width. Default value: sqrt (number of features) * 0.75')
metricsCol = Param(parent='undefined', name='metricsCol', doc='Column name for fitting metrics')
model = Param(parent='undefined', name='model', doc='The model to be interpreted.')
numSamples = Param(parent='undefined', name='numSamples', doc='Number of samples to generate.')
outputCol = Param(parent='undefined', name='outputCol', doc='output column name')
classmethod read()

Returns an MLReader instance for this class.

regularization = Param(parent='undefined', name='regularization', doc='Regularization param for the lasso. Default value: 0.')
samplingFraction = Param(parent='undefined', name='samplingFraction', doc='The fraction of superpixels (for image) or tokens (for text) to keep on')
setInputCol(value)
Parameters:

inputCol – input column name

setKernelWidth(value)
Parameters:

kernelWidth – Kernel width. Default value: sqrt (number of features) * 0.75

setMetricsCol(value)
Parameters:

metricsCol – Column name for fitting metrics

setModel(value)
Parameters:

model – The model to be interpreted.

setNumSamples(value)
Parameters:

numSamples – Number of samples to generate.

setOutputCol(value)[source]
Parameters:

outputCol – output column name

setParams(inputCol=None, kernelWidth=0.75, metricsCol='r2', model=None, numSamples=1000, outputCol='TextLIME_8a1533565787__output', regularization=0.0, samplingFraction=0.7, targetClasses=[], targetClassesCol=None, targetCol='probability', tokensCol='tokens')[source]

Set the (keyword-only) parameters.
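The default outputCol values in these signatures (for example 'TextLIME_8a1533565787__output') embed the transformer's uid, so they differ between instances. The sketch below assumes the uid follows PySpark's `Identifiable` pattern (class name, an underscore, and the last 12 hex digits of a random UUID) and that the default output column is the uid plus the suffix "__output", as the defaults shown here suggest. Both helper names are hypothetical.

```python
import re
import uuid

def random_uid(class_name: str) -> str:
    # Assumed PySpark-style uid: ClassName_ + last 12 hex chars of a UUID4
    return f"{class_name}_{uuid.uuid4().hex[-12:]}"

def default_output_col(uid: str) -> str:
    # Suffix inferred from defaults such as 'TextLIME_8a1533565787__output'
    return f"{uid}__output"

col = default_output_col(random_uid("TextLIME"))
print(col, bool(re.fullmatch(r"TextLIME_[0-9a-f]{12}__output", col)))
```

Because the suffix is random, calling setOutputCol with an explicit name gives a column name that stays stable across runs.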

setRegularization(value)[source]
Parameters:

regularization – Regularization param for the lasso. Default value: 0.

setSamplingFraction(value)[source]
Parameters:

samplingFraction – The fraction of superpixels (for images) or tokens (for text) to keep on (unmasked) when generating perturbed samples
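For text, LIME explains a prediction by switching tokens on and off and observing how the model output changes. The sketch below assumes that each token is kept independently with probability samplingFraction, so on average that fraction of tokens stays on in each perturbed sample. The function `sample_token_masks` is hypothetical and is not SynapseML's sampler.

```python
import random
from typing import List, Tuple

def sample_token_masks(tokens: List[str], sampling_fraction: float,
                       num_samples: int, seed: int = 0) -> List[Tuple[List[int], List[str]]]:
    """Hypothetical LIME-style text perturbation returning (mask, kept_tokens) pairs."""
    rng = random.Random(seed)
    samples = []
    for _ in range(num_samples):
        # rng.random() is in [0, 1), so each token is "on" with probability sampling_fraction.
        mask = [1 if rng.random() < sampling_fraction else 0 for _ in tokens]
        kept = [tok for tok, bit in zip(tokens, mask) if bit]
        samples.append((mask, kept))
    return samples

for mask, kept in sample_token_masks(["this", "movie", "was", "great"], 0.7, 3):
    print(mask, " ".join(kept))
```

The binary masks, not the token strings, are what a local surrogate model would be fitted on.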

setTargetClasses(value)[source]
Parameters:

targetClasses – The indices of the classes for multinomial classification models. Default: 0. For regression models, this parameter is ignored.

setTargetClassesCol(value)[source]
Parameters:

targetClassesCol – The name of the column that specifies the indices of the classes for multinomial classification models.

setTargetCol(value)[source]
Parameters:

targetCol – The column name of the prediction target to explain (i.e., the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability

setTokensCol(value)[source]
Parameters:

tokensCol – The column holding the tokens

targetClasses = Param(parent='undefined', name='targetClasses', doc='The indices of the classes for multinomial classification models. Default: 0.For regression models this parameter is ignored.')
targetClassesCol = Param(parent='undefined', name='targetClassesCol', doc='The name of the column that specifies the indices of the classes for multinomial classification models.')
targetCol = Param(parent='undefined', name='targetCol', doc='The column name of the prediction target to explain (i.e. the response variable). This is usually set to "prediction" for regression models and "probability" for probabilistic classification models. Default value: probability')
tokensCol = Param(parent='undefined', name='tokensCol', doc='The column holding the tokens')

synapse.ml.explainers.TextSHAP module

class synapse.ml.explainers.TextSHAP.TextSHAP(java_obj=None, infWeight=100000000.0, inputCol=None, metricsCol='r2', model=None, numSamples=None, outputCol='TextSHAP_b5668c28490d__output', targetClasses=[], targetClassesCol=None, targetCol='probability', tokensCol='tokens')[source]

Bases: ComplexParamsMixin, JavaMLReadable, JavaMLWritable, JavaTransformer

Parameters:
  • infWeight (float) – The double value used to represent infinite weight. Default: 1E8.

  • inputCol (str) – input column name

  • metricsCol (str) – Column name for fitting metrics

  • model (object) – The model to be interpreted.

  • numSamples (int) – Number of samples to generate.

  • outputCol (str) – output column name

  • targetClasses (list) – The indices of the classes for multinomial classification models. Default: 0. For regression models, this parameter is ignored.

  • targetClassesCol (str) – The name of the column that specifies the indices of the classes for multinomial classification models.

  • targetCol (str) – The column name of the prediction target to explain (i.e., the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability

  • tokensCol (str) – The column holding the tokens
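In KernelSHAP, each sampled coalition of z of M features is weighted by the Shapley kernel (M - 1) / (C(M, z) * z * (M - z)). This weight is infinite for the empty coalition (z = 0) and the full coalition (z = M), which pin the explanation to the base value and the actual prediction. The sketch below assumes infWeight is the finite stand-in for those two infinite weights; `shap_kernel_weight` is a hypothetical name, not a SynapseML function.

```python
from math import comb

def shap_kernel_weight(num_features: int, coalition_size: int,
                       inf_weight: float = 1e8) -> float:
    m, z = num_features, coalition_size
    if not 0 <= z <= m:
        raise ValueError("coalition_size must be between 0 and num_features")
    if z == 0 or z == m:
        # Theoretically infinite; replaced by a large finite weight (assumed role of infWeight).
        return inf_weight
    return (m - 1) / (comb(m, z) * z * (m - z))

print([shap_kernel_weight(4, z) for z in range(5)])
```

Very small and very large coalitions receive the highest finite weights, because they reveal the most about individual feature effects.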

getInfWeight()[source]
Returns:

The double value used to represent infinite weight. Default: 1E8.

Return type:

infWeight

getInputCol()[source]
Returns:

input column name

Return type:

inputCol

static getJavaPackage()[source]

Returns package name String.

getMetricsCol()[source]
Returns:

Column name for fitting metrics

Return type:

metricsCol

getModel()[source]
Returns:

The model to be interpreted.

Return type:

model

getNumSamples()[source]
Returns:

Number of samples to generate.

Return type:

numSamples

getOutputCol()[source]
Returns:

output column name

Return type:

outputCol

getTargetClasses()[source]
Returns:

The indices of the classes for multinomial classification models. Default: 0. For regression models, this parameter is ignored.

Return type:

targetClasses

getTargetClassesCol()[source]
Returns:

The name of the column that specifies the indices of the classes for multinomial classification models.

Return type:

targetClassesCol

getTargetCol()[source]
Returns:

The column name of the prediction target to explain (i.e., the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability

Return type:

targetCol

getTokensCol()[source]
Returns:

The column holding the tokens

Return type:

tokensCol

infWeight = Param(parent='undefined', name='infWeight', doc='The double value to represent infinite weight. Default: 1E8.')
inputCol = Param(parent='undefined', name='inputCol', doc='input column name')
metricsCol = Param(parent='undefined', name='metricsCol', doc='Column name for fitting metrics')
model = Param(parent='undefined', name='model', doc='The model to be interpreted.')
numSamples = Param(parent='undefined', name='numSamples', doc='Number of samples to generate.')
outputCol = Param(parent='undefined', name='outputCol', doc='output column name')
classmethod read()[source]

Returns an MLReader instance for this class.

setInfWeight(value)[source]
Parameters:

infWeight – The double value used to represent infinite weight. Default: 1E8.

setInputCol(value)[source]
Parameters:

inputCol – input column name

setMetricsCol(value)[source]
Parameters:

metricsCol – Column name for fitting metrics

setModel(value)[source]
Parameters:

model – The model to be interpreted.

setNumSamples(value)[source]
Parameters:

numSamples – Number of samples to generate.

setOutputCol(value)[source]
Parameters:

outputCol – output column name

setParams(infWeight=100000000.0, inputCol=None, metricsCol='r2', model=None, numSamples=None, outputCol='TextSHAP_b5668c28490d__output', targetClasses=[], targetClassesCol=None, targetCol='probability', tokensCol='tokens')[source]

Set the (keyword-only) parameters.

setTargetClasses(value)[source]
Parameters:

targetClasses – The indices of the classes for multinomial classification models. Default: 0. For regression models, this parameter is ignored.

setTargetClassesCol(value)[source]
Parameters:

targetClassesCol – The name of the column that specifies the indices of the classes for multinomial classification models.

setTargetCol(value)[source]
Parameters:

targetCol – The column name of the prediction target to explain (i.e., the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability

setTokensCol(value)[source]
Parameters:

tokensCol – The column holding the tokens

targetClasses = Param(parent='undefined', name='targetClasses', doc='The indices of the classes for multinomial classification models. Default: 0.For regression models this parameter is ignored.')
targetClassesCol = Param(parent='undefined', name='targetClassesCol', doc='The name of the column that specifies the indices of the classes for multinomial classification models.')
targetCol = Param(parent='undefined', name='targetCol', doc='The column name of the prediction target to explain (i.e. the response variable). This is usually set to "prediction" for regression models and "probability" for probabilistic classification models. Default value: probability')
tokensCol = Param(parent='undefined', name='tokensCol', doc='The column holding the tokens')

synapse.ml.explainers.VectorLIME module

class synapse.ml.explainers.VectorLIME.VectorLIME(java_obj=None, backgroundData=None, inputCol=None, kernelWidth=0.75, metricsCol='r2', model=None, numSamples=1000, outputCol='VectorLIME_41412686fae0__output', regularization=0.0, targetClasses=[], targetClassesCol=None, targetCol='probability')[source]

Bases: ComplexParamsMixin, JavaMLReadable, JavaMLWritable, JavaTransformer

Parameters:
  • backgroundData (object) – A dataframe containing background data

  • inputCol (str) – input column name

  • kernelWidth (float) – Kernel width. Default value: sqrt(number of features) * 0.75

  • metricsCol (str) – Column name for fitting metrics

  • model (object) – The model to be interpreted.

  • numSamples (int) – Number of samples to generate.

  • outputCol (str) – output column name

  • regularization (float) – Regularization param for the lasso. Default value: 0.

  • targetClasses (list) – The indices of the classes for multinomial classification models. Default: 0. For regression models, this parameter is ignored.

  • targetClassesCol (str) – The name of the column that specifies the indices of the classes for multinomial classification models.

  • targetCol (str) – The column name of the prediction target to explain (i.e., the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability
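LIME fits a kernel-weighted linear surrogate model to the model's outputs on the perturbed samples. With regularization = 0 (the default), the lasso objective reduces to ordinary weighted least squares. The sketch below fits a single feature and reports a weighted R², on the assumption that this is the kind of goodness-of-fit value written to metricsCol (default name 'r2'). `weighted_linear_fit` is a hypothetical helper, not SynapseML's solver, which handles many features and a nonzero lasso penalty.

```python
from typing import Sequence, Tuple

def weighted_linear_fit(xs: Sequence[float], ys: Sequence[float],
                        ws: Sequence[float]) -> Tuple[float, float, float]:
    """Fit y ~ a + b*x by weighted least squares; return (a, b, weighted R^2)."""
    sw = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / sw
    y_bar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    b = sxy / sxx                    # slope: the local "importance" of the feature
    a = y_bar - b * x_bar            # intercept
    ss_res = sum(w * (y - (a + b * x)) ** 2 for w, x, y in zip(ws, xs, ys))
    ss_tot = sum(w * (y - y_bar) ** 2 for w, y in zip(ws, ys))
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return a, b, r2

print(weighted_linear_fit([0, 1, 2, 3], [0, 2, 1, 3], [1, 1, 1, 1]))
```

A low R² signals that the linear surrogate explains the model's local behavior poorly, so its coefficients should be read with caution.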

backgroundData = Param(parent='undefined', name='backgroundData', doc='A dataframe containing background data')
getBackgroundData()[source]
Returns:

A dataframe containing background data

Return type:

backgroundData

getInputCol()[source]
Returns:

input column name

Return type:

inputCol

static getJavaPackage()[source]

Returns package name String.

getKernelWidth()[source]
Returns:

Kernel width. Default value: sqrt(number of features) * 0.75

Return type:

kernelWidth

getMetricsCol()[source]
Returns:

Column name for fitting metrics

Return type:

metricsCol

getModel()[source]
Returns:

The model to be interpreted.

Return type:

model

getNumSamples()[source]
Returns:

Number of samples to generate.

Return type:

numSamples

getOutputCol()[source]
Returns:

output column name

Return type:

outputCol

getRegularization()[source]
Returns:

Regularization param for the lasso. Default value: 0.

Return type:

regularization

getTargetClasses()[source]
Returns:

The indices of the classes for multinomial classification models. Default: 0. For regression models, this parameter is ignored.

Return type:

targetClasses

getTargetClassesCol()[source]
Returns:

The name of the column that specifies the indices of the classes for multinomial classification models.

Return type:

targetClassesCol

getTargetCol()[source]
Returns:

The column name of the prediction target to explain (i.e., the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability

Return type:

targetCol

inputCol = Param(parent='undefined', name='inputCol', doc='input column name')
kernelWidth = Param(parent='undefined', name='kernelWidth', doc='Kernel width. Default value: sqrt (number of features) * 0.75')
metricsCol = Param(parent='undefined', name='metricsCol', doc='Column name for fitting metrics')
model = Param(parent='undefined', name='model', doc='The model to be interpreted.')
numSamples = Param(parent='undefined', name='numSamples', doc='Number of samples to generate.')
outputCol = Param(parent='undefined', name='outputCol', doc='output column name')
classmethod read()[source]

Returns an MLReader instance for this class.

regularization = Param(parent='undefined', name='regularization', doc='Regularization param for the lasso. Default value: 0.')
setBackgroundData(value)[source]
Parameters:

backgroundData – A dataframe containing background data

setInputCol(value)[source]
Parameters:

inputCol – input column name

setKernelWidth(value)[source]
Parameters:

kernelWidth – Kernel width. Default value: sqrt(number of features) * 0.75

setMetricsCol(value)[source]
Parameters:

metricsCol – Column name for fitting metrics

setModel(value)[source]
Parameters:

model – The model to be interpreted.

setNumSamples(value)[source]
Parameters:

numSamples – Number of samples to generate.

setOutputCol(value)[source]
Parameters:

outputCol – output column name

setParams(backgroundData=None, inputCol=None, kernelWidth=0.75, metricsCol='r2', model=None, numSamples=1000, outputCol='VectorLIME_41412686fae0__output', regularization=0.0, targetClasses=[], targetClassesCol=None, targetCol='probability')[source]

Set the (keyword-only) parameters.

setRegularization(value)[source]
Parameters:

regularization – Regularization param for the lasso. Default value: 0.

setTargetClasses(value)[source]
Parameters:

targetClasses – The indices of the classes for multinomial classification models. Default: 0. For regression models, this parameter is ignored.

setTargetClassesCol(value)[source]
Parameters:

targetClassesCol – The name of the column that specifies the indices of the classes for multinomial classification models.

setTargetCol(value)[source]
Parameters:

targetCol – The column name of the prediction target to explain (i.e., the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability

targetClasses = Param(parent='undefined', name='targetClasses', doc='The indices of the classes for multinomial classification models. Default: 0.For regression models this parameter is ignored.')
targetClassesCol = Param(parent='undefined', name='targetClassesCol', doc='The name of the column that specifies the indices of the classes for multinomial classification models.')
targetCol = Param(parent='undefined', name='targetCol', doc='The column name of the prediction target to explain (i.e. the response variable). This is usually set to "prediction" for regression models and "probability" for probabilistic classification models. Default value: probability')

synapse.ml.explainers.VectorSHAP module

class synapse.ml.explainers.VectorSHAP.VectorSHAP(java_obj=None, backgroundData=None, infWeight=100000000.0, inputCol=None, metricsCol='r2', model=None, numSamples=None, outputCol='VectorSHAP_119c94f6bc46__output', targetClasses=[], targetClassesCol=None, targetCol='probability')[source]

Bases: ComplexParamsMixin, JavaMLReadable, JavaMLWritable, JavaTransformer

Parameters:
  • backgroundData (object) – A dataframe containing background data

  • infWeight (float) – The double value used to represent infinite weight. Default: 1E8.

  • inputCol (str) – input column name

  • metricsCol (str) – Column name for fitting metrics

  • model (object) – The model to be interpreted.

  • numSamples (int) – Number of samples to generate.

  • outputCol (str) – output column name

  • targetClasses (list) – The indices of the classes for multinomial classification models. Default: 0. For regression models, this parameter is ignored.

  • targetClassesCol (str) – The name of the column that specifies the indices of the classes for multinomial classification models.

  • targetCol (str) – The column name of the prediction target to explain (i.e., the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability
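For vector inputs there is no natural "off" value for a feature, so SHAP-style explainers typically take the values of features outside a coalition from background data. The sketch below assumes that backgroundData plays this role: features in the coalition keep the explained instance's values, the remaining features are copied from each background row, and the model outputs are averaged over those rows. Both helper names are hypothetical, and `model` is any plain Python callable standing in for a fitted model.

```python
from typing import Callable, List, Sequence

def coalition_to_instance(instance: Sequence[float], background_row: Sequence[float],
                          coalition: Sequence[int]) -> List[float]:
    # Coalition members (1) keep the instance value; the others (0) come from the background row.
    if not (len(instance) == len(background_row) == len(coalition)):
        raise ValueError("instance, background_row and coalition must have equal length")
    return [x if on else b for x, b, on in zip(instance, background_row, coalition)]

def coalition_value(model: Callable[[List[float]], float], instance: Sequence[float],
                    background: Sequence[Sequence[float]], coalition: Sequence[int]) -> float:
    # Average model output over all background rows for this coalition.
    outputs = [model(coalition_to_instance(instance, row, coalition)) for row in background]
    return sum(outputs) / len(outputs)

background = [[0.0, 0.0, 0.0], [2.0, 2.0, 2.0]]
print(coalition_value(sum, [1.0, 2.0, 3.0], background, [1, 0, 1]))
```

With an empty coalition this yields the average prediction over the background data (the base value); with the full coalition it yields the prediction for the explained instance.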

backgroundData = Param(parent='undefined', name='backgroundData', doc='A dataframe containing background data')
getBackgroundData()[source]
Returns:

A dataframe containing background data

Return type:

backgroundData

getInfWeight()[source]
Returns:

The double value used to represent infinite weight. Default: 1E8.

Return type:

infWeight

getInputCol()[source]
Returns:

input column name

Return type:

inputCol

static getJavaPackage()[source]

Returns package name String.

getMetricsCol()[source]
Returns:

Column name for fitting metrics

Return type:

metricsCol

getModel()[source]
Returns:

The model to be interpreted.

Return type:

model

getNumSamples()[source]
Returns:

Number of samples to generate.

Return type:

numSamples

getOutputCol()[source]
Returns:

output column name

Return type:

outputCol

getTargetClasses()[source]
Returns:

The indices of the classes for multinomial classification models. Default: 0. For regression models, this parameter is ignored.

Return type:

targetClasses

getTargetClassesCol()[source]
Returns:

The name of the column that specifies the indices of the classes for multinomial classification models.

Return type:

targetClassesCol

getTargetCol()[source]
Returns:

The column name of the prediction target to explain (i.e., the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability

Return type:

targetCol

infWeight = Param(parent='undefined', name='infWeight', doc='The double value to represent infinite weight. Default: 1E8.')
inputCol = Param(parent='undefined', name='inputCol', doc='input column name')
metricsCol = Param(parent='undefined', name='metricsCol', doc='Column name for fitting metrics')
model = Param(parent='undefined', name='model', doc='The model to be interpreted.')
numSamples = Param(parent='undefined', name='numSamples', doc='Number of samples to generate.')
outputCol = Param(parent='undefined', name='outputCol', doc='output column name')
classmethod read()[source]

Returns an MLReader instance for this class.

setBackgroundData(value)[source]
Parameters:

backgroundData – A dataframe containing background data

setInfWeight(value)[source]
Parameters:

infWeight – The double value used to represent infinite weight. Default: 1E8.

setInputCol(value)[source]
Parameters:

inputCol – input column name

setMetricsCol(value)[source]
Parameters:

metricsCol – Column name for fitting metrics

setModel(value)[source]
Parameters:

model – The model to be interpreted.

setNumSamples(value)[source]
Parameters:

numSamples – Number of samples to generate.

setOutputCol(value)[source]
Parameters:

outputCol – output column name

setParams(backgroundData=None, infWeight=100000000.0, inputCol=None, metricsCol='r2', model=None, numSamples=None, outputCol='VectorSHAP_119c94f6bc46__output', targetClasses=[], targetClassesCol=None, targetCol='probability')[source]

Set the (keyword-only) parameters.

setTargetClasses(value)[source]
Parameters:

targetClasses – The indices of the classes for multinomial classification models. Default: 0. For regression models, this parameter is ignored.

setTargetClassesCol(value)[source]
Parameters:

targetClassesCol – The name of the column that specifies the indices of the classes for multinomial classification models.

setTargetCol(value)[source]
Parameters:

targetCol – The column name of the prediction target to explain (i.e., the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability

targetClasses = Param(parent='undefined', name='targetClasses', doc='The indices of the classes for multinomial classification models. Default: 0.For regression models this parameter is ignored.')
targetClassesCol = Param(parent='undefined', name='targetClassesCol', doc='The name of the column that specifies the indices of the classes for multinomial classification models.')
targetCol = Param(parent='undefined', name='targetCol', doc='The column name of the prediction target to explain (i.e. the response variable). This is usually set to "prediction" for regression models and "probability" for probabilistic classification models. Default value: probability')

Module contents

SynapseML is an ecosystem of tools aimed at expanding the distributed computing framework Apache Spark in several new directions. SynapseML adds many deep learning and data science tools to the Spark ecosystem, including seamless integration of Spark Machine Learning pipelines with Microsoft Cognitive Toolkit (CNTK), LightGBM, and OpenCV. These tools enable powerful and highly scalable predictive and analytical models for a variety of data sources.

SynapseML also brings new networking capabilities to the Spark ecosystem. With the HTTP on Spark project, users can embed any web service into their SparkML models. In this vein, SynapseML provides easy-to-use SparkML transformers for a wide variety of Microsoft Cognitive Services. For production-grade deployment, the Spark Serving project enables high-throughput, sub-millisecond-latency web services backed by your Spark cluster.

SynapseML requires Scala 2.12, Spark 3.0+, and Python 3.6+.