synapse.ml.explainers package

Submodules

synapse.ml.explainers.ICETransformer module

class synapse.ml.explainers.ICETransformer.ICETransformer(java_obj=None, categoricalFeatures=[], dependenceNameCol='pdpBasedDependence', featureNameCol='featureNames', kind='individual', model=None, numSamples=None, numericFeatures=[], targetClasses=[], targetClassesCol=None, targetCol='probability')

Bases: pyspark.ml.util.MLReadable[pyspark.ml.util.RL]

setCategoricalFeatures(values: List[Union[str, Dict]])

Parameters: values – The categorical features to explain, given either as a list of feature names or as a list of dicts with per-feature parameters.

setNumericFeatures(values: List[Union[str, Dict]])

Parameters: values – The numeric features to explain, given either as a list of feature names or as a list of dicts with per-feature parameters.
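
To illustrate the two accepted shapes, the sketch below normalizes a list of names and dicts into a uniform list of dicts. The helper normalize_feature_specs and the dict keys "name" and "numSplits" are assumptions made for this example; the text above does not confirm which keys the dicts may carry.

```python
from typing import Dict, List, Union

FeatureSpec = Union[str, Dict]


def normalize_feature_specs(values: List[FeatureSpec]) -> List[Dict]:
    """Turn a list of feature names and/or dicts into a list of dicts.

    Hypothetical helper: the "name" key is an assumed convention,
    not something the reference text spells out.
    """
    normalized = []
    for value in values:
        if isinstance(value, str):
            # A bare name becomes a dict with only the assumed "name" key.
            normalized.append({"name": value})
        elif isinstance(value, dict):
            if "name" not in value:
                raise ValueError(f"feature spec is missing 'name': {value!r}")
            normalized.append(dict(value))  # copy so the caller's dict is untouched
        else:
            raise TypeError(f"expected str or dict, got {type(value).__name__}")
    return normalized


# Mixed input: one plain name, one dict with an extra (assumed) parameter.
numeric_specs = normalize_feature_specs(["age", {"name": "income", "numSplits": 10}])
```

A list built this way matches the List[Union[str, Dict]] type hint of both setters, although whether the library accepts a mix of names and dicts in one call is not stated above.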

synapse.ml.explainers.ImageLIME module

class synapse.ml.explainers.ImageLIME.ImageLIME(java_obj=None, cellSize=16.0, inputCol=None, kernelWidth=0.75, metricsCol='r2', model=None, modifier=130.0, numSamples=900, outputCol='ImageLIME_072e01c82a35__output', regularization=0.0, samplingFraction=0.7, superpixelCol='superpixels', targetClasses=[], targetClassesCol=None, targetCol='probability')

Bases: pyspark.ml.util.MLReadable[pyspark.ml.util.RL]

Parameters

cellSize (float) – Number that controls the size of the superpixels
inputCol (str) – Input column name
kernelWidth (float) – Kernel width. Default value: sqrt(number of features) * 0.75
metricsCol (str) – Column name for fitting metrics
model (object) – The model to be interpreted.
modifier (float) – Controls the trade-off between spatial and color distance
numSamples (int) – Number of samples to generate.
outputCol (str) – Output column name
regularization (float) – Regularization param for the lasso. Default value: 0.
samplingFraction (float) – The fraction of superpixels (for image) or tokens (for text) to keep on (retained) in each sample
superpixelCol (str) – The column holding the superpixel decompositions
targetClasses (list) – The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.
targetClassesCol (str) – The name of the column that specifies the indices of the classes for multinomial classification models.
targetCol (str) – The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability
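
The kernelWidth parameter can be read against the usual LIME weighting scheme, in which each perturbed sample is weighted by an exponential kernel of its distance to the original instance. The sketch below uses the formulation found in the reference LIME implementation as an assumption; the text above does not state which kernel this class applies. The names default_kernel_width and kernel_weight are hypothetical.

```python
import math


def default_kernel_width(num_features: int) -> float:
    # Mirrors the documented default: sqrt(number of features) * 0.75.
    return math.sqrt(num_features) * 0.75


def kernel_weight(distance: float, kernel_width: float) -> float:
    # Assumed exponential kernel: weight is 1.0 at distance 0 and
    # decays towards 0 as the sample moves away from the instance.
    return math.sqrt(math.exp(-(distance ** 2) / (kernel_width ** 2)))


width = default_kernel_width(16)       # 16 superpixels, for example
near = kernel_weight(0.5, width)       # sample close to the instance
far = kernel_weight(3.0, width)        # sample far from the instance
```

Under this reading, a smaller kernel width makes the local surrogate focus more tightly on samples that resemble the original image.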

cellSize = Param(parent='undefined', name='cellSize', doc='Number that controls the size of the superpixels')

getCellSize()

Returns: Number that controls the size of the superpixels
Return type: cellSize

getInputCol()

Returns: Input column name
Return type: inputCol

static getJavaPackage(): Returns package name String.

getKernelWidth()

Returns: Kernel width. Default value: sqrt(number of features) * 0.75
Return type: kernelWidth

getMetricsCol()

Returns: Column name for fitting metrics
Return type: metricsCol

getModel()

Returns: The model to be interpreted.
Return type: model

getModifier()

Returns: Controls the trade-off between spatial and color distance
Return type: modifier

getNumSamples()

Returns: Number of samples to generate.
Return type: numSamples

getOutputCol()

Returns: Output column name
Return type: outputCol

getRegularization()

Returns: Regularization param for the lasso. Default value: 0.
Return type: regularization

getSamplingFraction()

Returns: The fraction of superpixels (for image) or tokens (for text) to keep on (retained) in each sample
Return type: samplingFraction

getSuperpixelCol()

Returns: The column holding the superpixel decompositions
Return type: superpixelCol

getTargetClasses()

Returns: The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.
Return type: targetClasses

getTargetClassesCol()

Returns: The name of the column that specifies the indices of the classes for multinomial classification models.
Return type: targetClassesCol

getTargetCol()

Returns: The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability
Return type: targetCol

inputCol = Param(parent='undefined', name='inputCol', doc='input column name')

kernelWidth = Param(parent='undefined', name='kernelWidth', doc='Kernel width. Default value: sqrt (number of features) * 0.75')

metricsCol = Param(parent='undefined', name='metricsCol', doc='Column name for fitting metrics')

model = Param(parent='undefined', name='model', doc='The model to be interpreted.')

modifier = Param(parent='undefined', name='modifier', doc='Controls the trade-off spatial and color distance')

numSamples = Param(parent='undefined', name='numSamples', doc='Number of samples to generate.')

outputCol = Param(parent='undefined', name='outputCol', doc='output column name')

classmethod read(): Returns an MLReader instance for this class.

regularization = Param(parent='undefined', name='regularization', doc='Regularization param for the lasso. Default value: 0.')

samplingFraction = Param(parent='undefined', name='samplingFraction', doc='The fraction of superpixels (for image) or tokens (for text) to keep on')

setCellSize(value)

Parameters: cellSize – Number that controls the size of the superpixels

setInputCol(value)

Parameters: inputCol – Input column name

setKernelWidth(value)

Parameters: kernelWidth – Kernel width. Default value: sqrt(number of features) * 0.75

setMetricsCol(value)

Parameters: metricsCol – Column name for fitting metrics

setModel(value)

Parameters: model – The model to be interpreted.

setModifier(value)

Parameters: modifier – Controls the trade-off between spatial and color distance

setNumSamples(value)

Parameters: numSamples – Number of samples to generate.

setOutputCol(value)

Parameters: outputCol – Output column name

setParams(cellSize=16.0, inputCol=None, kernelWidth=0.75, metricsCol='r2', model=None, modifier=130.0, numSamples=900, outputCol='ImageLIME_072e01c82a35__output', regularization=0.0, samplingFraction=0.7, superpixelCol='superpixels', targetClasses=[], targetClassesCol=None, targetCol='probability'): Set the (keyword only) parameters

setRegularization(value)

Parameters: regularization – Regularization param for the lasso. Default value: 0.

setSamplingFraction(value)

Parameters: samplingFraction – The fraction of superpixels (for image) or tokens (for text) to keep on (retained) in each sample

setSuperpixelCol(value)

Parameters: superpixelCol – The column holding the superpixel decompositions

setTargetClasses(value)

Parameters: targetClasses – The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.

setTargetClassesCol(value)

Parameters: targetClassesCol – The name of the column that specifies the indices of the classes for multinomial classification models.

setTargetCol(value)

Parameters: targetCol – The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability

superpixelCol = Param(parent='undefined', name='superpixelCol', doc='The column holding the superpixel decompositions')

targetClasses = Param(parent='undefined', name='targetClasses', doc='The indices of the classes for multinomial classification models. Default: 0.For regression models this parameter is ignored.')

targetClassesCol = Param(parent='undefined', name='targetClassesCol', doc='The name of the column that specifies the indices of the classes for multinomial classification models.')

targetCol = Param(parent='undefined', name='targetCol', doc='The column name of the prediction target to explain (i.e. the response variable). This is usually set to "prediction" for regression models and "probability" for probabilistic classification models. Default value: probability')

synapse.ml.explainers.ImageSHAP module

class synapse.ml.explainers.ImageSHAP.ImageSHAP(java_obj=None, cellSize=16.0, infWeight=100000000.0, inputCol=None, metricsCol='r2', model=None, modifier=130.0, numSamples=None, outputCol='ImageSHAP_c0cba83156e7__output', superpixelCol='superpixels', targetClasses=[], targetClassesCol=None, targetCol='probability')

Bases: pyspark.ml.util.MLReadable[pyspark.ml.util.RL]

Parameters

cellSize (float) – Number that controls the size of the superpixels
infWeight (float) – The double value to represent infinite weight. Default: 1E8.
inputCol (str) – Input column name
metricsCol (str) – Column name for fitting metrics
model (object) – The model to be interpreted.
modifier (float) – Controls the trade-off between spatial and color distance
numSamples (int) – Number of samples to generate.
outputCol (str) – Output column name
superpixelCol (str) – The column holding the superpixel decompositions
targetClasses (list) – The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.
targetClassesCol (str) – The name of the column that specifies the indices of the classes for multinomial classification models.
targetCol (str) – The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability
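
In Kernel SHAP the empty and the full coalition formally receive infinite weight, which is presumably what infWeight stands in for. The sketch below shows the textbook Shapley kernel with that substitution. It is an illustration under that assumption, not code taken from this class, and shapley_kernel_weight is a hypothetical name.

```python
from math import comb

INF_WEIGHT = 1e8  # same value as the documented default for infWeight


def shapley_kernel_weight(num_features: int, coalition_size: int,
                          inf_weight: float = INF_WEIGHT) -> float:
    """Weight of a coalition with `coalition_size` of `num_features` switched on."""
    m, k = num_features, coalition_size
    if not 0 <= k <= m:
        raise ValueError("coalition_size must lie between 0 and num_features")
    if k == 0 or k == m:
        # The kernel diverges here; a large finite constant replaces infinity.
        return inf_weight
    return (m - 1) / (comb(m, k) * k * (m - k))


# Weights for every coalition size when an image has 4 superpixels.
weights = [shapley_kernel_weight(4, k) for k in range(5)]
```

Using a large finite number instead of infinity keeps the weighted regression numerically solvable while still pinning the two extreme coalitions almost exactly.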

cellSize = Param(parent='undefined', name='cellSize', doc='Number that controls the size of the superpixels')

getCellSize()

Returns: Number that controls the size of the superpixels
Return type: cellSize

getInfWeight()

Returns: The double value to represent infinite weight. Default: 1E8.
Return type: infWeight

getInputCol()

Returns: Input column name
Return type: inputCol

static getJavaPackage(): Returns package name String.

getMetricsCol()

Returns: Column name for fitting metrics
Return type: metricsCol

getModel()

Returns: The model to be interpreted.
Return type: model

getModifier()

Returns: Controls the trade-off between spatial and color distance
Return type: modifier

getNumSamples()

Returns: Number of samples to generate.
Return type: numSamples

getOutputCol()

Returns: Output column name
Return type: outputCol

getSuperpixelCol()

Returns: The column holding the superpixel decompositions
Return type: superpixelCol

getTargetClasses()

Returns: The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.
Return type: targetClasses

getTargetClassesCol()

Returns: The name of the column that specifies the indices of the classes for multinomial classification models.
Return type: targetClassesCol

getTargetCol()

Returns: The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability
Return type: targetCol

infWeight = Param(parent='undefined', name='infWeight', doc='The double value to represent infinite weight. Default: 1E8.')

inputCol = Param(parent='undefined', name='inputCol', doc='input column name')

metricsCol = Param(parent='undefined', name='metricsCol', doc='Column name for fitting metrics')

model = Param(parent='undefined', name='model', doc='The model to be interpreted.')

modifier = Param(parent='undefined', name='modifier', doc='Controls the trade-off spatial and color distance')

numSamples = Param(parent='undefined', name='numSamples', doc='Number of samples to generate.')

outputCol = Param(parent='undefined', name='outputCol', doc='output column name')

classmethod read(): Returns an MLReader instance for this class.

setCellSize(value)

Parameters: cellSize – Number that controls the size of the superpixels

setInfWeight(value)

Parameters: infWeight – The double value to represent infinite weight. Default: 1E8.

setInputCol(value)

Parameters: inputCol – Input column name

setMetricsCol(value)

Parameters: metricsCol – Column name for fitting metrics

setModel(value)

Parameters: model – The model to be interpreted.

setModifier(value)

Parameters: modifier – Controls the trade-off between spatial and color distance

setNumSamples(value)

Parameters: numSamples – Number of samples to generate.

setOutputCol(value)

Parameters: outputCol – Output column name

setParams(cellSize=16.0, infWeight=100000000.0, inputCol=None, metricsCol='r2', model=None, modifier=130.0, numSamples=None, outputCol='ImageSHAP_c0cba83156e7__output', superpixelCol='superpixels', targetClasses=[], targetClassesCol=None, targetCol='probability'): Set the (keyword only) parameters

setSuperpixelCol(value)

Parameters: superpixelCol – The column holding the superpixel decompositions

setTargetClasses(value)

Parameters: targetClasses – The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.

setTargetClassesCol(value)

Parameters: targetClassesCol – The name of the column that specifies the indices of the classes for multinomial classification models.

setTargetCol(value)

Parameters: targetCol – The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability

superpixelCol = Param(parent='undefined', name='superpixelCol', doc='The column holding the superpixel decompositions')

targetClasses = Param(parent='undefined', name='targetClasses', doc='The indices of the classes for multinomial classification models. Default: 0.For regression models this parameter is ignored.')

targetClassesCol = Param(parent='undefined', name='targetClassesCol', doc='The name of the column that specifies the indices of the classes for multinomial classification models.')

targetCol = Param(parent='undefined', name='targetCol', doc='The column name of the prediction target to explain (i.e. the response variable). This is usually set to "prediction" for regression models and "probability" for probabilistic classification models. Default value: probability')

synapse.ml.explainers.TabularLIME module

class synapse.ml.explainers.TabularLIME.TabularLIME(java_obj=None, backgroundData=None, categoricalFeatures=[], inputCols=None, kernelWidth=0.75, metricsCol='r2', model=None, numSamples=1000, outputCol='TabularLIME_9462a6f58ded__output', regularization=0.0, targetClasses=[], targetClassesCol=None, targetCol='probability')

Bases: pyspark.ml.util.MLReadable[pyspark.ml.util.RL]

Parameters

backgroundData (object) – A dataframe containing background data
categoricalFeatures (list) – Names of features that should be treated as categorical variables.
inputCols (list) – Input column names
kernelWidth (float) – Kernel width. Default value: sqrt(number of features) * 0.75
metricsCol (str) – Column name for fitting metrics
model (object) – The model to be interpreted.
numSamples (int) – Number of samples to generate.
outputCol (str) – Output column name
regularization (float) – Regularization param for the lasso. Default value: 0.
targetClasses (list) – The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.
targetClassesCol (str) – The name of the column that specifies the indices of the classes for multinomial classification models.
targetCol (str) – The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability
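
One way to picture how targetCol and targetClasses fit together: the target column holds the model output for a row (for a classifier, a probability vector), and the class indices pick which entries of that vector get explained. The sketch below is a plain-Python illustration of that reading, with a fallback to class 0 as documented. select_targets is a hypothetical helper, and the regression case, where the parameter is ignored, is not modeled.

```python
from typing import List, Optional, Sequence


def select_targets(output: Sequence[float],
                   target_classes: Optional[List[int]] = None) -> List[float]:
    """Pick the entries of a model output vector that should be explained."""
    # An empty or missing list falls back to class 0, per the documented default.
    classes = target_classes if target_classes else [0]
    return [output[i] for i in classes]


probability = [0.1, 0.7, 0.2]                    # made-up classifier output
default_target = select_targets(probability)     # falls back to class 0
chosen = select_targets(probability, [1, 2])     # explain classes 1 and 2
```

When the classes to explain differ from row to row, targetClassesCol names a column carrying the indices per row instead of one fixed list.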

backgroundData = Param(parent='undefined', name='backgroundData', doc='A dataframe containing background data')

categoricalFeatures = Param(parent='undefined', name='categoricalFeatures', doc='Name of features that should be treated as categorical variables.')

getBackgroundData()

Returns: A dataframe containing background data
Return type: backgroundData

getCategoricalFeatures()

Returns: Names of features that should be treated as categorical variables.
Return type: categoricalFeatures

getInputCols()

Returns: Input column names
Return type: inputCols

static getJavaPackage(): Returns package name String.

getKernelWidth()

Returns: Kernel width. Default value: sqrt(number of features) * 0.75
Return type: kernelWidth

getMetricsCol()

Returns: Column name for fitting metrics
Return type: metricsCol

getModel()

Returns: The model to be interpreted.
Return type: model

getNumSamples()

Returns: Number of samples to generate.
Return type: numSamples

getOutputCol()

Returns: Output column name
Return type: outputCol

getRegularization()

Returns: Regularization param for the lasso. Default value: 0.
Return type: regularization

getTargetClasses()

Returns: The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.
Return type: targetClasses

getTargetClassesCol()

Returns: The name of the column that specifies the indices of the classes for multinomial classification models.
Return type: targetClassesCol

getTargetCol()

Returns: The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability
Return type: targetCol

inputCols = Param(parent='undefined', name='inputCols', doc='input column names')

kernelWidth = Param(parent='undefined', name='kernelWidth', doc='Kernel width. Default value: sqrt (number of features) * 0.75')

metricsCol = Param(parent='undefined', name='metricsCol', doc='Column name for fitting metrics')

model = Param(parent='undefined', name='model', doc='The model to be interpreted.')

numSamples = Param(parent='undefined', name='numSamples', doc='Number of samples to generate.')

outputCol = Param(parent='undefined', name='outputCol', doc='output column name')

classmethod read(): Returns an MLReader instance for this class.

regularization = Param(parent='undefined', name='regularization', doc='Regularization param for the lasso. Default value: 0.')

setBackgroundData(value)

Parameters: backgroundData – A dataframe containing background data

setCategoricalFeatures(value)

Parameters: categoricalFeatures – Names of features that should be treated as categorical variables.

setInputCols(value)

Parameters: inputCols – Input column names

setKernelWidth(value)

Parameters: kernelWidth – Kernel width. Default value: sqrt(number of features) * 0.75

setMetricsCol(value)

Parameters: metricsCol – Column name for fitting metrics

setModel(value)

Parameters: model – The model to be interpreted.

setNumSamples(value)

Parameters: numSamples – Number of samples to generate.

setOutputCol(value)

Parameters: outputCol – Output column name

setParams(backgroundData=None, categoricalFeatures=[], inputCols=None, kernelWidth=0.75, metricsCol='r2', model=None, numSamples=1000, outputCol='TabularLIME_9462a6f58ded__output', regularization=0.0, targetClasses=[], targetClassesCol=None, targetCol='probability'): Set the (keyword only) parameters

setRegularization(value)

Parameters: regularization – Regularization param for the lasso. Default value: 0.

setTargetClasses(value)

Parameters: targetClasses – The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.

setTargetClassesCol(value)

Parameters: targetClassesCol – The name of the column that specifies the indices of the classes for multinomial classification models.

setTargetCol(value)

Parameters: targetCol – The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability

targetClasses = Param(parent='undefined', name='targetClasses', doc='The indices of the classes for multinomial classification models. Default: 0.For regression models this parameter is ignored.')

targetClassesCol = Param(parent='undefined', name='targetClassesCol', doc='The name of the column that specifies the indices of the classes for multinomial classification models.')

targetCol = Param(parent='undefined', name='targetCol', doc='The column name of the prediction target to explain (i.e. the response variable). This is usually set to "prediction" for regression models and "probability" for probabilistic classification models. Default value: probability')

synapse.ml.explainers.TabularSHAP module

class synapse.ml.explainers.TabularSHAP.TabularSHAP(java_obj=None, backgroundData=None, infWeight=100000000.0, inputCols=None, metricsCol='r2', model=None, numSamples=None, outputCol='TabularSHAP_c0ef0c193a2e__output', targetClasses=[], targetClassesCol=None, targetCol='probability')

Bases: pyspark.ml.util.MLReadable[pyspark.ml.util.RL]

Parameters

backgroundData (object) – A dataframe containing background data
infWeight (float) – The double value to represent infinite weight. Default: 1E8.
inputCols (list) – Input column names
metricsCol (str) – Column name for fitting metrics
model (object) – The model to be interpreted.
numSamples (int) – Number of samples to generate.
outputCol (str) – Output column name
targetClasses (list) – The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.
targetClassesCol (str) – The name of the column that specifies the indices of the classes for multinomial classification models.
targetCol (str) – The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability
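
The role of backgroundData is easiest to see on a single perturbed sample. In the usual Kernel SHAP setup, features that are switched off in a coalition take their values from a background row rather than from the instance being explained. The sketch below shows that substitution on plain dicts. It assumes this general scheme applies here, which the text above does not confirm, and apply_coalition is a hypothetical helper.

```python
from typing import Dict, Sequence


def apply_coalition(instance: Dict[str, float],
                    background: Dict[str, float],
                    input_cols: Sequence[str],
                    coalition: Sequence[int]) -> Dict[str, float]:
    """Build one perturbed row: 1 keeps the instance value, 0 uses background."""
    if len(coalition) != len(input_cols):
        raise ValueError("coalition must have one entry per input column")
    return {
        col: (instance[col] if on else background[col])
        for col, on in zip(input_cols, coalition)
    }


# Made-up rows with two input columns.
instance = {"age": 42.0, "income": 55000.0}
background = {"age": 30.0, "income": 40000.0}
perturbed = apply_coalition(instance, background, ["age", "income"], [1, 0])
```

The model is then scored on many such rows, and the weighted regression over the coalition masks yields one attribution per input column.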

backgroundData = Param(parent='undefined', name='backgroundData', doc='A dataframe containing background data')

getBackgroundData()

Returns: A dataframe containing background data
Return type: backgroundData

getInfWeight()

Returns: The double value to represent infinite weight. Default: 1E8.
Return type: infWeight

getInputCols()

Returns: Input column names
Return type: inputCols

static getJavaPackage(): Returns package name String.

getMetricsCol()

Returns: Column name for fitting metrics
Return type: metricsCol

getModel()

Returns: The model to be interpreted.
Return type: model

getNumSamples()

Returns: Number of samples to generate.
Return type: numSamples

getOutputCol()

Returns: Output column name
Return type: outputCol

getTargetClasses()

Returns: The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.
Return type: targetClasses

getTargetClassesCol()

Returns: The name of the column that specifies the indices of the classes for multinomial classification models.
Return type: targetClassesCol

getTargetCol()

Returns: The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability
Return type: targetCol

infWeight = Param(parent='undefined', name='infWeight', doc='The double value to represent infinite weight. Default: 1E8.')

inputCols = Param(parent='undefined', name='inputCols', doc='input column names')

metricsCol = Param(parent='undefined', name='metricsCol', doc='Column name for fitting metrics')

model = Param(parent='undefined', name='model', doc='The model to be interpreted.')

numSamples = Param(parent='undefined', name='numSamples', doc='Number of samples to generate.')

outputCol = Param(parent='undefined', name='outputCol', doc='output column name')

classmethod read(): Returns an MLReader instance for this class.

setBackgroundData(value)

Parameters: backgroundData – A dataframe containing background data

setInfWeight(value)

Parameters: infWeight – The double value to represent infinite weight. Default: 1E8.

setInputCols(value)

Parameters: inputCols – Input column names

setMetricsCol(value)

Parameters: metricsCol – Column name for fitting metrics

setModel(value)

Parameters: model – The model to be interpreted.

setNumSamples(value)

Parameters: numSamples – Number of samples to generate.

setOutputCol(value)

Parameters: outputCol – Output column name

setParams(backgroundData=None, infWeight=100000000.0, inputCols=None, metricsCol='r2', model=None, numSamples=None, outputCol='TabularSHAP_c0ef0c193a2e__output', targetClasses=[], targetClassesCol=None, targetCol='probability'): Set the (keyword only) parameters

setTargetClasses(value)

Parameters: targetClasses – The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.

setTargetClassesCol(value)

Parameters: targetClassesCol – The name of the column that specifies the indices of the classes for multinomial classification models.

setTargetCol(value)

Parameters: targetCol – The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability

targetClasses = Param(parent='undefined', name='targetClasses', doc='The indices of the classes for multinomial classification models. Default: 0.For regression models this parameter is ignored.')

targetClassesCol = Param(parent='undefined', name='targetClassesCol', doc='The name of the column that specifies the indices of the classes for multinomial classification models.')

targetCol = Param(parent='undefined', name='targetCol', doc='The column name of the prediction target to explain (i.e. the response variable). This is usually set to "prediction" for regression models and "probability" for probabilistic classification models. Default value: probability')

synapse.ml.explainers.TextLIME module

class synapse.ml.explainers.TextLIME.TextLIME(java_obj=None, inputCol=None, kernelWidth=0.75, metricsCol='r2', model=None, numSamples=1000, outputCol='TextLIME_4c47af2f8bf4__output', regularization=0.0, samplingFraction=0.7, targetClasses=[], targetClassesCol=None, targetCol='probability', tokensCol='tokens')

Bases: pyspark.ml.util.MLReadable[pyspark.ml.util.RL]

Parameters

inputCol (str) – Input column name
kernelWidth (float) – Kernel width. Default value: sqrt(number of features) * 0.75
metricsCol (str) – Column name for fitting metrics
model (object) – The model to be interpreted.
numSamples (int) – Number of samples to generate.
outputCol (str) – Output column name
regularization (float) – Regularization param for the lasso. Default value: 0.
samplingFraction (float) – The fraction of superpixels (for image) or tokens (for text) to keep on (retained) in each sample
targetClasses (list) – The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.
targetClassesCol (str) – The name of the column that specifies the indices of the classes for multinomial classification models.
targetCol (str) – The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability
tokensCol (str) – The column holding the tokens
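
For text, samplingFraction can be pictured as the probability that each token stays in a perturbed copy of the input. The sketch below draws such a mask and applies it to a token list. It is a plain-Python illustration under that assumption, not the sampler this class uses, and the names sample_token_mask and apply_mask are hypothetical.

```python
import random
from typing import List, Sequence


def sample_token_mask(num_tokens: int, sampling_fraction: float,
                      rng: random.Random) -> List[int]:
    """Draw a 0/1 mask; each token is kept on with probability sampling_fraction."""
    return [1 if rng.random() < sampling_fraction else 0 for _ in range(num_tokens)]


def apply_mask(tokens: Sequence[str], mask: Sequence[int]) -> List[str]:
    """Keep only the tokens whose mask entry is 1."""
    return [token for token, keep in zip(tokens, mask) if keep]


tokens = ["the", "movie", "was", "surprisingly", "good"]
rng = random.Random(0)                       # fixed seed for repeatability
mask = sample_token_mask(len(tokens), 0.7, rng)
perturbed_tokens = apply_mask(tokens, mask)
```

With the default of 0.7, roughly 70% of the tokens survive in each of the numSamples perturbed texts on average, and the surrogate model is fitted on the resulting masks.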

getInputCol()

Returns: Input column name
Return type: inputCol

static getJavaPackage(): Returns package name String.

getKernelWidth()

Returns: Kernel width. Default value: sqrt(number of features) * 0.75
Return type: kernelWidth

getMetricsCol()

Returns: Column name for fitting metrics
Return type: metricsCol

getModel()

Returns: The model to be interpreted.
Return type: model

getNumSamples()

Returns: Number of samples to generate.
Return type: numSamples

getOutputCol()

Returns: Output column name
Return type: outputCol

getRegularization()

Returns: Regularization param for the lasso. Default value: 0.
Return type: regularization

getSamplingFraction()

Returns: The fraction of superpixels (for image) or tokens (for text) to keep on (retained) in each sample
Return type: samplingFraction

getTargetClasses()

Returns: The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.
Return type: targetClasses

getTargetClassesCol()

Returns: The name of the column that specifies the indices of the classes for multinomial classification models.
Return type: targetClassesCol

getTargetCol()

Returns: The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability
Return type: targetCol

getTokensCol()

Returns: The column holding the tokens
Return type: tokensCol

inputCol = Param(parent='undefined', name='inputCol', doc='input column name')

kernelWidth = Param(parent='undefined', name='kernelWidth', doc='Kernel width. Default value: sqrt (number of features) * 0.75')

metricsCol = Param(parent='undefined', name='metricsCol', doc='Column name for fitting metrics')

model = Param(parent='undefined', name='model', doc='The model to be interpreted.')

numSamples = Param(parent='undefined', name='numSamples', doc='Number of samples to generate.')

outputCol = Param(parent='undefined', name='outputCol', doc='output column name')

classmethod read(): Returns an MLReader instance for this class.

regularization = Param(parent='undefined', name='regularization', doc='Regularization param for the lasso. Default value: 0.')

samplingFraction = Param(parent='undefined', name='samplingFraction', doc='The fraction of superpixels (for image) or tokens (for text) to keep on')

setInputCol(value)[source]

Parameters: inputCol¶ – input column name

setKernelWidth(value)[source]

Parameters: kernelWidth¶ – Kernel width. Default value: sqrt (number of features) * 0.75

setMetricsCol(value)[source]

Parameters: metricsCol¶ – Column name for fitting metrics

setModel(value)[source]

Parameters: model¶ – The model to be interpreted.

setNumSamples(value)[source]

Parameters: numSamples¶ – Number of samples to generate.

setOutputCol(value)[source]

Parameters: outputCol¶ – output column name

setParams(inputCol=None, kernelWidth=0.75, metricsCol='r2', model=None, numSamples=1000, outputCol='TextLIME_4c47af2f8bf4__output', regularization=0.0, samplingFraction=0.7, targetClasses=[], targetClassesCol=None, targetCol='probability', tokensCol='tokens')[source]: Set the (keyword only) parameters

setRegularization(value)[source]

Parameters: regularization – Regularization param for the lasso. Default value: 0.

setSamplingFraction(value)[source]

Parameters: samplingFraction – The fraction of superpixels (for image) or tokens (for text) to keep on (unmasked) in each perturbed sample

setTargetClasses(value)[source]

Parameters: targetClasses – The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.

setTargetClassesCol(value)[source]

Parameters: targetClassesCol – The name of the column that specifies the indices of the classes for multinomial classification models.

setTargetCol(value)[source]

Parameters: targetCol – The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability

setTokensCol(value)[source]

Parameters: tokensCol – The column holding the tokens

targetClasses = Param(parent='undefined', name='targetClasses', doc='The indices of the classes for multinomial classification models. Default: 0.For regression models this parameter is ignored.')

targetClassesCol = Param(parent='undefined', name='targetClassesCol', doc='The name of the column that specifies the indices of the classes for multinomial classification models.')

targetCol = Param(parent='undefined', name='targetCol', doc='The column name of the prediction target to explain (i.e. the response variable). This is usually set to "prediction" for regression models and "probability" for probabilistic classification models. Default value: probability')

tokensCol = Param(parent='undefined', name='tokensCol', doc='The column holding the tokens')
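
To illustrate what samplingFraction controls, the sketch below draws random on/off masks over a token list, keeping each token on with probability samplingFraction. It is a conceptual, standard-library sketch of LIME-style text perturbation and not SynapseML's implementation; the helper name sample_token_masks and its seed argument are hypothetical.

```python
import random


def sample_token_masks(tokens, num_samples, sampling_fraction=0.7, seed=0):
    """Return (mask, kept_tokens) pairs; mask[i] is 1 when token i stays on."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    samples = []
    for _ in range(num_samples):
        # Each token is kept on independently with probability sampling_fraction.
        mask = [1 if rng.random() < sampling_fraction else 0 for _ in tokens]
        kept = [tok for tok, on in zip(tokens, mask) if on]
        samples.append((mask, kept))
    return samples


tokens = ["the", "service", "was", "great"]
samples = sample_token_masks(tokens, num_samples=5)
for mask, kept in samples:
    print(mask, kept)
```

Each perturbed token list would then be scored by the model, and a local surrogate would be fitted on the masks; numSamples plays the role of num_samples here.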

synapse.ml.explainers.TextSHAP module

class synapse.ml.explainers.TextSHAP.TextSHAP(java_obj=None, infWeight=100000000.0, inputCol=None, metricsCol='r2', model=None, numSamples=None, outputCol='TextSHAP_4b35a9bdcaea__output', targetClasses=[], targetClassesCol=None, targetCol='probability', tokensCol='tokens')[source]

Bases: pyspark.ml.util.MLReadable[pyspark.ml.util.RL]

Parameters

infWeight (float) – The double value to represent infinite weight. Default: 1E8.
inputCol (str) – input column name
metricsCol (str) – Column name for fitting metrics
model (object) – The model to be interpreted.
numSamples (int) – Number of samples to generate.
outputCol (str) – output column name
targetClasses (list) – The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.
targetClassesCol (str) – The name of the column that specifies the indices of the classes for multinomial classification models.
targetCol (str) – The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability
tokensCol (str) – The column holding the tokens
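
A minimal sketch of how these keyword parameters might be passed to the constructor. The column names "text" and "shap_values", the numSamples value, and the helper build_text_shap are placeholders chosen for illustration. The import is deferred so the parameter dictionary can be inspected without a Spark installation.

```python
# Keyword arguments drawn from the TextSHAP signature documented above.
# "text" and "shap_values" are placeholder column names for this sketch.
text_shap_params = {
    "inputCol": "text",
    "outputCol": "shap_values",
    "tokensCol": "tokens",
    "targetCol": "probability",
    "targetClasses": [1],
    "numSamples": 1000,
    "metricsCol": "r2",
}


def build_text_shap(model):
    """Create a TextSHAP explainer for an already fitted model (hypothetical helper)."""
    # Imported lazily: this line needs SynapseML and PySpark to be installed.
    from synapse.ml.explainers.TextSHAP import TextSHAP

    return TextSHAP(model=model, **text_shap_params)
```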

getInfWeight()[source]

Returns: The double value to represent infinite weight. Default: 1E8.
Return type: infWeight

getInputCol()[source]

Returns: input column name
Return type: inputCol

static getJavaPackage()[source]: Returns package name String.

getMetricsCol()[source]

Returns: Column name for fitting metrics
Return type: metricsCol

getModel()[source]

Returns: The model to be interpreted.
Return type: model

getNumSamples()[source]

Returns: Number of samples to generate.
Return type: numSamples

getOutputCol()[source]

Returns: output column name
Return type: outputCol

getTargetClasses()[source]

Returns: The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.
Return type: targetClasses

getTargetClassesCol()[source]

Returns: The name of the column that specifies the indices of the classes for multinomial classification models.
Return type: targetClassesCol

getTargetCol()[source]

Returns: The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability
Return type: targetCol

getTokensCol()[source]

Returns: The column holding the tokens
Return type: tokensCol

infWeight = Param(parent='undefined', name='infWeight', doc='The double value to represent infinite weight. Default: 1E8.')

inputCol = Param(parent='undefined', name='inputCol', doc='input column name')

metricsCol = Param(parent='undefined', name='metricsCol', doc='Column name for fitting metrics')

model = Param(parent='undefined', name='model', doc='The model to be interpreted.')

numSamples = Param(parent='undefined', name='numSamples', doc='Number of samples to generate.')

outputCol = Param(parent='undefined', name='outputCol', doc='output column name')

classmethod read()[source]: Returns an MLReader instance for this class.

setInfWeight(value)[source]

Parameters: infWeight – The double value to represent infinite weight. Default: 1E8.

setInputCol(value)[source]

Parameters: inputCol – input column name

setMetricsCol(value)[source]

Parameters: metricsCol – Column name for fitting metrics

setModel(value)[source]

Parameters: model – The model to be interpreted.

setNumSamples(value)[source]

Parameters: numSamples – Number of samples to generate.

setOutputCol(value)[source]

Parameters: outputCol – output column name

setParams(infWeight=100000000.0, inputCol=None, metricsCol='r2', model=None, numSamples=None, outputCol='TextSHAP_4b35a9bdcaea__output', targetClasses=[], targetClassesCol=None, targetCol='probability', tokensCol='tokens')[source]: Set the (keyword only) parameters

setTargetClasses(value)[source]

Parameters: targetClasses – The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.

setTargetClassesCol(value)[source]

Parameters: targetClassesCol – The name of the column that specifies the indices of the classes for multinomial classification models.

setTargetCol(value)[source]

Parameters: targetCol – The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability

setTokensCol(value)[source]

Parameters: tokensCol – The column holding the tokens

targetClasses = Param(parent='undefined', name='targetClasses', doc='The indices of the classes for multinomial classification models. Default: 0.For regression models this parameter is ignored.')

targetClassesCol = Param(parent='undefined', name='targetClassesCol', doc='The name of the column that specifies the indices of the classes for multinomial classification models.')

targetCol = Param(parent='undefined', name='targetCol', doc='The column name of the prediction target to explain (i.e. the response variable). This is usually set to "prediction" for regression models and "probability" for probabilistic classification models. Default value: probability')

tokensCol = Param(parent='undefined', name='tokensCol', doc='The column holding the tokens')
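
The role of infWeight can be illustrated with the standard Kernel SHAP weighting formula, which is infinite for the empty and the full coalition. The sketch below substitutes a large finite value in those two cases. It is a conceptual illustration under that assumption, not SynapseML's code; the function name shapley_kernel_weight is hypothetical.

```python
from math import factorial


def shapley_kernel_weight(num_features, coalition_size, inf_weight=1e8):
    """Kernel SHAP weight for a coalition with coalition_size features switched on."""
    m, s = num_features, coalition_size
    if s == 0 or s == m:
        # The closed-form weight diverges here, so a large finite stand-in is used.
        return inf_weight
    n_choose_s = factorial(m) // (factorial(s) * factorial(m - s))
    return (m - 1) / (n_choose_s * s * (m - s))


weights = [shapley_kernel_weight(4, s) for s in range(5)]
print(weights)  # [100000000.0, 0.25, 0.125, 0.25, 100000000.0]
```

The very large weights on the two extreme coalitions push the surrogate regression to match the model output when all features are off and when all are on.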

synapse.ml.explainers.VectorLIME module

class synapse.ml.explainers.VectorLIME.VectorLIME(java_obj=None, backgroundData=None, inputCol=None, kernelWidth=0.75, metricsCol='r2', model=None, numSamples=1000, outputCol='VectorLIME_9f02d3dcb6ab__output', regularization=0.0, targetClasses=[], targetClassesCol=None, targetCol='probability')[source]

Bases: pyspark.ml.util.MLReadable[pyspark.ml.util.RL]

Parameters

backgroundData (object) – A dataframe containing background data
inputCol (str) – input column name
kernelWidth (float) – Kernel width. Default value: sqrt (number of features) * 0.75
metricsCol (str) – Column name for fitting metrics
model (object) – The model to be interpreted.
numSamples (int) – Number of samples to generate.
outputCol (str) – output column name
regularization (float) – Regularization param for the lasso. Default value: 0.
targetClasses (list) – The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.
targetClassesCol (str) – The name of the column that specifies the indices of the classes for multinomial classification models.
targetCol (str) – The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability
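
As a worked example of the kernelWidth description, the sketch below computes sqrt(number of features) * 0.75 and applies it in the exponential kernel used by the original LIME formulation. Whether SynapseML uses exactly this kernel form is an assumption; the function names are hypothetical.

```python
import math


def default_kernel_width(num_features, factor=0.75):
    """Kernel width as described in the parameter doc: sqrt(num_features) * factor."""
    return math.sqrt(num_features) * factor


def lime_kernel_weight(distance, kernel_width):
    """Exponential kernel from the original LIME formulation (assumed here)."""
    return math.sqrt(math.exp(-(distance ** 2) / (kernel_width ** 2)))


width = default_kernel_width(16)  # sqrt(16) * 0.75 = 3.0
print(width)
print(lime_kernel_weight(0.0, width))  # an unperturbed sample gets weight 1.0
print(lime_kernel_weight(3.0, width))  # farther samples are down-weighted
```

A smaller kernel width concentrates the surrogate fit on samples close to the instance being explained.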

backgroundData = Param(parent='undefined', name='backgroundData', doc='A dataframe containing background data')

getBackgroundData()[source]

Returns: A dataframe containing background data
Return type: backgroundData

getInputCol()[source]

Returns: input column name
Return type: inputCol

static getJavaPackage()[source]: Returns package name String.

getKernelWidth()[source]

Returns: Kernel width. Default value: sqrt (number of features) * 0.75
Return type: kernelWidth

getMetricsCol()[source]

Returns: Column name for fitting metrics
Return type: metricsCol

getModel()[source]

Returns: The model to be interpreted.
Return type: model

getNumSamples()[source]

Returns: Number of samples to generate.
Return type: numSamples

getOutputCol()[source]

Returns: output column name
Return type: outputCol

getRegularization()[source]

Returns: Regularization param for the lasso. Default value: 0.
Return type: regularization

getTargetClasses()[source]

Returns: The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.
Return type: targetClasses

getTargetClassesCol()[source]

Returns: The name of the column that specifies the indices of the classes for multinomial classification models.
Return type: targetClassesCol

getTargetCol()[source]

Returns: The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability
Return type: targetCol

inputCol = Param(parent='undefined', name='inputCol', doc='input column name')

kernelWidth = Param(parent='undefined', name='kernelWidth', doc='Kernel width. Default value: sqrt (number of features) * 0.75')

metricsCol = Param(parent='undefined', name='metricsCol', doc='Column name for fitting metrics')

model = Param(parent='undefined', name='model', doc='The model to be interpreted.')

numSamples = Param(parent='undefined', name='numSamples', doc='Number of samples to generate.')

outputCol = Param(parent='undefined', name='outputCol', doc='output column name')

classmethod read()[source]: Returns an MLReader instance for this class.

regularization = Param(parent='undefined', name='regularization', doc='Regularization param for the lasso. Default value: 0.')

setBackgroundData(value)[source]

Parameters: backgroundData – A dataframe containing background data

setInputCol(value)[source]

Parameters: inputCol – input column name

setKernelWidth(value)[source]

Parameters: kernelWidth – Kernel width. Default value: sqrt (number of features) * 0.75

setMetricsCol(value)[source]

Parameters: metricsCol – Column name for fitting metrics

setModel(value)[source]

Parameters: model – The model to be interpreted.

setNumSamples(value)[source]

Parameters: numSamples – Number of samples to generate.

setOutputCol(value)[source]

Parameters: outputCol – output column name

setParams(backgroundData=None, inputCol=None, kernelWidth=0.75, metricsCol='r2', model=None, numSamples=1000, outputCol='VectorLIME_9f02d3dcb6ab__output', regularization=0.0, targetClasses=[], targetClassesCol=None, targetCol='probability')[source]: Set the (keyword only) parameters

setRegularization(value)[source]

Parameters: regularization – Regularization param for the lasso. Default value: 0.

setTargetClasses(value)[source]

Parameters: targetClasses – The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.

setTargetClassesCol(value)[source]

Parameters: targetClassesCol – The name of the column that specifies the indices of the classes for multinomial classification models.

setTargetCol(value)[source]

Parameters: targetCol – The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability

targetClasses = Param(parent='undefined', name='targetClasses', doc='The indices of the classes for multinomial classification models. Default: 0.For regression models this parameter is ignored.')

targetClassesCol = Param(parent='undefined', name='targetClassesCol', doc='The name of the column that specifies the indices of the classes for multinomial classification models.')

targetCol = Param(parent='undefined', name='targetCol', doc='The column name of the prediction target to explain (i.e. the response variable). This is usually set to "prediction" for regression models and "probability" for probabilistic classification models. Default value: probability')
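
The VectorLIME parameters above could be combined roughly as follows. This is a sketch under assumptions: the column names "features" and "lime_weights" are placeholders, explain_with_vector_lime is a hypothetical helper, and the transform() call assumes the explainer follows the usual Spark ML transformer interface, which this reference does not list explicitly.

```python
# Values mirror the defaults in the VectorLIME signature, except for the
# placeholder column names and the explicit target class.
vector_lime_params = {
    "inputCol": "features",
    "outputCol": "lime_weights",
    "targetCol": "probability",
    "targetClasses": [1],
    "numSamples": 1000,
    "kernelWidth": 0.75,
    "regularization": 0.0,
    "metricsCol": "r2",
}


def explain_with_vector_lime(model, background_df, df):
    """Explain the rows of df with VectorLIME (hypothetical helper)."""
    # Imported lazily: this line needs SynapseML and PySpark to be installed.
    from synapse.ml.explainers.VectorLIME import VectorLIME

    explainer = VectorLIME(
        model=model, backgroundData=background_df, **vector_lime_params
    )
    # Assumed: the explainer exposes transform() like other Spark ML transformers.
    return explainer.transform(df)
```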

synapse.ml.explainers.VectorSHAP module

class synapse.ml.explainers.VectorSHAP.VectorSHAP(java_obj=None, backgroundData=None, infWeight=100000000.0, inputCol=None, metricsCol='r2', model=None, numSamples=None, outputCol='VectorSHAP_f21598c2f0b4__output', targetClasses=[], targetClassesCol=None, targetCol='probability')[source]

Bases: pyspark.ml.util.MLReadable[pyspark.ml.util.RL]

Parameters

backgroundData (object) – A dataframe containing background data
infWeight (float) – The double value to represent infinite weight. Default: 1E8.
inputCol (str) – input column name
metricsCol (str) – Column name for fitting metrics
model (object) – The model to be interpreted.
numSamples (int) – Number of samples to generate.
outputCol (str) – output column name
targetClasses (list) – The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.
targetClassesCol (str) – The name of the column that specifies the indices of the classes for multinomial classification models.
targetCol (str) – The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability
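
How targetCol and targetClasses relate can be pictured with plain Python: the target column holds the model output for a row, and the class indices pick which entries of that output get explained. The sketch below is conceptual only; select_targets is a hypothetical helper and not part of the library.

```python
def select_targets(target_value, target_classes=None):
    """Pick the entries of a model output that would be explained."""
    if isinstance(target_value, (int, float)):
        # Regression: the output is a single number, so class indices are ignored.
        return [float(target_value)]
    if not target_classes:
        target_classes = [0]  # documented default class index
    return [target_value[i] for i in target_classes]


probability = [0.1, 0.7, 0.2]               # e.g. one row of a "probability" column
print(select_targets(probability))          # [0.1]
print(select_targets(probability, [1, 2]))  # [0.7, 0.2]
print(select_targets(3.5, [1]))             # [3.5]
```

targetClassesCol serves the same purpose as targetClasses, except that the indices come from a column and can therefore differ per row.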

backgroundData = Param(parent='undefined', name='backgroundData', doc='A dataframe containing background data')

getBackgroundData()[source]

Returns: A dataframe containing background data
Return type: backgroundData

getInfWeight()[source]

Returns: The double value to represent infinite weight. Default: 1E8.
Return type: infWeight

getInputCol()[source]

Returns: input column name
Return type: inputCol

static getJavaPackage()[source]: Returns package name String.

getMetricsCol()[source]

Returns: Column name for fitting metrics
Return type: metricsCol

getModel()[source]

Returns: The model to be interpreted.
Return type: model

getNumSamples()[source]

Returns: Number of samples to generate.
Return type: numSamples

getOutputCol()[source]

Returns: output column name
Return type: outputCol

getTargetClasses()[source]

Returns: The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.
Return type: targetClasses

getTargetClassesCol()[source]

Returns: The name of the column that specifies the indices of the classes for multinomial classification models.
Return type: targetClassesCol

getTargetCol()[source]

Returns: The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability
Return type: targetCol

infWeight = Param(parent='undefined', name='infWeight', doc='The double value to represent infinite weight. Default: 1E8.')

inputCol = Param(parent='undefined', name='inputCol', doc='input column name')

metricsCol = Param(parent='undefined', name='metricsCol', doc='Column name for fitting metrics')

model = Param(parent='undefined', name='model', doc='The model to be interpreted.')

numSamples = Param(parent='undefined', name='numSamples', doc='Number of samples to generate.')

outputCol = Param(parent='undefined', name='outputCol', doc='output column name')

classmethod read()[source]: Returns an MLReader instance for this class.

setBackgroundData(value)[source]

Parameters: backgroundData – A dataframe containing background data

setInfWeight(value)[source]

Parameters: infWeight – The double value to represent infinite weight. Default: 1E8.

setInputCol(value)[source]

Parameters: inputCol – input column name

setMetricsCol(value)[source]

Parameters: metricsCol – Column name for fitting metrics

setModel(value)[source]

Parameters: model – The model to be interpreted.

setNumSamples(value)[source]

Parameters: numSamples – Number of samples to generate.

setOutputCol(value)[source]

Parameters: outputCol – output column name

setParams(backgroundData=None, infWeight=100000000.0, inputCol=None, metricsCol='r2', model=None, numSamples=None, outputCol='VectorSHAP_f21598c2f0b4__output', targetClasses=[], targetClassesCol=None, targetCol='probability')[source]: Set the (keyword only) parameters

setTargetClasses(value)[source]

Parameters: targetClasses – The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.

setTargetClassesCol(value)[source]

Parameters: targetClassesCol – The name of the column that specifies the indices of the classes for multinomial classification models.

setTargetCol(value)[source]

Parameters: targetCol – The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability

targetClasses = Param(parent='undefined', name='targetClasses', doc='The indices of the classes for multinomial classification models. Default: 0.For regression models this parameter is ignored.')

targetClassesCol = Param(parent='undefined', name='targetClassesCol', doc='The name of the column that specifies the indices of the classes for multinomial classification models.')

targetCol = Param(parent='undefined', name='targetCol', doc='The column name of the prediction target to explain (i.e. the response variable). This is usually set to "prediction" for regression models and "probability" for probabilistic classification models. Default value: probability')

Module contents

SynapseML is an ecosystem of tools that extends the distributed computing framework Apache Spark in several new directions. It adds many deep learning and data science tools to the Spark ecosystem, including integration of Spark Machine Learning pipelines with Microsoft Cognitive Toolkit (CNTK), LightGBM, and OpenCV. These tools enable highly scalable predictive and analytical models for a variety of data sources.

SynapseML also brings new networking capabilities to the Spark ecosystem. With the HTTP on Spark project, users can embed any web service into their SparkML models. In this vein, SynapseML provides easy-to-use SparkML transformers for a wide variety of Microsoft Cognitive Services. For production-grade deployment, the Spark Serving project enables high-throughput, sub-millisecond-latency web services backed by your Spark cluster.

SynapseML requires Scala 2.12, Spark 3.0+, and Python 3.6+.
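
The Python and Spark requirements can be checked up front with a few lines of standard-library code. This is a sketch: the constants and helper names are hypothetical, and the Spark version string is passed in by hand rather than read from a live installation.

```python
import re
import sys

MIN_PYTHON = (3, 6)  # from the requirement stated above
MIN_SPARK = (3, 0)   # from the requirement stated above


def parse_version(text):
    """Turn a version string such as '3.1.2' into a (major, minor) tuple."""
    match = re.match(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("unrecognized version string: %r" % text)
    return (int(match.group(1)), int(match.group(2)))


def meets_minimum(version_text, minimum):
    """True when version_text is at least the (major, minor) minimum."""
    return parse_version(version_text) >= minimum


python_ok = sys.version_info[:2] >= MIN_PYTHON
print("Python OK:", python_ok)
print("Spark 3.1.2 OK:", meets_minimum("3.1.2", MIN_SPARK))
print("Spark 2.4.8 OK:", meets_minimum("2.4.8", MIN_SPARK))
```

With PySpark installed, pyspark.__version__ could be passed to meets_minimum in place of the literal strings.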