synapse.ml.explainers package
Submodules
synapse.ml.explainers.ImageLIME module
- class synapse.ml.explainers.ImageLIME.ImageLIME(java_obj=None, cellSize=16.0, inputCol=None, kernelWidth=0.75, metricsCol='r2', model=None, modifier=130.0, numSamples=900, outputCol='ImageLIME_2b50d8366c06__output', regularization=0.0, samplingFraction=0.7, superpixelCol='superpixels', targetClasses=[], targetClassesCol=None, targetCol='probability')[source]
Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer
- Parameters
cellSize (float) – Number that controls the size of the superpixels
inputCol (object) – input column name
kernelWidth (float) – Kernel width. Default value: sqrt (number of features) * 0.75
metricsCol (object) – Column name for fitting metrics
model (object) – The model to be interpreted.
modifier (float) – Controls the trade-off between spatial and color distance
numSamples (int) – Number of samples to generate.
outputCol (object) – output column name
regularization (float) – Regularization param for the lasso. Default value: 0.
samplingFraction (float) – The fraction of superpixels (for image) or tokens (for text) to keep on in each perturbed sample
superpixelCol (object) – The column holding the superpixel decompositions
targetClasses (list) – The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.
targetClassesCol (object) – The name of the column that specifies the indices of the classes for multinomial classification models.
targetCol (object) – The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability
- cellSize = Param(parent='undefined', name='cellSize', doc='Number that controls the size of the superpixels')
- getCellSize()[source]
- Returns
Number that controls the size of the superpixels
- Return type
cellSize
- getKernelWidth()[source]
- Returns
Kernel width. Default value: sqrt (number of features) * 0.75
- Return type
kernelWidth
- getModifier()[source]
- Returns
Controls the trade-off between spatial and color distance
- Return type
modifier
- getRegularization()[source]
- Returns
Regularization param for the lasso. Default value: 0.
- Return type
regularization
- getSamplingFraction()[source]
- Returns
The fraction of superpixels (for image) or tokens (for text) to keep on in each perturbed sample
- Return type
samplingFraction
- getSuperpixelCol()[source]
- Returns
The column holding the superpixel decompositions
- Return type
superpixelCol
- getTargetClasses()[source]
- Returns
The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.
- Return type
targetClasses
- getTargetClassesCol()[source]
- Returns
The name of the column that specifies the indices of the classes for multinomial classification models.
- Return type
targetClassesCol
- getTargetCol()[source]
- Returns
The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability
- Return type
targetCol
- inputCol = Param(parent='undefined', name='inputCol', doc='input column name')
- kernelWidth = Param(parent='undefined', name='kernelWidth', doc='Kernel width. Default value: sqrt (number of features) * 0.75')
- metricsCol = Param(parent='undefined', name='metricsCol', doc='Column name for fitting metrics')
- model = Param(parent='undefined', name='model', doc='The model to be interpreted.')
- modifier = Param(parent='undefined', name='modifier', doc='Controls the trade-off spatial and color distance')
- numSamples = Param(parent='undefined', name='numSamples', doc='Number of samples to generate.')
- outputCol = Param(parent='undefined', name='outputCol', doc='output column name')
- regularization = Param(parent='undefined', name='regularization', doc='Regularization param for the lasso. Default value: 0.')
- samplingFraction = Param(parent='undefined', name='samplingFraction', doc='The fraction of superpixels (for image) or tokens (for text) to keep on')
- setKernelWidth(value)[source]
- Parameters
kernelWidth – Kernel width. Default value: sqrt (number of features) * 0.75
- setParams(cellSize=16.0, inputCol=None, kernelWidth=0.75, metricsCol='r2', model=None, modifier=130.0, numSamples=900, outputCol='ImageLIME_2b50d8366c06__output', regularization=0.0, samplingFraction=0.7, superpixelCol='superpixels', targetClasses=[], targetClassesCol=None, targetCol='probability')[source]
Set the (keyword only) parameters
- setRegularization(value)[source]
- Parameters
regularization – Regularization param for the lasso. Default value: 0.
- setSamplingFraction(value)[source]
- Parameters
samplingFraction – The fraction of superpixels (for image) or tokens (for text) to keep on in each perturbed sample
- setSuperpixelCol(value)[source]
- Parameters
superpixelCol – The column holding the superpixel decompositions
- setTargetClasses(value)[source]
- Parameters
targetClasses – The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.
- setTargetClassesCol(value)[source]
- Parameters
targetClassesCol – The name of the column that specifies the indices of the classes for multinomial classification models.
- setTargetCol(value)[source]
- Parameters
targetCol – The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability
- superpixelCol = Param(parent='undefined', name='superpixelCol', doc='The column holding the superpixel decompositions')
- targetClasses = Param(parent='undefined', name='targetClasses', doc='The indices of the classes for multinomial classification models. Default: 0.For regression models this parameter is ignored.')
- targetClassesCol = Param(parent='undefined', name='targetClassesCol', doc='The name of the column that specifies the indices of the classes for multinomial classification models.')
- targetCol = Param(parent='undefined', name='targetCol', doc='The column name of the prediction target to explain (i.e. the response variable). This is usually set to "prediction" for regression models and "probability" for probabilistic classification models. Default value: probability')
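As a rough illustration of what superpixelCol and samplingFraction refer to, the sketch below switches superpixels on or off and blanks the pixels of the ones that are off. The data layout (a superpixel as a list of (x, y) pixel coordinates, a grayscale image as nested lists), the fill value, and the helper name censor_image are assumptions made for this example; they are not SynapseML's actual representation or API.

```python
def censor_image(image, superpixels, states, fill=0):
    """Return a copy of image where pixels of switched-off superpixels become fill."""
    masked = [row[:] for row in image]  # copy so the input image is left untouched
    for cluster, on in zip(superpixels, states):
        if not on:
            for x, y in cluster:
                masked[y][x] = fill
    return masked


# A 2x2 grayscale image split into two hypothetical superpixels: top row, bottom row.
image = [[10, 20], [30, 40]]
superpixels = [[(0, 0), (1, 0)], [(0, 1), (1, 1)]]

# Keep the first superpixel on and switch the second one off.
masked = censor_image(image, superpixels, states=[True, False])
print(masked)  # [[10, 20], [0, 0]]
```

An explainer of this kind would score many such censored images with the model and relate the on/off states to the change in the target output.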
synapse.ml.explainers.ImageSHAP module
- class synapse.ml.explainers.ImageSHAP.ImageSHAP(java_obj=None, cellSize=16.0, infWeight=100000000.0, inputCol=None, metricsCol='r2', model=None, modifier=130.0, numSamples=None, outputCol='ImageSHAP_f23832d1edef__output', superpixelCol='superpixels', targetClasses=[], targetClassesCol=None, targetCol='probability')[source]
Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer
- Parameters
cellSize (float) – Number that controls the size of the superpixels
infWeight (float) – The double value to represent infinite weight. Default: 1E8.
inputCol (object) – input column name
metricsCol (object) – Column name for fitting metrics
model (object) – The model to be interpreted.
modifier (float) – Controls the trade-off between spatial and color distance
numSamples (int) – Number of samples to generate.
outputCol (object) – output column name
superpixelCol (object) – The column holding the superpixel decompositions
targetClasses (list) – The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.
targetClassesCol (object) – The name of the column that specifies the indices of the classes for multinomial classification models.
targetCol (object) – The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability
- cellSize = Param(parent='undefined', name='cellSize', doc='Number that controls the size of the superpixels')
- getCellSize()[source]
- Returns
Number that controls the size of the superpixels
- Return type
cellSize
- getInfWeight()[source]
- Returns
The double value to represent infinite weight. Default: 1E8.
- Return type
infWeight
- getModifier()[source]
- Returns
Controls the trade-off between spatial and color distance
- Return type
modifier
- getSuperpixelCol()[source]
- Returns
The column holding the superpixel decompositions
- Return type
superpixelCol
- getTargetClasses()[source]
- Returns
The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.
- Return type
targetClasses
- getTargetClassesCol()[source]
- Returns
The name of the column that specifies the indices of the classes for multinomial classification models.
- Return type
targetClassesCol
- getTargetCol()[source]
- Returns
The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability
- Return type
targetCol
- infWeight = Param(parent='undefined', name='infWeight', doc='The double value to represent infinite weight. Default: 1E8.')
- inputCol = Param(parent='undefined', name='inputCol', doc='input column name')
- metricsCol = Param(parent='undefined', name='metricsCol', doc='Column name for fitting metrics')
- model = Param(parent='undefined', name='model', doc='The model to be interpreted.')
- modifier = Param(parent='undefined', name='modifier', doc='Controls the trade-off spatial and color distance')
- numSamples = Param(parent='undefined', name='numSamples', doc='Number of samples to generate.')
- outputCol = Param(parent='undefined', name='outputCol', doc='output column name')
- setInfWeight(value)[source]
- Parameters
infWeight – The double value to represent infinite weight. Default: 1E8.
- setParams(cellSize=16.0, infWeight=100000000.0, inputCol=None, metricsCol='r2', model=None, modifier=130.0, numSamples=None, outputCol='ImageSHAP_f23832d1edef__output', superpixelCol='superpixels', targetClasses=[], targetClassesCol=None, targetCol='probability')[source]
Set the (keyword only) parameters
- setSuperpixelCol(value)[source]
- Parameters
superpixelCol – The column holding the superpixel decompositions
- setTargetClasses(value)[source]
- Parameters
targetClasses – The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.
- setTargetClassesCol(value)[source]
- Parameters
targetClassesCol – The name of the column that specifies the indices of the classes for multinomial classification models.
- setTargetCol(value)[source]
- Parameters
targetCol – The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability
- superpixelCol = Param(parent='undefined', name='superpixelCol', doc='The column holding the superpixel decompositions')
- targetClasses = Param(parent='undefined', name='targetClasses', doc='The indices of the classes for multinomial classification models. Default: 0.For regression models this parameter is ignored.')
- targetClassesCol = Param(parent='undefined', name='targetClassesCol', doc='The name of the column that specifies the indices of the classes for multinomial classification models.')
- targetCol = Param(parent='undefined', name='targetCol', doc='The column name of the prediction target to explain (i.e. the response variable). This is usually set to "prediction" for regression models and "probability" for probabilistic classification models. Default value: probability')
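The infWeight parameter stands in for an infinite weight. In the Kernel SHAP formulation, the empty coalition and the full coalition receive infinite weight, so a large finite constant is commonly substituted. The following is a minimal sketch of that weighting, assuming ImageSHAP follows the standard Kernel SHAP kernel; the function name shapley_kernel_weight is hypothetical and not part of the library.

```python
from math import comb


def shapley_kernel_weight(num_features, coalition_size, inf_weight=1e8):
    """Kernel SHAP weight for a coalition with coalition_size of num_features switched on."""
    m, k = num_features, coalition_size
    if k == 0 or k == m:
        # The analytic weight is infinite here; use the finite stand-in instead.
        return inf_weight
    return (m - 1) / (comb(m, k) * k * (m - k))


# Weights for every coalition size with four features (for example, four superpixels).
for size in range(5):
    print(size, shapley_kernel_weight(4, size))
```

The weights are symmetric in the coalition size and largest at the two extremes, which is why the stand-in value needs to dominate all finite weights.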
synapse.ml.explainers.TabularLIME module
- class synapse.ml.explainers.TabularLIME.TabularLIME(java_obj=None, backgroundData=None, categoricalFeatures=[], inputCols=None, kernelWidth=0.75, metricsCol='r2', model=None, numSamples=1000, outputCol='TabularLIME_5945094b3378__output', regularization=0.0, targetClasses=[], targetClassesCol=None, targetCol='probability')[source]
Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer
- Parameters
backgroundData (object) – A dataframe containing background data
categoricalFeatures (list) – Name of features that should be treated as categorical variables.
inputCols (list) – input column names
kernelWidth (float) – Kernel width. Default value: sqrt (number of features) * 0.75
metricsCol (object) – Column name for fitting metrics
model (object) – The model to be interpreted.
numSamples (int) – Number of samples to generate.
outputCol (object) – output column name
regularization (float) – Regularization param for the lasso. Default value: 0.
targetClasses (list) – The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.
targetClassesCol (object) – The name of the column that specifies the indices of the classes for multinomial classification models.
targetCol (object) – The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability
- backgroundData = Param(parent='undefined', name='backgroundData', doc='A dataframe containing background data')
- categoricalFeatures = Param(parent='undefined', name='categoricalFeatures', doc='Name of features that should be treated as categorical variables.')
- getBackgroundData()[source]
- Returns
A dataframe containing background data
- Return type
backgroundData
- getCategoricalFeatures()[source]
- Returns
Name of features that should be treated as categorical variables.
- Return type
categoricalFeatures
- getKernelWidth()[source]
- Returns
Kernel width. Default value: sqrt (number of features) * 0.75
- Return type
kernelWidth
- getRegularization()[source]
- Returns
Regularization param for the lasso. Default value: 0.
- Return type
regularization
- getTargetClasses()[source]
- Returns
The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.
- Return type
targetClasses
- getTargetClassesCol()[source]
- Returns
The name of the column that specifies the indices of the classes for multinomial classification models.
- Return type
targetClassesCol
- getTargetCol()[source]
- Returns
The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability
- Return type
targetCol
- inputCols = Param(parent='undefined', name='inputCols', doc='input column names')
- kernelWidth = Param(parent='undefined', name='kernelWidth', doc='Kernel width. Default value: sqrt (number of features) * 0.75')
- metricsCol = Param(parent='undefined', name='metricsCol', doc='Column name for fitting metrics')
- model = Param(parent='undefined', name='model', doc='The model to be interpreted.')
- numSamples = Param(parent='undefined', name='numSamples', doc='Number of samples to generate.')
- outputCol = Param(parent='undefined', name='outputCol', doc='output column name')
- regularization = Param(parent='undefined', name='regularization', doc='Regularization param for the lasso. Default value: 0.')
- setBackgroundData(value)[source]
- Parameters
backgroundData – A dataframe containing background data
- setCategoricalFeatures(value)[source]
- Parameters
categoricalFeatures – Name of features that should be treated as categorical variables.
- setKernelWidth(value)[source]
- Parameters
kernelWidth – Kernel width. Default value: sqrt (number of features) * 0.75
- setParams(backgroundData=None, categoricalFeatures=[], inputCols=None, kernelWidth=0.75, metricsCol='r2', model=None, numSamples=1000, outputCol='TabularLIME_5945094b3378__output', regularization=0.0, targetClasses=[], targetClassesCol=None, targetCol='probability')[source]
Set the (keyword only) parameters
- setRegularization(value)[source]
- Parameters
regularization – Regularization param for the lasso. Default value: 0.
- setTargetClasses(value)[source]
- Parameters
targetClasses – The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.
- setTargetClassesCol(value)[source]
- Parameters
targetClassesCol – The name of the column that specifies the indices of the classes for multinomial classification models.
- setTargetCol(value)[source]
- Parameters
targetCol – The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability
- targetClasses = Param(parent='undefined', name='targetClasses', doc='The indices of the classes for multinomial classification models. Default: 0.For regression models this parameter is ignored.')
- targetClassesCol = Param(parent='undefined', name='targetClassesCol', doc='The name of the column that specifies the indices of the classes for multinomial classification models.')
- targetCol = Param(parent='undefined', name='targetCol', doc='The column name of the prediction target to explain (i.e. the response variable). This is usually set to "prediction" for regression models and "probability" for probabilistic classification models. Default value: probability')
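The kernelWidth description gives the default as sqrt (number of features) * 0.75, while the signature shows 0.75. The sketch below only illustrates the formula from the description, together with the exponential kernel used by the reference Python lime package to weight perturbed samples by their distance from the explained row. Whether TabularLIME applies exactly this kernel is an assumption, and both function names are hypothetical.

```python
import math


def default_kernel_width(num_features, scale=0.75):
    """Kernel width as stated in the parameter description: sqrt(number of features) * 0.75."""
    return math.sqrt(num_features) * scale


def lime_sample_weight(distance, kernel_width):
    """Exponential kernel: samples far from the explained row get a smaller weight."""
    return math.sqrt(math.exp(-(distance ** 2) / (kernel_width ** 2)))


width = default_kernel_width(16)  # sqrt(16) * 0.75 = 3.0
for d in (0.0, 1.5, 3.0, 6.0):
    print(f"distance={d:.1f} weight={lime_sample_weight(d, width):.4f}")
```

A smaller kernel width makes the surrogate model more local, since the weights fall off faster with distance.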
synapse.ml.explainers.TabularSHAP module
- class synapse.ml.explainers.TabularSHAP.TabularSHAP(java_obj=None, backgroundData=None, infWeight=100000000.0, inputCols=None, metricsCol='r2', model=None, numSamples=None, outputCol='TabularSHAP_25ca239363fe__output', targetClasses=[], targetClassesCol=None, targetCol='probability')[source]
Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer
- Parameters
backgroundData (object) – A dataframe containing background data
infWeight (float) – The double value to represent infinite weight. Default: 1E8.
inputCols (list) – input column names
metricsCol (object) – Column name for fitting metrics
model (object) – The model to be interpreted.
numSamples (int) – Number of samples to generate.
outputCol (object) – output column name
targetClasses (list) – The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.
targetClassesCol (object) – The name of the column that specifies the indices of the classes for multinomial classification models.
targetCol (object) – The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability
- backgroundData = Param(parent='undefined', name='backgroundData', doc='A dataframe containing background data')
- getBackgroundData()[source]
- Returns
A dataframe containing background data
- Return type
backgroundData
- getInfWeight()[source]
- Returns
The double value to represent infinite weight. Default: 1E8.
- Return type
infWeight
- getTargetClasses()[source]
- Returns
The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.
- Return type
targetClasses
- getTargetClassesCol()[source]
- Returns
The name of the column that specifies the indices of the classes for multinomial classification models.
- Return type
targetClassesCol
- getTargetCol()[source]
- Returns
The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability
- Return type
targetCol
- infWeight = Param(parent='undefined', name='infWeight', doc='The double value to represent infinite weight. Default: 1E8.')
- inputCols = Param(parent='undefined', name='inputCols', doc='input column names')
- metricsCol = Param(parent='undefined', name='metricsCol', doc='Column name for fitting metrics')
- model = Param(parent='undefined', name='model', doc='The model to be interpreted.')
- numSamples = Param(parent='undefined', name='numSamples', doc='Number of samples to generate.')
- outputCol = Param(parent='undefined', name='outputCol', doc='output column name')
- setBackgroundData(value)[source]
- Parameters
backgroundData – A dataframe containing background data
- setInfWeight(value)[source]
- Parameters
infWeight – The double value to represent infinite weight. Default: 1E8.
- setParams(backgroundData=None, infWeight=100000000.0, inputCols=None, metricsCol='r2', model=None, numSamples=None, outputCol='TabularSHAP_25ca239363fe__output', targetClasses=[], targetClassesCol=None, targetCol='probability')[source]
Set the (keyword only) parameters
- setTargetClasses(value)[source]
- Parameters
targetClasses – The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.
- setTargetClassesCol(value)[source]
- Parameters
targetClassesCol – The name of the column that specifies the indices of the classes for multinomial classification models.
- setTargetCol(value)[source]
- Parameters
targetCol – The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability
- targetClasses = Param(parent='undefined', name='targetClasses', doc='The indices of the classes for multinomial classification models. Default: 0.For regression models this parameter is ignored.')
- targetClassesCol = Param(parent='undefined', name='targetClassesCol', doc='The name of the column that specifies the indices of the classes for multinomial classification models.')
- targetCol = Param(parent='undefined', name='targetCol', doc='The column name of the prediction target to explain (i.e. the response variable). This is usually set to "prediction" for regression models and "probability" for probabilistic classification models. Default value: probability')
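To show how the documented keyword arguments fit together, here is a hedged sketch of constructing a TabularSHAP explainer. It uses only parameter names from the signature above. The helper names, the output column name "shapValues", and the arguments model, background_df, and feature_cols are hypothetical placeholders. The import is deferred because the class needs pyspark and SynapseML to be installed, and constructing it needs a running Spark session.

```python
def tabular_shap_kwargs(model, background_df, feature_cols, target_classes=(1,)):
    """Collect keyword arguments using only names from the documented signature."""
    return {
        "inputCols": list(feature_cols),
        "outputCol": "shapValues",  # hypothetical output column name
        "model": model,
        "backgroundData": background_df,
        "targetCol": "probability",
        "targetClasses": list(target_classes),
    }


def build_tabular_shap(model, background_df, feature_cols, target_classes=(1,)):
    """Construct the explainer; requires pyspark and SynapseML at call time."""
    # Imported lazily so this sketch can be loaded without a Spark installation.
    from synapse.ml.explainers.TabularSHAP import TabularSHAP

    return TabularSHAP(
        **tabular_shap_kwargs(model, background_df, feature_cols, target_classes)
    )
```

Because the class derives from pyspark.ml.wrapper.JavaTransformer, the resulting explainer would typically be applied with its transform method to a DataFrame holding the rows to explain.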
synapse.ml.explainers.TextLIME module
- class synapse.ml.explainers.TextLIME.TextLIME(java_obj=None, inputCol=None, kernelWidth=0.75, metricsCol='r2', model=None, numSamples=1000, outputCol='TextLIME_184ca2fda56f__output', regularization=0.0, samplingFraction=0.7, targetClasses=[], targetClassesCol=None, targetCol='probability', tokensCol='tokens')[source]
Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer
- Parameters
inputCol (object) – input column name
kernelWidth (float) – Kernel width. Default value: sqrt (number of features) * 0.75
metricsCol (object) – Column name for fitting metrics
model (object) – The model to be interpreted.
numSamples (int) – Number of samples to generate.
outputCol (object) – output column name
regularization (float) – Regularization param for the lasso. Default value: 0.
samplingFraction (float) – The fraction of superpixels (for image) or tokens (for text) to keep on in each perturbed sample
targetClasses (list) – The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.
targetClassesCol (object) – The name of the column that specifies the indices of the classes for multinomial classification models.
targetCol (object) – The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability
tokensCol (object) – The column holding the tokens
- getKernelWidth()[source]
- Returns
Kernel width. Default value: sqrt (number of features) * 0.75
- Return type
kernelWidth
- getRegularization()[source]
- Returns
Regularization param for the lasso. Default value: 0.
- Return type
regularization
- getSamplingFraction()[source]
- Returns
The fraction of superpixels (for image) or tokens (for text) to keep on in each perturbed sample
- Return type
samplingFraction
- getTargetClasses()[source]
- Returns
The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.
- Return type
targetClasses
- getTargetClassesCol()[source]
- Returns
The name of the column that specifies the indices of the classes for multinomial classification models.
- Return type
targetClassesCol
- getTargetCol()[source]
- Returns
The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability
- Return type
targetCol
- inputCol = Param(parent='undefined', name='inputCol', doc='input column name')
- kernelWidth = Param(parent='undefined', name='kernelWidth', doc='Kernel width. Default value: sqrt (number of features) * 0.75')
- metricsCol = Param(parent='undefined', name='metricsCol', doc='Column name for fitting metrics')
- model = Param(parent='undefined', name='model', doc='The model to be interpreted.')
- numSamples = Param(parent='undefined', name='numSamples', doc='Number of samples to generate.')
- outputCol = Param(parent='undefined', name='outputCol', doc='output column name')
- regularization = Param(parent='undefined', name='regularization', doc='Regularization param for the lasso. Default value: 0.')
- samplingFraction = Param(parent='undefined', name='samplingFraction', doc='The fraction of superpixels (for image) or tokens (for text) to keep on')
- setKernelWidth(value)[source]
- Parameters
kernelWidth – Kernel width. Default value: sqrt (number of features) * 0.75
- setParams(inputCol=None, kernelWidth=0.75, metricsCol='r2', model=None, numSamples=1000, outputCol='TextLIME_184ca2fda56f__output', regularization=0.0, samplingFraction=0.7, targetClasses=[], targetClassesCol=None, targetCol='probability', tokensCol='tokens')[source]
Set the (keyword only) parameters
- setRegularization(value)[source]
- Parameters
regularization – Regularization param for the lasso. Default value: 0.
- setSamplingFraction(value)[source]
- Parameters
samplingFraction – The fraction of superpixels (for image) or tokens (for text) to keep on
 in each perturbed sample
- setTargetClasses(value)[source]
- Parameters
targetClasses – The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.
- setTargetClassesCol(value)[source]
- Parameters
targetClassesCol – The name of the column that specifies the indices of the classes for multinomial classification models.
- setTargetCol(value)[source]
- Parameters
targetCol – The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability
- targetClasses = Param(parent='undefined', name='targetClasses', doc='The indices of the classes for multinomial classification models. Default: 0.For regression models this parameter is ignored.')
- targetClassesCol = Param(parent='undefined', name='targetClassesCol', doc='The name of the column that specifies the indices of the classes for multinomial classification models.')
- targetCol = Param(parent='undefined', name='targetCol', doc='The column name of the prediction target to explain (i.e. the response variable). This is usually set to "prediction" for regression models and "probability" for probabilistic classification models. Default value: probability')
- tokensCol = Param(parent='undefined', name='tokensCol', doc='The column holding the tokens')
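One plausible reading of samplingFraction and numSamples for text is that each perturbed sample keeps every token independently with probability samplingFraction. The sketch below draws such samples with the Python standard library. It is an assumption about the sampling scheme rather than a description of TextLIME's internals, and the function name sample_token_states is hypothetical.

```python
import random


def sample_token_states(tokens, num_samples=1000, sampling_fraction=0.7, seed=0):
    """Draw num_samples perturbations; each token stays on with probability sampling_fraction."""
    rng = random.Random(seed)
    samples = []
    for _ in range(num_samples):
        states = [rng.random() < sampling_fraction for _ in tokens]
        kept = [tok for tok, on in zip(tokens, states) if on]
        samples.append((states, kept))
    return samples


tokens = ["the", "movie", "was", "surprisingly", "good"]
samples = sample_token_states(tokens)

# Across many samples, the share of tokens left on approaches sampling_fraction.
on_rate = sum(sum(states) for states, _ in samples) / (len(samples) * len(tokens))
print(f"{len(samples)} samples, average fraction kept on: {on_rate:.3f}")
print(samples[0])
```

Each perturbed token list would then be scored by the model, and the on/off states serve as the features of the local surrogate model.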
synapse.ml.explainers.TextSHAP module
- class synapse.ml.explainers.TextSHAP.TextSHAP(java_obj=None, infWeight=100000000.0, inputCol=None, metricsCol='r2', model=None, numSamples=None, outputCol='TextSHAP_3c9b0dcf639e__output', targetClasses=[], targetClassesCol=None, targetCol='probability', tokensCol='tokens')[source]
Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer
- Parameters
infWeight (float) – The double value to represent infinite weight. Default: 1E8.
inputCol (object) – input column name
metricsCol (object) – Column name for fitting metrics
model (object) – The model to be interpreted.
numSamples (int) – Number of samples to generate.
outputCol (object) – output column name
targetClasses (list) – The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.
targetClassesCol (object) – The name of the column that specifies the indices of the classes for multinomial classification models.
targetCol (object) – The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability
tokensCol (object) – The column holding the tokens
- getInfWeight()[source]
- Returns
The double value to represent infinite weight. Default: 1E8.
- Return type
infWeight
- getTargetClasses()[source]
- Returns
The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.
- Return type
targetClasses
- getTargetClassesCol()[source]
- Returns
The name of the column that specifies the indices of the classes for multinomial classification models.
- Return type
targetClassesCol
- getTargetCol()[source]
- Returns
The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability
- Return type
targetCol
- infWeight = Param(parent='undefined', name='infWeight', doc='The double value to represent infinite weight. Default: 1E8.')
- inputCol = Param(parent='undefined', name='inputCol', doc='input column name')
- metricsCol = Param(parent='undefined', name='metricsCol', doc='Column name for fitting metrics')
- model = Param(parent='undefined', name='model', doc='The model to be interpreted.')
- numSamples = Param(parent='undefined', name='numSamples', doc='Number of samples to generate.')
- outputCol = Param(parent='undefined', name='outputCol', doc='output column name')
- setInfWeight(value)[source]
- Parameters
infWeight – The double value to represent infinite weight. Default: 1E8.
- setParams(infWeight=100000000.0, inputCol=None, metricsCol='r2', model=None, numSamples=None, outputCol='TextSHAP_3c9b0dcf639e__output', targetClasses=[], targetClassesCol=None, targetCol='probability', tokensCol='tokens')[source]
Set the (keyword only) parameters
- setTargetClasses(value)[source]
- Parameters
targetClasses – The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.
- setTargetClassesCol(value)[source]
- Parameters
targetClassesCol – The name of the column that specifies the indices of the classes for multinomial classification models.
- setTargetCol(value)[source]
- Parameters
targetCol – The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability
- targetClasses = Param(parent='undefined', name='targetClasses', doc='The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.')
- targetClassesCol = Param(parent='undefined', name='targetClassesCol', doc='The name of the column that specifies the indices of the classes for multinomial classification models.')
- targetCol = Param(parent='undefined', name='targetCol', doc='The column name of the prediction target to explain (i.e. the response variable). This is usually set to "prediction" for regression models and "probability" for probabilistic classification models. Default value: probability')
- tokensCol = Param(parent='undefined', name='tokensCol', doc='The column holding the tokens')
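The targetCol and targetClasses parameters together decide which model outputs are explained. The sketch below is a plain-Python illustration of that selection, not SynapseML code: select_targets is a hypothetical helper, and the fallback to class 0 for an empty list is an assumption read from the "Default: 0" note in the parameter description.

```python
from typing import List, Sequence, Union

Number = Union[int, float]


def select_targets(model_output: Union[Number, Sequence[Number]],
                   target_classes: Sequence[int]) -> List[float]:
    """Return the model outputs to explain (hypothetical helper)."""
    # Regression-style scalar output: target_classes is ignored.
    if isinstance(model_output, (int, float)):
        return [float(model_output)]
    # Assumption: an empty list means "explain class 0".
    indices = list(target_classes) if target_classes else [0]
    return [float(model_output[i]) for i in indices]


probability = [0.1, 0.7, 0.2]  # e.g. one row of a "probability" column
print(select_targets(probability, [1, 2]))  # [0.7, 0.2]
print(select_targets(probability, []))      # [0.1]
print(select_targets(3.5, [1]))             # [3.5]
```

With targetCol set to "prediction" for a regression model, the output is a single number and the class indices play no role, which is the third case above.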
synapse.ml.explainers.VectorLIME module
- class synapse.ml.explainers.VectorLIME.VectorLIME(java_obj=None, backgroundData=None, inputCol=None, kernelWidth=0.75, metricsCol='r2', model=None, numSamples=1000, outputCol='VectorLIME_53c2569d6733__output', regularization=0.0, targetClasses=[], targetClassesCol=None, targetCol='probability')[source]
Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer
- Parameters
backgroundData (object) – A dataframe containing background data
inputCol (object) – input column name
kernelWidth (float) – Kernel width. Default value: sqrt (number of features) * 0.75
metricsCol (object) – Column name for fitting metrics
model (object) – The model to be interpreted.
numSamples (int) – Number of samples to generate.
outputCol (object) – output column name
regularization (float) – Regularization param for the lasso. Default value: 0.
targetClasses (list) – The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.
targetClassesCol (object) – The name of the column that specifies the indices of the classes for multinomial classification models.
targetCol (object) – The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability
- backgroundData = Param(parent='undefined', name='backgroundData', doc='A dataframe containing background data')
- getBackgroundData()[source]
- Returns
A dataframe containing background data
- Return type
backgroundData
- getKernelWidth()[source]
- Returns
Kernel width. Default value: sqrt (number of features) * 0.75
- Return type
kernelWidth
- getRegularization()[source]
- Returns
Regularization param for the lasso. Default value: 0.
- Return type
regularization
- getTargetClasses()[source]
- Returns
The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.
- Return type
targetClasses
- getTargetClassesCol()[source]
- Returns
The name of the column that specifies the indices of the classes for multinomial classification models.
- Return type
targetClassesCol
- getTargetCol()[source]
- Returns
The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability
- Return type
targetCol
- inputCol = Param(parent='undefined', name='inputCol', doc='input column name')
- kernelWidth = Param(parent='undefined', name='kernelWidth', doc='Kernel width. Default value: sqrt (number of features) * 0.75')
- metricsCol = Param(parent='undefined', name='metricsCol', doc='Column name for fitting metrics')
- model = Param(parent='undefined', name='model', doc='The model to be interpreted.')
- numSamples = Param(parent='undefined', name='numSamples', doc='Number of samples to generate.')
- outputCol = Param(parent='undefined', name='outputCol', doc='output column name')
- regularization = Param(parent='undefined', name='regularization', doc='Regularization param for the lasso. Default value: 0.')
- setBackgroundData(value)[source]
- Parameters
backgroundData – A dataframe containing background data
- setKernelWidth(value)[source]
- Parameters
kernelWidth – Kernel width. Default value: sqrt (number of features) * 0.75
- setParams(backgroundData=None, inputCol=None, kernelWidth=0.75, metricsCol='r2', model=None, numSamples=1000, outputCol='VectorLIME_53c2569d6733__output', regularization=0.0, targetClasses=[], targetClassesCol=None, targetCol='probability')[source]
Set the (keyword only) parameters
- setRegularization(value)[source]
- Parameters
regularization – Regularization param for the lasso. Default value: 0.
- setTargetClasses(value)[source]
- Parameters
targetClasses – The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.
- setTargetClassesCol(value)[source]
- Parameters
targetClassesCol – The name of the column that specifies the indices of the classes for multinomial classification models.
- setTargetCol(value)[source]
- Parameters
targetCol – The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability
- targetClasses = Param(parent='undefined', name='targetClasses', doc='The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.')
- targetClassesCol = Param(parent='undefined', name='targetClassesCol', doc='The name of the column that specifies the indices of the classes for multinomial classification models.')
- targetCol = Param(parent='undefined', name='targetCol', doc='The column name of the prediction target to explain (i.e. the response variable). This is usually set to "prediction" for regression models and "probability" for probabilistic classification models. Default value: probability')
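The kernelWidth description gives the default as sqrt(number of features) * 0.75. The sketch below shows how such a width could turn the distance between the explained instance and a perturbed sample into a sample weight. It assumes the exponential kernel used by the reference LIME implementation; the function names are hypothetical and SynapseML's internal weighting may differ.

```python
import math
from typing import Sequence


def default_kernel_width(num_features: int) -> float:
    """Documented default: sqrt(number of features) * 0.75."""
    return math.sqrt(num_features) * 0.75


def lime_weight(instance: Sequence[float], sample: Sequence[float],
                kernel_width: float) -> float:
    """Exponential kernel weight (assumed form, after the reference LIME code)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(instance, sample))
    return math.sqrt(math.exp(-squared_distance / kernel_width ** 2))


instance = [1.0, 2.0, 3.0, 4.0]
width = default_kernel_width(len(instance))  # sqrt(4) * 0.75 = 1.5
near = lime_weight(instance, [1.0, 2.0, 3.0, 4.5], width)
far = lime_weight(instance, [3.0, 0.0, 5.0, 2.0], width)
print(width, near > far)  # 1.5 True
```

A smaller kernel width concentrates the weight on samples close to the instance, so the fitted lasso model describes a narrower neighborhood.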
synapse.ml.explainers.VectorSHAP module
- class synapse.ml.explainers.VectorSHAP.VectorSHAP(java_obj=None, backgroundData=None, infWeight=100000000.0, inputCol=None, metricsCol='r2', model=None, numSamples=None, outputCol='VectorSHAP_feef2b02a63a__output', targetClasses=[], targetClassesCol=None, targetCol='probability')[source]
Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer
- Parameters
backgroundData (object) – A dataframe containing background data
infWeight (float) – The double value to represent infinite weight. Default: 1E8.
inputCol (object) – input column name
metricsCol (object) – Column name for fitting metrics
model (object) – The model to be interpreted.
numSamples (int) – Number of samples to generate.
outputCol (object) – output column name
targetClasses (list) – The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.
targetClassesCol (object) – The name of the column that specifies the indices of the classes for multinomial classification models.
targetCol (object) – The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability
- backgroundData = Param(parent='undefined', name='backgroundData', doc='A dataframe containing background data')
- getBackgroundData()[source]
- Returns
A dataframe containing background data
- Return type
backgroundData
- getInfWeight()[source]
- Returns
The double value to represent infinite weight. Default: 1E8.
- Return type
infWeight
- getTargetClasses()[source]
- Returns
The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.
- Return type
targetClasses
- getTargetClassesCol()[source]
- Returns
The name of the column that specifies the indices of the classes for multinomial classification models.
- Return type
targetClassesCol
- getTargetCol()[source]
- Returns
The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability
- Return type
targetCol
- infWeight = Param(parent='undefined', name='infWeight', doc='The double value to represent infinite weight. Default: 1E8.')
- inputCol = Param(parent='undefined', name='inputCol', doc='input column name')
- metricsCol = Param(parent='undefined', name='metricsCol', doc='Column name for fitting metrics')
- model = Param(parent='undefined', name='model', doc='The model to be interpreted.')
- numSamples = Param(parent='undefined', name='numSamples', doc='Number of samples to generate.')
- outputCol = Param(parent='undefined', name='outputCol', doc='output column name')
- setBackgroundData(value)[source]
- Parameters
backgroundData – A dataframe containing background data
- setInfWeight(value)[source]
- Parameters
infWeight – The double value to represent infinite weight. Default: 1E8.
- setParams(backgroundData=None, infWeight=100000000.0, inputCol=None, metricsCol='r2', model=None, numSamples=None, outputCol='VectorSHAP_feef2b02a63a__output', targetClasses=[], targetClassesCol=None, targetCol='probability')[source]
Set the (keyword only) parameters
- setTargetClasses(value)[source]
- Parameters
targetClasses – The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.
- setTargetClassesCol(value)[source]
- Parameters
targetClassesCol – The name of the column that specifies the indices of the classes for multinomial classification models.
- setTargetCol(value)[source]
- Parameters
targetCol – The column name of the prediction target to explain (i.e. the response variable). This is usually set to “prediction” for regression models and “probability” for probabilistic classification models. Default value: probability
- targetClasses = Param(parent='undefined', name='targetClasses', doc='The indices of the classes for multinomial classification models. Default: 0. For regression models this parameter is ignored.')
- targetClassesCol = Param(parent='undefined', name='targetClassesCol', doc='The name of the column that specifies the indices of the classes for multinomial classification models.')
- targetCol = Param(parent='undefined', name='targetCol', doc='The column name of the prediction target to explain (i.e. the response variable). This is usually set to "prediction" for regression models and "probability" for probabilistic classification models. Default value: probability')
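The infWeight parameter is described as the double value that represents infinite weight. In the standard Kernel SHAP formulation, the empty and the full feature coalition receive infinite weight, so a large finite number has to stand in for it during the weighted regression. The sketch below illustrates this under that assumption; shapley_kernel_weight is a hypothetical helper, not part of SynapseML, and the library's internals may differ.

```python
import math

INF_WEIGHT = 1e8  # mirrors the documented default for infWeight


def shapley_kernel_weight(num_features: int, coalition_size: int,
                          inf_weight: float = INF_WEIGHT) -> float:
    """Shapley kernel weight for a coalition of the given size (sketch)."""
    m, s = num_features, coalition_size
    if not 0 <= s <= m:
        raise ValueError("coalition_size must lie between 0 and num_features")
    # The empty and full coalitions would get infinite weight; use the stand-in.
    if s == 0 or s == m:
        return inf_weight
    # Binomial coefficient C(m, s), computed with factorials.
    combinations = math.factorial(m) // (math.factorial(s) * math.factorial(m - s))
    return (m - 1) / (combinations * s * (m - s))


print([shapley_kernel_weight(4, size) for size in range(5)])
# [100000000.0, 0.25, 0.125, 0.25, 100000000.0]
```

The stand-in value only needs to dominate the finite weights, which are at most 1 for any interior coalition size.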
Module contents
SynapseML is an ecosystem of tools aimed at expanding the distributed computing framework Apache Spark in several new directions. SynapseML adds many deep learning and data science tools to the Spark ecosystem, including seamless integration of Spark Machine Learning pipelines with Microsoft Cognitive Toolkit (CNTK), LightGBM and OpenCV. These tools enable powerful and highly scalable predictive and analytical models for a variety of data sources.
SynapseML also brings new networking capabilities to the Spark ecosystem. With the HTTP on Spark project, users can embed any web service into their SparkML models. In this vein, SynapseML provides easy-to-use SparkML transformers for a wide variety of Microsoft Cognitive Services. For production-grade deployment, the Spark Serving project enables high-throughput, sub-millisecond latency web services, backed by your Spark cluster.
SynapseML requires Scala 2.12, Spark 3.0+, and Python 3.6+.