synapse.ml.causal package

Submodules

synapse.ml.causal.DoubleMLEstimator module

class synapse.ml.causal.DoubleMLEstimator.DoubleMLEstimator(java_obj=None, confidenceLevel=0.975, featuresCol=None, maxIter=1, outcomeCol=None, outcomeModel=None, parallelism=10, sampleSplitRatio=[0.5, 0.5], treatmentCol=None, treatmentModel=None, weightCol=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaEstimator

Parameters
  • confidenceLevel (float) – confidence level, default value is 0.975

  • featuresCol (str) – The name of the features column

  • maxIter (int) – maximum number of iterations (>= 0)

  • outcomeCol (str) – outcome column

  • outcomeModel (object) – outcome model to run

  • parallelism (int) – the number of threads to use when running parallel algorithms

  • sampleSplitRatio (list) – Sample split ratio for cross-fitting. Default: [0.5, 0.5].

  • treatmentCol (str) – treatment column

  • treatmentModel (object) – treatment model to run

  • weightCol (str) – The name of the weight column
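
A minimal configuration sketch (illustrative, not part of the generated reference): the nuisance models for the treatment and outcome are ordinary Spark ML estimators, and the column names "Treatment", "Outcome", and "features" are assumptions.

from pyspark.ml.classification import LogisticRegression
from pyspark.ml.regression import LinearRegression
from synapse.ml.causal.DoubleMLEstimator import DoubleMLEstimator

dml = (DoubleMLEstimator()
       .setTreatmentCol("Treatment")             # assumed binary treatment column
       .setTreatmentModel(LogisticRegression())  # model for treatment given features
       .setOutcomeCol("Outcome")                 # assumed outcome column
       .setOutcomeModel(LinearRegression())      # model for outcome given features
       .setFeaturesCol("features")               # assembled feature/confounder vector
       .setMaxIter(20))                          # maximum number of iterations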

confidenceLevel = Param(parent='undefined', name='confidenceLevel', doc='confidence level, default value is 0.975')
featuresCol = Param(parent='undefined', name='featuresCol', doc='The name of the features column')
getConfidenceLevel()[source]
Returns

confidence level, default value is 0.975

Return type

confidenceLevel

getFeaturesCol()[source]
Returns

The name of the features column

Return type

featuresCol

static getJavaPackage()[source]

Returns package name String.

getMaxIter()[source]
Returns

maximum number of iterations (>= 0)

Return type

maxIter

getOutcomeCol()[source]
Returns

outcome column

Return type

outcomeCol

getOutcomeModel()[source]
Returns

outcome model to run

Return type

outcomeModel

getParallelism()[source]
Returns

the number of threads to use when running parallel algorithms

Return type

parallelism

getSampleSplitRatio()[source]
Returns

Sample split ratio for cross-fitting. Default: [0.5, 0.5].

Return type

sampleSplitRatio

getTreatmentCol()[source]
Returns

treatment column

Return type

treatmentCol

getTreatmentModel()[source]
Returns

treatment model to run

Return type

treatmentModel

getWeightCol()[source]
Returns

The name of the weight column

Return type

weightCol

maxIter = Param(parent='undefined', name='maxIter', doc='maximum number of iterations (>= 0)')
outcomeCol = Param(parent='undefined', name='outcomeCol', doc='outcome column')
outcomeModel = Param(parent='undefined', name='outcomeModel', doc='outcome model to run')
parallelism = Param(parent='undefined', name='parallelism', doc='the number of threads to use when running parallel algorithms')
classmethod read()[source]

Returns an MLReader instance for this class.

sampleSplitRatio = Param(parent='undefined', name='sampleSplitRatio', doc='Sample split ratio for cross-fitting. Default: [0.5, 0.5].')
setConfidenceLevel(value)[source]
Parameters

confidenceLevel – confidence level, default value is 0.975

setFeaturesCol(value)[source]
Parameters

featuresCol – The name of the features column

setMaxIter(value)[source]
Parameters

maxIter – maximum number of iterations (>= 0)

setOutcomeCol(value)[source]
Parameters

outcomeCol – outcome column

setOutcomeModel(value)[source]
Parameters

outcomeModel – outcome model to run

setParallelism(value)[source]
Parameters

parallelism – the number of threads to use when running parallel algorithms

setParams(confidenceLevel=0.975, featuresCol=None, maxIter=1, outcomeCol=None, outcomeModel=None, parallelism=10, sampleSplitRatio=[0.5, 0.5], treatmentCol=None, treatmentModel=None, weightCol=None)[source]

Set the (keyword only) parameters

setSampleSplitRatio(value)[source]
Parameters

sampleSplitRatio – Sample split ratio for cross-fitting. Default: [0.5, 0.5].

setTreatmentCol(value)[source]
Parameters

treatmentCol – treatment column

setTreatmentModel(value)[source]
Parameters

treatmentModel – treatment model to run

setWeightCol(value)[source]
Parameters

weightCol – The name of the weight column

treatmentCol = Param(parent='undefined', name='treatmentCol', doc='treatment column')
treatmentModel = Param(parent='undefined', name='treatmentModel', doc='treatment model to run')
weightCol = Param(parent='undefined', name='weightCol', doc='The name of the weight column')

synapse.ml.causal.DoubleMLModel module

class synapse.ml.causal.DoubleMLModel.DoubleMLModel(java_obj=None, confidenceLevel=0.975, featuresCol=None, maxIter=1, outcomeCol=None, outcomeModel=None, parallelism=10, rawTreatmentEffects=None, sampleSplitRatio=[0.5, 0.5], treatmentCol=None, treatmentModel=None, weightCol=None)[source]

Bases: synapse.ml.causal._DoubleMLModel._DoubleMLModel

getAvgTreatmentEffect()[source]
getConfidenceInterval()[source]
getPValue()[source]
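
A fitted DoubleMLModel exposes the estimated average treatment effect and its uncertainty through the accessors above. A minimal sketch, continuing the hypothetical dml estimator from the DoubleMLEstimator example (df is an assumed Spark DataFrame containing the Treatment, Outcome, and features columns):

model = dml.fit(df)
ate = model.getAvgTreatmentEffect()   # point estimate of the average treatment effect
ci = model.getConfidenceInterval()    # interval at the configured confidenceLevel
p = model.getPValue()                 # p-value for the estimated treatment effect
print(ate, ci, p)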

synapse.ml.causal.OrthoForestDMLEstimator module

class synapse.ml.causal.OrthoForestDMLEstimator.OrthoForestDMLEstimator(java_obj=None, confidenceLevel=0.975, confounderVecCol='XW', featuresCol=None, heterogeneityVecCol='X', maxDepth=5, maxIter=1, minSamplesLeaf=10, numTrees=20, outcomeCol=None, outcomeModel=None, outcomeResidualCol='OutcomeResidual', outputCol='EffectAverage', outputHighCol='EffectUpperBound', outputLowCol='EffectLowerBound', parallelism=10, sampleSplitRatio=[0.5, 0.5], treatmentCol=None, treatmentModel=None, treatmentResidualCol='TreatmentResidual', weightCol=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaEstimator

Parameters
  • confidenceLevel (float) – confidence level, default value is 0.975

  • confounderVecCol (str) – Confounders to control for

  • featuresCol (str) – The name of the features column

  • heterogeneityVecCol (str) – Vector to divide the treatment by

  • maxDepth (int) – Max Depth of Tree

  • maxIter (int) – maximum number of iterations (>= 0)

  • minSamplesLeaf (int) – Minimum number of samples per leaf

  • numTrees (int) – Number of trees

  • outcomeCol (str) – outcome column

  • outcomeModel (object) – outcome model to run

  • outcomeResidualCol (str) – Outcome Residual Column

  • outputCol (str) – The name of the output column

  • outputHighCol (str) – Output Confidence Interval Upper Bound

  • outputLowCol (str) – Output Confidence Interval Lower Bound

  • parallelism (int) – the number of threads to use when running parallel algorithms

  • sampleSplitRatio (list) – Sample split ratio for cross-fitting. Default: [0.5, 0.5].

  • treatmentCol (str) – treatment column

  • treatmentModel (object) – treatment model to run

  • treatmentResidualCol (str) – Treatment Residual Column

  • weightCol (str) – The name of the weight column
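
A minimal configuration sketch (illustrative, not from the reference): the heterogeneity and confounder vector columns default to "X" and "XW" per the signature above, and plain Spark ML regressors serve as the nuisance models. The "Treatment" and "Outcome" column names and the DataFrame df are assumptions.

from pyspark.ml.regression import LinearRegression
from synapse.ml.causal.OrthoForestDMLEstimator import OrthoForestDMLEstimator

ortho = (OrthoForestDMLEstimator()
         .setTreatmentCol("Treatment")
         .setTreatmentModel(LinearRegression())
         .setOutcomeCol("Outcome")
         .setOutcomeModel(LinearRegression())
         .setHeterogeneityVecCol("X")    # heterogeneity vector column (default "X")
         .setConfounderVecCol("XW")      # confounder vector column (default "XW")
         .setNumTrees(100)
         .setMaxDepth(10)
         .setMinSamplesLeaf(10))

orthoModel = ortho.fit(df)               # returns an OrthoForestDMLModel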

confidenceLevel = Param(parent='undefined', name='confidenceLevel', doc='confidence level, default value is 0.975')
confounderVecCol = Param(parent='undefined', name='confounderVecCol', doc='Confounders to control for')
featuresCol = Param(parent='undefined', name='featuresCol', doc='The name of the features column')
getConfidenceLevel()[source]
Returns

confidence level, default value is 0.975

Return type

confidenceLevel

getConfounderVecCol()[source]
Returns

Confounders to control for

Return type

confounderVecCol

getFeaturesCol()[source]
Returns

The name of the features column

Return type

featuresCol

getHeterogeneityVecCol()[source]
Returns

Vector to divide the treatment by

Return type

heterogeneityVecCol

static getJavaPackage()[source]

Returns package name String.

getMaxDepth()[source]
Returns

Max Depth of Tree

Return type

maxDepth

getMaxIter()[source]
Returns

maximum number of iterations (>= 0)

Return type

maxIter

getMinSamplesLeaf()[source]
Returns

Minimum number of samples per leaf

Return type

minSamplesLeaf

getNumTrees()[source]
Returns

Number of trees

Return type

numTrees

getOutcomeCol()[source]
Returns

outcome column

Return type

outcomeCol

getOutcomeModel()[source]
Returns

outcome model to run

Return type

outcomeModel

getOutcomeResidualCol()[source]
Returns

Outcome Residual Column

Return type

outcomeResidualCol

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getOutputHighCol()[source]
Returns

Output Confidence Interval Upper Bound

Return type

outputHighCol

getOutputLowCol()[source]
Returns

Output Confidence Interval Lower Bound

Return type

outputLowCol

getParallelism()[source]
Returns

the number of threads to use when running parallel algorithms

Return type

parallelism

getSampleSplitRatio()[source]
Returns

Sample split ratio for cross-fitting. Default: [0.5, 0.5].

Return type

sampleSplitRatio

getTreatmentCol()[source]
Returns

treatment column

Return type

treatmentCol

getTreatmentModel()[source]
Returns

treatment model to run

Return type

treatmentModel

getTreatmentResidualCol()[source]
Returns

Treatment Residual Column

Return type

treatmentResidualCol

getWeightCol()[source]
Returns

The name of the weight column

Return type

weightCol

heterogeneityVecCol = Param(parent='undefined', name='heterogeneityVecCol', doc='Vector to divide the treatment by')
maxDepth = Param(parent='undefined', name='maxDepth', doc='Max Depth of Tree')
maxIter = Param(parent='undefined', name='maxIter', doc='maximum number of iterations (>= 0)')
minSamplesLeaf = Param(parent='undefined', name='minSamplesLeaf', doc='Max Depth of Tree')
numTrees = Param(parent='undefined', name='numTrees', doc='Number of trees')
outcomeCol = Param(parent='undefined', name='outcomeCol', doc='outcome column')
outcomeModel = Param(parent='undefined', name='outcomeModel', doc='outcome model to run')
outcomeResidualCol = Param(parent='undefined', name='outcomeResidualCol', doc='Outcome Residual Column')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
outputHighCol = Param(parent='undefined', name='outputHighCol', doc='Output Confidence Interval Low')
outputLowCol = Param(parent='undefined', name='outputLowCol', doc='Output Confidence Interval Low')
parallelism = Param(parent='undefined', name='parallelism', doc='the number of threads to use when running parallel algorithms')
classmethod read()[source]

Returns an MLReader instance for this class.

sampleSplitRatio = Param(parent='undefined', name='sampleSplitRatio', doc='Sample split ratio for cross-fitting. Default: [0.5, 0.5].')
setConfidenceLevel(value)[source]
Parameters

confidenceLevel – confidence level, default value is 0.975

setConfounderVecCol(value)[source]
Parameters

confounderVecCol – Confounders to control for

setFeaturesCol(value)[source]
Parameters

featuresCol – The name of the features column

setHeterogeneityVecCol(value)[source]
Parameters

heterogeneityVecCol – Vector to divide the treatment by

setMaxDepth(value)[source]
Parameters

maxDepth – Max Depth of Tree

setMaxIter(value)[source]
Parameters

maxIter – maximum number of iterations (>= 0)

setMinSamplesLeaf(value)[source]
Parameters

minSamplesLeaf – Minimum number of samples per leaf

setNumTrees(value)[source]
Parameters

numTrees – Number of trees

setOutcomeCol(value)[source]
Parameters

outcomeCol – outcome column

setOutcomeModel(value)[source]
Parameters

outcomeModel – outcome model to run

setOutcomeResidualCol(value)[source]
Parameters

outcomeResidualCol – Outcome Residual Column

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setOutputHighCol(value)[source]
Parameters

outputHighCol – Output Confidence Interval Upper Bound

setOutputLowCol(value)[source]
Parameters

outputLowCol – Output Confidence Interval Lower Bound

setParallelism(value)[source]
Parameters

parallelism – the number of threads to use when running parallel algorithms

setParams(confidenceLevel=0.975, confounderVecCol='XW', featuresCol=None, heterogeneityVecCol='X', maxDepth=5, maxIter=1, minSamplesLeaf=10, numTrees=20, outcomeCol=None, outcomeModel=None, outcomeResidualCol='OutcomeResidual', outputCol='EffectAverage', outputHighCol='EffectUpperBound', outputLowCol='EffectLowerBound', parallelism=10, sampleSplitRatio=[0.5, 0.5], treatmentCol=None, treatmentModel=None, treatmentResidualCol='TreatmentResidual', weightCol=None)[source]

Set the (keyword only) parameters

setSampleSplitRatio(value)[source]
Parameters

sampleSplitRatio – Sample split ratio for cross-fitting. Default: [0.5, 0.5].

setTreatmentCol(value)[source]
Parameters

treatmentCol – treatment column

setTreatmentModel(value)[source]
Parameters

treatmentModel – treatment model to run

setTreatmentResidualCol(value)[source]
Parameters

treatmentResidualCol – Treatment Residual Column

setWeightCol(value)[source]
Parameters

weightCol – The name of the weight column

treatmentCol = Param(parent='undefined', name='treatmentCol', doc='treatment column')
treatmentModel = Param(parent='undefined', name='treatmentModel', doc='treatment model to run')
treatmentResidualCol = Param(parent='undefined', name='treatmentResidualCol', doc='Treatment Residual Column')
weightCol = Param(parent='undefined', name='weightCol', doc='The name of the weight column')

synapse.ml.causal.OrthoForestDMLModel module

class synapse.ml.causal.OrthoForestDMLModel.OrthoForestDMLModel(java_obj=None, confidenceLevel=0.975, confounderVecCol='XW', featuresCol=None, forest=None, heterogeneityVecCol='X', maxDepth=5, maxIter=1, minSamplesLeaf=10, numTrees=20, outcomeCol=None, outcomeModel=None, outcomeResidualCol='OutcomeResidual', outputCol='EffectAverage', outputHighCol='EffectUpperBound', outputLowCol='EffectLowerBound', parallelism=10, sampleSplitRatio=[0.5, 0.5], treatmentCol=None, treatmentModel=None, treatmentResidualCol='TreatmentResidual', weightCol=None)[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaModel

Parameters
  • confidenceLevel (float) – confidence level, default value is 0.975

  • confounderVecCol (str) – Confounders to control for

  • featuresCol (str) – The name of the features column

  • forest (object) – Forest Trees produced in Ortho Forest DML Estimator

  • heterogeneityVecCol (str) – Vector to divide the treatment by

  • maxDepth (int) – Max Depth of Tree

  • maxIter (int) – maximum number of iterations (>= 0)

  • minSamplesLeaf (int) – Minimum number of samples per leaf

  • numTrees (int) – Number of trees

  • outcomeCol (str) – outcome column

  • outcomeModel (object) – outcome model to run

  • outcomeResidualCol (str) – Outcome Residual Column

  • outputCol (str) – The name of the output column

  • outputHighCol (str) – Output Confidence Interval Upper Bound

  • outputLowCol (str) – Output Confidence Interval Lower Bound

  • parallelism (int) – the number of threads to use when running parallel algorithms

  • sampleSplitRatio (list) – Sample split ratio for cross-fitting. Default: [0.5, 0.5].

  • treatmentCol (str) – treatment column

  • treatmentModel (object) – treatment model to run

  • treatmentResidualCol (str) – Treatment Residual Column

  • weightCol (str) – The name of the weight column
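
The fitted model is a Spark transformer; given the output column parameters above, transform is expected to append the per-row effect estimate and its confidence bounds using the default column names EffectAverage, EffectLowerBound, and EffectUpperBound. A minimal sketch continuing the hypothetical orthoModel and df from the estimator example:

scored = orthoModel.transform(df)
scored.select("EffectAverage", "EffectLowerBound", "EffectUpperBound").show(5)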

confidenceLevel = Param(parent='undefined', name='confidenceLevel', doc='confidence level, default value is 0.975')
confounderVecCol = Param(parent='undefined', name='confounderVecCol', doc='Confounders to control for')
featuresCol = Param(parent='undefined', name='featuresCol', doc='The name of the features column')
forest = Param(parent='undefined', name='forest', doc='Forest Trees produced in Ortho Forest DML Estimator')
getConfidenceLevel()[source]
Returns

confidence level, default value is 0.975

Return type

confidenceLevel

getConfounderVecCol()[source]
Returns

Confounders to control for

Return type

confounderVecCol

getFeaturesCol()[source]
Returns

The name of the features column

Return type

featuresCol

getForest()[source]
Returns

Forest Trees produced in Ortho Forest DML Estimator

Return type

forest

getHeterogeneityVecCol()[source]
Returns

Vector to divide the treatment by

Return type

heterogeneityVecCol

static getJavaPackage()[source]

Returns package name String.

getMaxDepth()[source]
Returns

Max Depth of Tree

Return type

maxDepth

getMaxIter()[source]
Returns

maximum number of iterations (>= 0)

Return type

maxIter

getMinSamplesLeaf()[source]
Returns

Minimum number of samples per leaf

Return type

minSamplesLeaf

getNumTrees()[source]
Returns

Number of trees

Return type

numTrees

getOutcomeCol()[source]
Returns

outcome column

Return type

outcomeCol

getOutcomeModel()[source]
Returns

outcome model to run

Return type

outcomeModel

getOutcomeResidualCol()[source]
Returns

Outcome Residual Column

Return type

outcomeResidualCol

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getOutputHighCol()[source]
Returns

Output Confidence Interval Upper Bound

Return type

outputHighCol

getOutputLowCol()[source]
Returns

Output Confidence Interval Lower Bound

Return type

outputLowCol

getParallelism()[source]
Returns

the number of threads to use when running parallel algorithms

Return type

parallelism

getSampleSplitRatio()[source]
Returns

Sample split ratio for cross-fitting. Default: [0.5, 0.5].

Return type

sampleSplitRatio

getTreatmentCol()[source]
Returns

treatment column

Return type

treatmentCol

getTreatmentModel()[source]
Returns

treatment model to run

Return type

treatmentModel

getTreatmentResidualCol()[source]
Returns

Treatment Residual Column

Return type

treatmentResidualCol

getWeightCol()[source]
Returns

The name of the weight column

Return type

weightCol

heterogeneityVecCol = Param(parent='undefined', name='heterogeneityVecCol', doc='Vector to divide the treatment by')
maxDepth = Param(parent='undefined', name='maxDepth', doc='Max Depth of Tree')
maxIter = Param(parent='undefined', name='maxIter', doc='maximum number of iterations (>= 0)')
minSamplesLeaf = Param(parent='undefined', name='minSamplesLeaf', doc='Max Depth of Tree')
numTrees = Param(parent='undefined', name='numTrees', doc='Number of trees')
outcomeCol = Param(parent='undefined', name='outcomeCol', doc='outcome column')
outcomeModel = Param(parent='undefined', name='outcomeModel', doc='outcome model to run')
outcomeResidualCol = Param(parent='undefined', name='outcomeResidualCol', doc='Outcome Residual Column')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
outputHighCol = Param(parent='undefined', name='outputHighCol', doc='Output Confidence Interval Low')
outputLowCol = Param(parent='undefined', name='outputLowCol', doc='Output Confidence Interval Low')
parallelism = Param(parent='undefined', name='parallelism', doc='the number of threads to use when running parallel algorithms')
classmethod read()[source]

Returns an MLReader instance for this class.

sampleSplitRatio = Param(parent='undefined', name='sampleSplitRatio', doc='Sample split ratio for cross-fitting. Default: [0.5, 0.5].')
setConfidenceLevel(value)[source]
Parameters

confidenceLevel – confidence level, default value is 0.975

setConfounderVecCol(value)[source]
Parameters

confounderVecCol – Confounders to control for

setFeaturesCol(value)[source]
Parameters

featuresCol – The name of the features column

setForest(value)[source]
Parameters

forest – Forest Trees produced in Ortho Forest DML Estimator

setHeterogeneityVecCol(value)[source]
Parameters

heterogeneityVecCol – Vector to divide the treatment by

setMaxDepth(value)[source]
Parameters

maxDepth – Max Depth of Tree

setMaxIter(value)[source]
Parameters

maxIter – maximum number of iterations (>= 0)

setMinSamplesLeaf(value)[source]
Parameters

minSamplesLeaf – Minimum number of samples per leaf

setNumTrees(value)[source]
Parameters

numTrees – Number of trees

setOutcomeCol(value)[source]
Parameters

outcomeCol – outcome column

setOutcomeModel(value)[source]
Parameters

outcomeModel – outcome model to run

setOutcomeResidualCol(value)[source]
Parameters

outcomeResidualCol – Outcome Residual Column

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setOutputHighCol(value)[source]
Parameters

outputHighCol – Output Confidence Interval Upper Bound

setOutputLowCol(value)[source]
Parameters

outputLowCol – Output Confidence Interval Lower Bound

setParallelism(value)[source]
Parameters

parallelism – the number of threads to use when running parallel algorithms

setParams(confidenceLevel=0.975, confounderVecCol='XW', featuresCol=None, forest=None, heterogeneityVecCol='X', maxDepth=5, maxIter=1, minSamplesLeaf=10, numTrees=20, outcomeCol=None, outcomeModel=None, outcomeResidualCol='OutcomeResidual', outputCol='EffectAverage', outputHighCol='EffectUpperBound', outputLowCol='EffectLowerBound', parallelism=10, sampleSplitRatio=[0.5, 0.5], treatmentCol=None, treatmentModel=None, treatmentResidualCol='TreatmentResidual', weightCol=None)[source]

Set the (keyword only) parameters

setSampleSplitRatio(value)[source]
Parameters

sampleSplitRatio – Sample split ratio for cross-fitting. Default: [0.5, 0.5].

setTreatmentCol(value)[source]
Parameters

treatmentCol – treatment column

setTreatmentModel(value)[source]
Parameters

treatmentModel – treatment model to run

setTreatmentResidualCol(value)[source]
Parameters

treatmentResidualCol – Treatment Residual Column

setWeightCol(value)[source]
Parameters

weightCol – The name of the weight column

treatmentCol = Param(parent='undefined', name='treatmentCol', doc='treatment column')
treatmentModel = Param(parent='undefined', name='treatmentModel', doc='treatment model to run')
treatmentResidualCol = Param(parent='undefined', name='treatmentResidualCol', doc='Treatment Residual Column')
weightCol = Param(parent='undefined', name='weightCol', doc='The name of the weight column')

synapse.ml.causal.OrthoForestVariableTransformer module

class synapse.ml.causal.OrthoForestVariableTransformer.OrthoForestVariableTransformer(java_obj=None, outcomeResidualCol='OResid', outputCol='_tmp_tsOutcome', treatmentResidualCol='TResid', weightsCol='_tmp_twOutcome')[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • outcomeResidualCol (str) – Outcome Residual Col

  • outputCol (str) – The name of the output column

  • treatmentResidualCol (str) – Treatment Residual Col

  • weightsCol (str) – Weights Col
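
This transformer appears to be an internal building block of the ortho-forest pipeline, deriving a transformed outcome and a weight column from the treatment and outcome residuals. A minimal sketch, assuming a DataFrame resid_df that already holds residual columns named "TResid" and "OResid" (the defaults from the signature above):

from synapse.ml.causal.OrthoForestVariableTransformer import OrthoForestVariableTransformer

variableTransformer = (OrthoForestVariableTransformer()
                       .setTreatmentResidualCol("TResid")
                       .setOutcomeResidualCol("OResid")
                       .setOutputCol("_tmp_tsOutcome")    # transformed-outcome column (default)
                       .setWeightsCol("_tmp_twOutcome"))  # weights column (default)

transformed = variableTransformer.transform(resid_df)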

static getJavaPackage()[source]

Returns package name String.

getOutcomeResidualCol()[source]
Returns

Outcome Residual Col

Return type

outcomeResidualCol

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getTreatmentResidualCol()[source]
Returns

Treatment Residual Col

Return type

treatmentResidualCol

getWeightsCol()[source]
Returns

Weights Col

Return type

weightsCol

outcomeResidualCol = Param(parent='undefined', name='outcomeResidualCol', doc='Outcome Residual Col')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
classmethod read()[source]

Returns an MLReader instance for this class.

setOutcomeResidualCol(value)[source]
Parameters

outcomeResidualCol – Outcome Residual Col

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(outcomeResidualCol='OResid', outputCol='_tmp_tsOutcome', treatmentResidualCol='TResid', weightsCol='_tmp_twOutcome')[source]

Set the (keyword only) parameters

setTreatmentResidualCol(value)[source]
Parameters

treatmentResidualCol – Treatment Residual Col

setWeightsCol(value)[source]
Parameters

weightsCol – Weights Col

treatmentResidualCol = Param(parent='undefined', name='treatmentResidualCol', doc='Treatment Residual Col')
weightsCol = Param(parent='undefined', name='weightsCol', doc='Weights Col')

synapse.ml.causal.ResidualTransformer module

class synapse.ml.causal.ResidualTransformer.ResidualTransformer(java_obj=None, classIndex=1, observedCol='label', outputCol='residual', predictedCol='prediction')[source]

Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • classIndex (int) – The index of the class to compute residual for classification outputs. Default value is 1.

  • observedCol (str) – observed data (label column)

  • outputCol (str) – The name of the output column

  • predictedCol (str) – predicted data (prediction or probability column)
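
A minimal sketch using the defaults from the signature above: ResidualTransformer takes a scored DataFrame and writes the residual between the observed and predicted columns; for classifiers, setClassIndex selects which entry of the probability output to use. scored_df is an assumed DataFrame that already contains label and prediction columns.

from synapse.ml.causal.ResidualTransformer import ResidualTransformer

residualizer = (ResidualTransformer()
                .setObservedCol("label")          # observed data (label column)
                .setPredictedCol("prediction")    # or "probability" together with setClassIndex(...)
                .setOutputCol("residual"))        # name of the residual output column

residual_df = residualizer.transform(scored_df)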

classIndex = Param(parent='undefined', name='classIndex', doc='The index of the class to compute residual for classification outputs. Default value is 1.')
getClassIndex()[source]
Returns

The index of the class to compute residual for classification outputs. Default value is 1.

Return type

classIndex

static getJavaPackage()[source]

Returns package name String.

getObservedCol()[source]
Returns

observed data (label column)

Return type

observedCol

getOutputCol()[source]
Returns

The name of the output column

Return type

outputCol

getPredictedCol()[source]
Returns

predicted data (prediction or probability column)

Return type

predictedCol

observedCol = Param(parent='undefined', name='observedCol', doc='observed data (label column)')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
predictedCol = Param(parent='undefined', name='predictedCol', doc='predicted data (prediction or probability columns')
classmethod read()[source]

Returns an MLReader instance for this class.

setClassIndex(value)[source]
Parameters

classIndex – The index of the class to compute residual for classification outputs. Default value is 1.

setObservedCol(value)[source]
Parameters

observedCol – observed data (label column)

setOutputCol(value)[source]
Parameters

outputCol – The name of the output column

setParams(classIndex=1, observedCol='label', outputCol='residual', predictedCol='prediction')[source]

Set the (keyword only) parameters

setPredictedCol(value)[source]
Parameters

predictedCol – predicted data (prediction or probability column)

Module contents

SynapseML is an ecosystem of tools aimed at expanding the distributed computing framework Apache Spark in several new directions. SynapseML adds many deep learning and data science tools to the Spark ecosystem, including seamless integration of Spark Machine Learning pipelines with the Microsoft Cognitive Toolkit (CNTK), LightGBM, and OpenCV. These tools enable powerful and highly scalable predictive and analytical models for a variety of data sources.

SynapseML also brings new networking capabilities to the Spark ecosystem. With the HTTP on Spark project, users can embed any web service into their SparkML models. In this vein, SynapseML provides easy-to-use SparkML transformers for a wide variety of Microsoft Cognitive Services. For production-grade deployment, the Spark Serving project enables high-throughput, sub-millisecond-latency web services backed by your Spark cluster.

SynapseML requires Scala 2.12, Spark 3.0+, and Python 3.6+.