synapse.ml.exploratory package

Submodules

synapse.ml.exploratory.AggregateBalanceMeasure module

class synapse.ml.exploratory.AggregateBalanceMeasure.AggregateBalanceMeasure(java_obj=None, epsilon=1.0, errorTolerance=1e-12, outputCol='AggregateBalanceMeasure', sensitiveCols=None, verbose=False)[source]

Bases: pyspark.ml.util.MLReadable[pyspark.ml.util.RL]

Parameters
  • epsilon (float) – Epsilon value for the Atkinson Index, defined as the complement of alpha (epsilon = 1 - alpha).

  • errorTolerance (float) – Error tolerance value for Atkinson Index.

  • outputCol (str) – Output column name.

  • sensitiveCols (list) – Sensitive columns to use.

  • verbose (bool) – Whether to show intermediate measures and calculations, such as Positive Rate.
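
As a rough illustration of how epsilon and errorTolerance interact, the following pure-Python sketch computes an Atkinson index over a list of positive values, such as the proportion of rows in each sensitive group. It is not the library's implementation: the function name `atkinson_index` is hypothetical, and the use of errorTolerance as the threshold for treating alpha = 1 - epsilon as zero is an assumption made for this example.

```python
import math


def atkinson_index(values, epsilon=1.0, error_tolerance=1e-12):
    """Atkinson index of positive values (hypothetical helper, not the library API)."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("values must be a non-empty list of positive numbers")
    n = len(values)
    mean = sum(values) / n
    alpha = 1.0 - epsilon
    if abs(alpha) < error_tolerance:
        # Assumption: alpha within the tolerance of zero is treated as epsilon == 1,
        # where the index uses the geometric mean instead of a power mean.
        geometric_mean = math.exp(sum(math.log(v) for v in values) / n)
        return 1.0 - geometric_mean / mean
    # General case: power mean with exponent alpha = 1 - epsilon.
    power_mean = (sum(v ** alpha for v in values) / n) ** (1.0 / alpha)
    return 1.0 - power_mean / mean


# Four equally sized groups versus one dominant group.
balanced = atkinson_index([0.25, 0.25, 0.25, 0.25])
skewed = atkinson_index([0.7, 0.1, 0.1, 0.1])
# A smaller epsilon weighs the under-represented groups less heavily.
milder = atkinson_index([0.7, 0.1, 0.1, 0.1], epsilon=0.5)
```

A perfectly balanced input yields an index of approximately zero, and the index grows as the distribution becomes more uneven or as epsilon increases.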

epsilon = Param(parent='undefined', name='epsilon', doc='Epsilon value for Atkinson Index. Inverse of alpha (1 - alpha).')
errorTolerance = Param(parent='undefined', name='errorTolerance', doc='Error tolerance value for Atkinson Index.')
getEpsilon()[source]
Returns

Epsilon value for the Atkinson Index, defined as the complement of alpha (epsilon = 1 - alpha).

Return type

float

getErrorTolerance()[source]
Returns

Error tolerance value for Atkinson Index.

Return type

float

static getJavaPackage()[source]

Returns package name String.

getOutputCol()[source]
Returns

output column name

Return type

str

getSensitiveCols()[source]
Returns

Sensitive columns to use.

Return type

list

getVerbose()[source]
Returns

Whether to show intermediate measures and calculations, such as Positive Rate.

Return type

bool

outputCol = Param(parent='undefined', name='outputCol', doc='output column name')
classmethod read()[source]

Returns an MLReader instance for this class.

sensitiveCols = Param(parent='undefined', name='sensitiveCols', doc='Sensitive columns to use.')
setEpsilon(value)[source]
Parameters

epsilon – Epsilon value for the Atkinson Index, defined as the complement of alpha (epsilon = 1 - alpha).

setErrorTolerance(value)[source]
Parameters

errorTolerance – Error tolerance value for Atkinson Index.

setOutputCol(value)[source]
Parameters

outputCol – output column name

setParams(epsilon=1.0, errorTolerance=1e-12, outputCol='AggregateBalanceMeasure', sensitiveCols=None, verbose=False)[source]

Set the (keyword only) parameters

setSensitiveCols(value)[source]
Parameters

sensitiveCols – Sensitive columns to use.

setVerbose(value)[source]
Parameters

verbose – Whether to show intermediate measures and calculations, such as Positive Rate.

verbose = Param(parent='undefined', name='verbose', doc='Whether to show intermediate measures and calculations, such as Positive Rate.')

synapse.ml.exploratory.DistributionBalanceMeasure module

class synapse.ml.exploratory.DistributionBalanceMeasure.DistributionBalanceMeasure(java_obj=None, featureNameCol='FeatureName', outputCol='DistributionBalanceMeasure', referenceDistribution=None, sensitiveCols=None, verbose=False)[source]

Bases: pyspark.ml.util.MLReadable[pyspark.ml.util.RL]

Parameters
  • featureNameCol (str) – Output column name for feature names.

  • outputCol (str) – Output column name.

  • referenceDistribution (object) – An ordered list of reference distributions that correspond to each of the sensitive columns.

  • sensitiveCols (list) – Sensitive columns to use.

  • verbose (bool) – Whether to show intermediate measures and calculations, such as Positive Rate.
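
To make the pairing between sensitiveCols and referenceDistribution concrete, the sketch below compares each column's observed distribution against the reference at the same list position. It is a pure-Python illustration under stated assumptions, not the library's code: representing each reference as a dict from value to probability, and the choice of KL divergence and total variation distance as the measures, are assumptions made here, and all helper names are hypothetical.

```python
import math
from collections import Counter


def observed_distribution(values):
    """Map each distinct value to its share of the rows."""
    counts = Counter(values)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}


def kl_divergence(observed, reference):
    """KL divergence; the reference must cover every observed value."""
    return sum(p * math.log(p / reference[k]) for k, p in observed.items() if p > 0)


def total_variation_distance(observed, reference):
    """Half the sum of absolute probability differences over all values."""
    keys = set(observed) | set(reference)
    return 0.5 * sum(abs(observed.get(k, 0.0) - reference.get(k, 0.0)) for k in keys)


rows = [
    {"group": "a", "region": "north"},
    {"group": "a", "region": "north"},
    {"group": "a", "region": "south"},
    {"group": "b", "region": "south"},
]
sensitive_cols = ["group", "region"]
# One reference per sensitive column, in the same order (assumed dict format).
reference_distributions = [
    {"a": 0.5, "b": 0.5},
    {"north": 0.5, "south": 0.5},
]

results = {}
for col, reference in zip(sensitive_cols, reference_distributions):
    observed = observed_distribution(row[col] for row in rows)
    results[col] = {
        "kl_divergence": kl_divergence(observed, reference),
        "total_variation_dist": total_variation_distance(observed, reference),
    }
```

In this toy data, `region` matches its uniform reference exactly, so both measures are zero, while `group` deviates because three of the four rows share one value.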

featureNameCol = Param(parent='undefined', name='featureNameCol', doc='Output column name for feature names.')
getFeatureNameCol()[source]
Returns

Output column name for feature names.

Return type

str

static getJavaPackage()[source]

Returns package name String.

getOutputCol()[source]
Returns

output column name

Return type

str

getReferenceDistribution()[source]
Returns

An ordered list of reference distributions that correspond to each of the sensitive columns.

Return type

object

getSensitiveCols()[source]
Returns

Sensitive columns to use.

Return type

list

getVerbose()[source]
Returns

Whether to show intermediate measures and calculations, such as Positive Rate.

Return type

bool

outputCol = Param(parent='undefined', name='outputCol', doc='output column name')
classmethod read()[source]

Returns an MLReader instance for this class.

referenceDistribution = Param(parent='undefined', name='referenceDistribution', doc='An ordered list of reference distributions that correspond to each of the sensitive columns.')
sensitiveCols = Param(parent='undefined', name='sensitiveCols', doc='Sensitive columns to use.')
setFeatureNameCol(value)[source]
Parameters

featureNameCol – Output column name for feature names.

setOutputCol(value)[source]
Parameters

outputCol – output column name

setParams(featureNameCol='FeatureName', outputCol='DistributionBalanceMeasure', referenceDistribution=None, sensitiveCols=None, verbose=False)[source]

Set the (keyword only) parameters

setReferenceDistribution(value)[source]
Parameters

referenceDistribution – An ordered list of reference distributions that correspond to each of the sensitive columns.

setSensitiveCols(value)[source]
Parameters

sensitiveCols – Sensitive columns to use.

setVerbose(value)[source]
Parameters

verbose – Whether to show intermediate measures and calculations, such as Positive Rate.

verbose = Param(parent='undefined', name='verbose', doc='Whether to show intermediate measures and calculations, such as Positive Rate.')

synapse.ml.exploratory.FeatureBalanceMeasure module

class synapse.ml.exploratory.FeatureBalanceMeasure.FeatureBalanceMeasure(java_obj=None, classACol='ClassA', classBCol='ClassB', featureNameCol='FeatureName', labelCol='label', outputCol='FeatureBalanceMeasure', sensitiveCols=None, verbose=False)[source]

Bases: pyspark.ml.util.MLReadable[pyspark.ml.util.RL]

Parameters
  • classACol (str) – Output column name for the first feature value to compare.

  • classBCol (str) – Output column name for the second feature value to compare.

  • featureNameCol (str) – Output column name for feature names.

  • labelCol (str) – Label column name.

  • outputCol (str) – Output column name.

  • sensitiveCols (list) – Sensitive columns to use.

  • verbose (bool) – Whether to show intermediate measures and calculations, such as Positive Rate.
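
The parameters above suggest an output with one row per feature and pair of feature values. The pure-Python sketch below illustrates that shape by computing the positive rate of a binary label for each value of a sensitive column and then the difference between each pair of values. It is an assumption-laden illustration, not the library's implementation: the helper names and the `gap` key are hypothetical, and the set of measures the class actually reports is not stated on this page.

```python
from itertools import combinations


def positive_rates(rows, feature, label="label"):
    """Share of rows with label == 1 for each distinct value of `feature`."""
    totals, positives = {}, {}
    for row in rows:
        value = row[feature]
        totals[value] = totals.get(value, 0) + 1
        positives[value] = positives.get(value, 0) + (1 if row[label] == 1 else 0)
    return {value: positives[value] / totals[value] for value in totals}


def parity_gaps(rows, sensitive_cols, label="label"):
    """One record per (feature, value pair), holding the positive-rate difference."""
    records = []
    for feature in sensitive_cols:
        rates = positive_rates(rows, feature, label)
        for class_a, class_b in combinations(sorted(rates), 2):
            records.append({
                "FeatureName": feature,  # default featureNameCol
                "ClassA": class_a,       # default classACol
                "ClassB": class_b,       # default classBCol
                "gap": rates[class_a] - rates[class_b],  # hypothetical measure name
            })
    return records


rows = [
    {"group": "a", "label": 1},
    {"group": "a", "label": 1},
    {"group": "a", "label": 0},
    {"group": "a", "label": 0},
    {"group": "b", "label": 1},
    {"group": "b", "label": 0},
    {"group": "b", "label": 0},
    {"group": "b", "label": 0},
]
gaps = parity_gaps(rows, ["group"])
```

A gap near zero means the two values receive positive labels at similar rates; a large absolute gap points to an imbalance worth inspecting before training.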

classACol = Param(parent='undefined', name='classACol', doc='Output column name for the first feature value to compare.')
classBCol = Param(parent='undefined', name='classBCol', doc='Output column name for the second feature value to compare.')
featureNameCol = Param(parent='undefined', name='featureNameCol', doc='Output column name for feature names.')
getClassACol()[source]
Returns

Output column name for the first feature value to compare.

Return type

str

getClassBCol()[source]
Returns

Output column name for the second feature value to compare.

Return type

str

getFeatureNameCol()[source]
Returns

Output column name for feature names.

Return type

str

static getJavaPackage()[source]

Returns package name String.

getLabelCol()[source]
Returns

label column name

Return type

str

getOutputCol()[source]
Returns

output column name

Return type

str

getSensitiveCols()[source]
Returns

Sensitive columns to use.

Return type

list

getVerbose()[source]
Returns

Whether to show intermediate measures and calculations, such as Positive Rate.

Return type

bool

labelCol = Param(parent='undefined', name='labelCol', doc='label column name')
outputCol = Param(parent='undefined', name='outputCol', doc='output column name')
classmethod read()[source]

Returns an MLReader instance for this class.

sensitiveCols = Param(parent='undefined', name='sensitiveCols', doc='Sensitive columns to use.')
setClassACol(value)[source]
Parameters

classACol – Output column name for the first feature value to compare.

setClassBCol(value)[source]
Parameters

classBCol – Output column name for the second feature value to compare.

setFeatureNameCol(value)[source]
Parameters

featureNameCol – Output column name for feature names.

setLabelCol(value)[source]
Parameters

labelCol – label column name

setOutputCol(value)[source]
Parameters

outputCol – output column name

setParams(classACol='ClassA', classBCol='ClassB', featureNameCol='FeatureName', labelCol='label', outputCol='FeatureBalanceMeasure', sensitiveCols=None, verbose=False)[source]

Set the (keyword only) parameters

setSensitiveCols(value)[source]
Parameters

sensitiveCols – Sensitive columns to use.

setVerbose(value)[source]
Parameters

verbose – Whether to show intermediate measures and calculations, such as Positive Rate.

verbose = Param(parent='undefined', name='verbose', doc='Whether to show intermediate measures and calculations, such as Positive Rate.')

Module contents

SynapseML is an ecosystem of tools aimed at expanding the distributed computing framework Apache Spark in several new directions. SynapseML adds many deep learning and data science tools to the Spark ecosystem, including seamless integration of Spark Machine Learning pipelines with Microsoft Cognitive Toolkit (CNTK), LightGBM, and OpenCV. These tools enable powerful and highly scalable predictive and analytical models for a variety of data sources.

SynapseML also brings new networking capabilities to the Spark ecosystem. With the HTTP on Spark project, users can embed any web service into their SparkML models. In this vein, SynapseML provides easy-to-use SparkML transformers for a wide variety of Microsoft Cognitive Services. For production-grade deployment, the Spark Serving project enables high-throughput, sub-millisecond-latency web services backed by your Spark cluster.

SynapseML requires Scala 2.12, Spark 3.0+, and Python 3.6+.