class DistributionBalanceMeasure extends Transformer with DataBalanceParams with ComplexParamsWritable with Wrappable with SynapseMLLogging

This transformer computes data balance measures based on a reference distribution. For now, we only support a uniform reference distribution.

The output is a dataframe that contains two columns:

  • The sensitive feature name.
  • A struct containing measure names and their values showing differences between the observed and reference distributions. The following measures are computed:
    • Kullback-Leibler Divergence - https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence
    • Jensen-Shannon Distance - https://en.wikipedia.org/wiki/Jensen%E2%80%93Shannon_divergence
    • Wasserstein Distance - https://en.wikipedia.org/wiki/Wasserstein_metric
    • Infinity Norm Distance - https://en.wikipedia.org/wiki/Chebyshev_distance
    • Total Variation Distance - https://en.wikipedia.org/wiki/Total_variation_distance_of_probability_measures
    • Chi-Squared Test - https://en.wikipedia.org/wiki/Chi-squared_test

The output dataframe contains a row per sensitive feature.
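To make the comparison against a uniform reference concrete, here is a minimal plain-Scala sketch of four of the listed measures using their textbook definitions. The object and function names are hypothetical and are not part of SynapseML; the logarithm base and any normalization are assumptions, so values may differ from what the transformer reports. Wasserstein distance and the chi-squared test are left out.

```scala
// Illustrative only: textbook definitions, not the library's implementation.
object UniformReferenceMeasures {
  // Turn per-value counts of one sensitive feature into observed probabilities.
  def observed(counts: Seq[Long]): Seq[Double] = {
    val total = counts.sum.toDouble
    counts.map(_ / total)
  }

  // Uniform reference distribution over n distinct values.
  def uniform(n: Int): Seq[Double] = Seq.fill(n)(1.0 / n)

  // KL divergence D(p || q) with natural log (assumed base);
  // terms where p(i) == 0 contribute 0.
  def klDivergence(p: Seq[Double], q: Seq[Double]): Double =
    p.zip(q).collect { case (pi, qi) if pi > 0.0 => pi * math.log(pi / qi) }.sum

  // Jensen-Shannon distance: square root of the JS divergence.
  def jsDistance(p: Seq[Double], q: Seq[Double]): Double = {
    val m = p.zip(q).map { case (pi, qi) => 0.5 * (pi + qi) }
    math.sqrt(0.5 * klDivergence(p, m) + 0.5 * klDivergence(q, m))
  }

  // Infinity norm (Chebyshev) distance: largest absolute difference.
  def infNormDistance(p: Seq[Double], q: Seq[Double]): Double =
    p.zip(q).map { case (pi, qi) => math.abs(pi - qi) }.max

  // Total variation distance: half the sum of absolute differences.
  def totalVariationDistance(p: Seq[Double], q: Seq[Double]): Double =
    0.5 * p.zip(q).map { case (pi, qi) => math.abs(pi - qi) }.sum
}
```

For example, `UniformReferenceMeasures.observed(Seq(6L, 3L, 1L))` could be compared against `UniformReferenceMeasures.uniform(3)`; a perfectly balanced feature yields zero for every measure in this sketch.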

Annotations
@Experimental()
Linear Supertypes
SynapseMLLogging, Wrappable, DotnetWrappable, RWrappable, PythonWrappable, BaseWrappable, ComplexParamsWritable, MLWritable, DataBalanceParams, HasOutputCol, Transformer, PipelineStage, Logging, Params, Serializable, Serializable, Identifiable, AnyRef, Any

Instance Constructors

  1. new DistributionBalanceMeasure()
  2. new DistributionBalanceMeasure(uid: String)

    uid

    The unique ID.
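A minimal usage sketch follows, built only from the members listed on this page (`setSensitiveCols`, `transform`). The import path `com.microsoft.azure.synapse.ml.exploratory` is an assumption that is not stated here, and the column names and rows are made-up sample data. It needs Spark and SynapseML on the classpath.

```scala
import org.apache.spark.sql.SparkSession
// Assumed package; adjust to match your SynapseML version.
import com.microsoft.azure.synapse.ml.exploratory.DistributionBalanceMeasure

object DistributionBalanceMeasureExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("DistributionBalanceMeasureExample")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data with two sensitive columns.
    val df = Seq(
      ("Male", "Asian"),
      ("Male", "White"),
      ("Male", "White"),
      ("Female", "Black")
    ).toDF("Gender", "Ethnicity")

    // Compare each sensitive column's observed distribution
    // with the reference distribution.
    val measures = new DistributionBalanceMeasure()
      .setSensitiveCols(Array("Gender", "Ethnicity"))
      .transform(df)

    // Expect one row per sensitive feature, per the class description.
    measures.show(truncate = false)

    spark.stop()
  }
}
```

The names of the two output columns can be changed with `setFeatureNameCol` and `setOutputCol` before calling `transform`.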

Value Members

  1. final def clear(param: Param[_]): DistributionBalanceMeasure.this.type
    Definition Classes
    Params
  2. def copy(extra: ParamMap): Transformer
    Definition Classes
    DistributionBalanceMeasure → Transformer → PipelineStage → Params
  3. def dotnetAdditionalMethods: String
    Definition Classes
    DotnetWrappable
  4. val emptyReferenceDistribution: Array[Map[String, Double]]
  5. def explainParam(param: Param[_]): String
    Definition Classes
    Params
  6. def explainParams(): String
    Definition Classes
    Params
  7. final def extractParamMap(): ParamMap
    Definition Classes
    Params
  8. final def extractParamMap(extra: ParamMap): ParamMap
    Definition Classes
    Params
  9. val featureNameCol: Param[String]
  10. final def get[T](param: Param[T]): Option[T]
    Definition Classes
    Params
  11. final def getDefault[T](param: Param[T]): Option[T]
    Definition Classes
    Params
  12. def getFeatureNameCol: String
  13. final def getOrDefault[T](param: Param[T]): T
    Definition Classes
    Params
  14. final def getOutputCol: String
    Definition Classes
    HasOutputCol
  15. def getParam(paramName: String): Param[Any]
    Definition Classes
    Params
  16. def getParamInfo(p: Param[_]): ParamInfo[_]
    Definition Classes
    BaseWrappable
  17. def getReferenceDistribution: Array[Map[String, Double]]
  18. def getSensitiveCols: Array[String]
    Definition Classes
    DataBalanceParams
  19. def getVerbose: Boolean
    Definition Classes
    DataBalanceParams
  20. final def hasDefault[T](param: Param[T]): Boolean
    Definition Classes
    Params
  21. def hasParam(paramName: String): Boolean
    Definition Classes
    Params
  22. final def isDefined(param: Param[_]): Boolean
    Definition Classes
    Params
  23. final def isSet(param: Param[_]): Boolean
    Definition Classes
    Params
  24. def logClass(featureName: String): Unit
    Definition Classes
    SynapseMLLogging
  25. def logFit[T](f: ⇒ T, columns: Int): T
    Definition Classes
    SynapseMLLogging
  26. def logTransform[T](f: ⇒ T, columns: Int): T
    Definition Classes
    SynapseMLLogging
  27. def logVerb[T](verb: String, f: ⇒ T, columns: Option[Int] = None): T
    Definition Classes
    SynapseMLLogging
  28. def makeDotnetFile(conf: CodegenConfig): Unit
    Definition Classes
    DotnetWrappable
  29. def makePyFile(conf: CodegenConfig): Unit
    Definition Classes
    PythonWrappable
  30. def makeRFile(conf: CodegenConfig): Unit
    Definition Classes
    RWrappable
  31. final val outputCol: Param[String]
    Definition Classes
    HasOutputCol
  32. lazy val params: Array[Param[_]]
    Definition Classes
    Params
  33. def pyAdditionalMethods: String
    Definition Classes
    PythonWrappable
  34. def pyInitFunc(): String
    Definition Classes
    PythonWrappable
  35. val referenceDistribution: ArrayMapParam
  36. def save(path: String): Unit
    Definition Classes
    MLWritable
    Annotations
    @Since( "1.6.0" ) @throws( ... )
  37. val sensitiveCols: StringArrayParam
    Definition Classes
    DataBalanceParams
  38. final def set[T](param: Param[T], value: T): DistributionBalanceMeasure.this.type
    Definition Classes
    Params
  39. def setFeatureNameCol(value: String): DistributionBalanceMeasure.this.type
  40. def setOutputCol(value: String): DistributionBalanceMeasure.this.type
    Definition Classes
    DataBalanceParams
  41. def setReferenceDistribution(value: ArrayList[HashMap[String, Double]]): DistributionBalanceMeasure.this.type
  42. def setReferenceDistribution(value: Array[Map[String, Double]]): DistributionBalanceMeasure.this.type
  43. def setSensitiveCols(values: Array[String]): DistributionBalanceMeasure.this.type
    Definition Classes
    DataBalanceParams
  44. def setVerbose(value: Boolean): DistributionBalanceMeasure.this.type
    Definition Classes
    DataBalanceParams
  45. def toString(): String
    Definition Classes
    Identifiable → AnyRef → Any
  46. def transform(dataset: Dataset[_]): DataFrame
    Definition Classes
    DistributionBalanceMeasure → Transformer
  47. def transform(dataset: Dataset[_], paramMap: ParamMap): DataFrame
    Definition Classes
    Transformer
    Annotations
    @Since( "2.0.0" )
  48. def transform(dataset: Dataset[_], firstParamPair: ParamPair[_], otherParamPairs: ParamPair[_]*): DataFrame
    Definition Classes
    Transformer
    Annotations
    @Since( "2.0.0" ) @varargs()
  49. def transformSchema(schema: StructType): StructType
    Definition Classes
    DistributionBalanceMeasure → PipelineStage
  50. val uid: String
    Definition Classes
    DistributionBalanceMeasure → SynapseMLLogging → Identifiable
  51. def validateSchema(schema: StructType): Unit
  52. val verbose: BooleanParam
    Definition Classes
    DataBalanceParams
  53. def write: MLWriter
    Definition Classes
    ComplexParamsWritable → MLWritable