
class DistributionBalanceMeasure extends Transformer with DataBalanceParams with ComplexParamsWritable with Wrappable with BasicLogging

This transformer computes data balance measures that compare the observed distribution of each sensitive feature with a reference distribution. Currently, only a uniform reference distribution is supported.
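To make "uniform reference distribution" concrete, here is a minimal plain-Scala sketch. It assumes that the reference is built over the k distinct values observed for one sensitive feature, so each value gets reference probability 1/k and, over n rows, an expected count of n/k. The object and method names are hypothetical illustrations and are not part of the SynapseML API.

```scala
// Hypothetical helper, not part of SynapseML: builds a uniform reference
// distribution over the distinct values observed for one sensitive feature.
object UniformReference {
  /** Observed probability of each value: its count divided by the total row count. */
  def observedProbs(counts: Map[String, Long]): Map[String, Double] = {
    val total = counts.values.sum.toDouble
    counts.map { case (value, c) => value -> c / total }
  }

  /** Uniform reference (assumption: over observed values only): each value gets 1 / k. */
  def referenceProbs(counts: Map[String, Long]): Map[String, Double] = {
    val k = counts.size.toDouble
    counts.map { case (value, _) => value -> 1.0 / k }
  }

  /** Expected count of each value under the uniform reference: n / k. */
  def referenceCounts(counts: Map[String, Long]): Map[String, Double] = {
    val n = counts.values.sum.toDouble
    val k = counts.size.toDouble
    counts.map { case (value, _) => value -> n / k }
  }
}
```

For example, counts of 60 and 40 for two values give observed probabilities 0.6 and 0.4, a reference probability of 0.5 for each, and an expected count of 50 for each.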

The output is a dataframe that contains two columns:

  • The sensitive feature name.
  • A struct containing measure names and their values showing differences between the observed and reference distributions. The following measures are computed:
    • Kullback-Leibler Divergence - https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence
    • Jensen-Shannon Distance - https://en.wikipedia.org/wiki/Jensen%E2%80%93Shannon_divergence
    • Wasserstein Distance - https://en.wikipedia.org/wiki/Wasserstein_metric
    • Infinity Norm Distance - https://en.wikipedia.org/wiki/Chebyshev_distance
    • Total Variation Distance - https://en.wikipedia.org/wiki/Total_variation_distance_of_probability_measures
    • Chi-Squared Test - https://en.wikipedia.org/wiki/Chi-squared_test
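The measures listed above can be sketched with their textbook definitions for a single sensitive feature, given its observed value counts and a uniform reference. This is an illustrative sketch, not the SynapseML implementation: it assumes natural logarithms, takes the Jensen-Shannon distance as the square root of the Jensen-Shannon divergence, and reports only the chi-squared statistic. The Wasserstein distance and the chi-squared p-value are omitted because this page does not state how they are formulated for categorical features. All names are hypothetical.

```scala
// Sketch only: textbook definitions of several measures listed above, comparing
// an observed distribution p with a uniform reference q = 1/k. Hypothetical names;
// this is not the SynapseML implementation.
object BalanceMeasuresSketch {
  final case class Measures(
      klDivergence: Double,
      jsDistance: Double,
      infNormDistance: Double,
      totalVariationDistance: Double,
      chiSquaredStat: Double)

  // KL(a || b) = sum_i a_i * ln(a_i / b_i); terms with a_i = 0 contribute 0.
  private def kl(a: Seq[Double], b: Seq[Double]): Double =
    a.zip(b).collect { case (ai, bi) if ai > 0 => ai * math.log(ai / bi) }.sum

  /** counts: observed count of each distinct value of one sensitive feature. */
  def compute(counts: Seq[Long]): Measures = {
    require(counts.nonEmpty && counts.forall(_ >= 0) && counts.sum > 0)
    val n = counts.sum.toDouble
    val k = counts.size
    val p = counts.map(_ / n)     // observed probabilities
    val q = Seq.fill(k)(1.0 / k)  // uniform reference probabilities
    val m = p.zip(q).map { case (pi, qi) => (pi + qi) / 2 }  // midpoint distribution
    val absDiff = p.zip(q).map { case (pi, qi) => math.abs(pi - qi) }
    val expected = n / k          // expected count per value under the reference
    Measures(
      klDivergence = kl(p, q),
      // Jensen-Shannon distance: square root of the average KL divergence to m.
      jsDistance = math.sqrt((kl(p, m) + kl(q, m)) / 2),
      infNormDistance = absDiff.max,
      totalVariationDistance = absDiff.sum / 2,
      chiSquaredStat = counts.map(c => math.pow(c - expected, 2) / expected).sum)
  }
}
```

A perfectly balanced feature, such as counts of 50 and 50, yields 0 for every measure; counts of 60 and 40 yield an infinity norm and total variation distance of 0.1 and a chi-squared statistic of 4.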

The output dataframe contains a row per sensitive feature.
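The "one row per sensitive feature" output shape can be illustrated without Spark. In this hypothetical sketch, plain Scala collections stand in for a DataFrame: each input row is a `Map` from column name to value, value counts are computed separately for every sensitive column, and a single measure (total variation distance from a uniform reference over the observed values) stands in for the full struct of measures. The names `PerFeatureRows`, `valueCounts`, `totalVariation`, and `measureRows` are illustrative assumptions, not SynapseML APIs.

```scala
// Hypothetical sketch of the per-feature output shape; plain collections stand in
// for a Spark DataFrame, and one measure stands in for the full struct of measures.
object PerFeatureRows {
  /** Count of each distinct value in one column; rows missing the column are skipped. */
  def valueCounts(rows: Seq[Map[String, String]], col: String): Map[String, Long] =
    rows.flatMap(_.get(col)).groupBy(identity).map { case (v, vs) => v -> vs.size.toLong }

  /** Total variation distance from a uniform reference over the observed values. */
  def totalVariation(counts: Map[String, Long]): Double = {
    val n = counts.values.sum.toDouble
    val k = counts.size
    counts.values.map(c => math.abs(c / n - 1.0 / k)).sum / 2
  }

  /** One (feature name, measure) pair per sensitive column, like one output row per feature. */
  def measureRows(rows: Seq[Map[String, String]], sensitiveCols: Seq[String]): Seq[(String, Double)] =
    sensitiveCols.map(col => col -> totalVariation(valueCounts(rows, col)))
}
```

Because each sensitive column is counted independently, adding another column to `sensitiveCols` adds exactly one more result pair.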

Annotations
@Experimental()
Linear Supertypes
BasicLogging, Wrappable, RWrappable, PythonWrappable, BaseWrappable, ComplexParamsWritable, MLWritable, DataBalanceParams, HasOutputCol, Transformer, PipelineStage, Logging, Params, Serializable, Serializable, Identifiable, AnyRef, Any

Instance Constructors

  1. new DistributionBalanceMeasure()
  2. new DistributionBalanceMeasure(uid: String)

    uid

    The unique ID.

Value Members

  1. final def clear(param: Param[_]): DistributionBalanceMeasure.this.type
    Definition Classes
    Params
  2. def copy(extra: ParamMap): Transformer
    Definition Classes
    DistributionBalanceMeasure → Transformer → PipelineStage → Params
  3. def explainParam(param: Param[_]): String
    Definition Classes
    Params
  4. def explainParams(): String
    Definition Classes
    Params
  5. final def extractParamMap(): ParamMap
    Definition Classes
    Params
  6. final def extractParamMap(extra: ParamMap): ParamMap
    Definition Classes
    Params
  7. val featureNameCol: Param[String]
  8. final def get[T](param: Param[T]): Option[T]
    Definition Classes
    Params
  9. final def getDefault[T](param: Param[T]): Option[T]
    Definition Classes
    Params
  10. def getFeatureNameCol: String
  11. final def getOrDefault[T](param: Param[T]): T
    Definition Classes
    Params
  12. final def getOutputCol: String
    Definition Classes
    HasOutputCol
  13. def getParam(paramName: String): Param[Any]
    Definition Classes
    Params
  14. def getSensitiveCols: Array[String]
    Definition Classes
    DataBalanceParams
  15. def getVerbose: Boolean
    Definition Classes
    DataBalanceParams
  16. final def hasDefault[T](param: Param[T]): Boolean
    Definition Classes
    Params
  17. def hasParam(paramName: String): Boolean
    Definition Classes
    Params
  18. final def isDefined(param: Param[_]): Boolean
    Definition Classes
    Params
  19. final def isSet(param: Param[_]): Boolean
    Definition Classes
    Params
  20. def logClass(): Unit
    Definition Classes
    BasicLogging
  21. def logFit[T](f: ⇒ T): T
    Definition Classes
    BasicLogging
  22. def logPredict[T](f: ⇒ T): T
    Definition Classes
    BasicLogging
  23. def logTrain[T](f: ⇒ T): T
    Definition Classes
    BasicLogging
  24. def logTransform[T](f: ⇒ T): T
    Definition Classes
    BasicLogging
  25. def logVerb[T](verb: String, f: ⇒ T): T
    Definition Classes
    BasicLogging
  26. def makePyFile(conf: CodegenConfig): Unit
    Definition Classes
    PythonWrappable
  27. def makeRFile(conf: CodegenConfig): Unit
    Definition Classes
    RWrappable
  28. final val outputCol: Param[String]
    Definition Classes
    HasOutputCol
  29. lazy val params: Array[Param[_]]
    Definition Classes
    Params
  30. def pyAdditionalMethods: String
    Definition Classes
    PythonWrappable
  31. def pyInitFunc(): String
    Definition Classes
    PythonWrappable
  32. def save(path: String): Unit
    Definition Classes
    MLWritable
    Annotations
    @Since( "1.6.0" ) @throws( ... )
  33. val sensitiveCols: StringArrayParam
    Definition Classes
    DataBalanceParams
  34. final def set[T](param: Param[T], value: T): DistributionBalanceMeasure.this.type
    Definition Classes
    Params
  35. def setFeatureNameCol(value: String): DistributionBalanceMeasure.this.type
  36. def setOutputCol(value: String): DistributionBalanceMeasure.this.type
    Definition Classes
    DataBalanceParams
  37. def setSensitiveCols(values: Array[String]): DistributionBalanceMeasure.this.type
    Definition Classes
    DataBalanceParams
  38. def setVerbose(value: Boolean): DistributionBalanceMeasure.this.type
    Definition Classes
    DataBalanceParams
  39. def toString(): String
    Definition Classes
    Identifiable → AnyRef → Any
  40. def transform(dataset: Dataset[_]): DataFrame
    Definition Classes
    DistributionBalanceMeasure → Transformer
  41. def transform(dataset: Dataset[_], paramMap: ParamMap): DataFrame
    Definition Classes
    Transformer
    Annotations
    @Since( "2.0.0" )
  42. def transform(dataset: Dataset[_], firstParamPair: ParamPair[_], otherParamPairs: ParamPair[_]*): DataFrame
    Definition Classes
    Transformer
    Annotations
    @Since( "2.0.0" ) @varargs()
  43. def transformSchema(schema: StructType): StructType
    Definition Classes
    DistributionBalanceMeasure → PipelineStage
  44. val uid: String
    Definition Classes
    DistributionBalanceMeasure → BasicLogging → Identifiable
  45. def validateSchema(schema: StructType): Unit
    Definition Classes
    DataBalanceParams
  46. val ver: String
    Definition Classes
    BasicLogging
  47. val verbose: BooleanParam
    Definition Classes
    DataBalanceParams
  48. def write: MLWriter
    Definition Classes
    ComplexParamsWritable → MLWritable