Packages

package exploratory

Type Members

  1. class AggregateBalanceMeasure extends Transformer with DataBalanceParams with ComplexParamsWritable with Wrappable with BasicLogging

    This transformer computes a set of aggregated balance measures that represents how balanced the given dataframe is along the given sensitive features.

    The output is a dataframe that contains one column:

    • A struct containing measure names and their values, showing higher-level notions of inequality. The following measures are computed:
      • Atkinson Index - https://en.wikipedia.org/wiki/Atkinson_index
      • Theil Index (L and T) - https://en.wikipedia.org/wiki/Theil_index

    The output dataframe contains one row.
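
    As a rough illustration of what these indices measure, the sketch below computes them in plain Scala from a sequence of strictly positive per-group proportions. It follows the textbook definitions on the linked pages. The object name, the method names, and the choice of input are assumptions made for illustration and are not taken from this transformer's implementation.

```scala
// Hypothetical helper, not part of the exploratory package.
object AggregateMeasuresSketch {
  private def mean(xs: Seq[Double]): Double = xs.sum / xs.length

  // Atkinson index with inequality-aversion parameter epsilon.
  // Inputs are assumed to be strictly positive.
  def atkinson(xs: Seq[Double], epsilon: Double = 1.0): Double = {
    val mu = mean(xs)
    if (epsilon == 1.0) {
      // epsilon = 1: one minus the ratio of geometric mean to arithmetic mean.
      val geometricMean = math.exp(xs.map(x => math.log(x)).sum / xs.length)
      1.0 - geometricMean / mu
    } else {
      val exponent = 1.0 - epsilon
      val powerMean = math.pow(xs.map(x => math.pow(x, exponent)).sum / xs.length, 1.0 / exponent)
      1.0 - powerMean / mu
    }
  }

  // Theil T index: mean of (x / mu) * ln(x / mu).
  def theilT(xs: Seq[Double]): Double = {
    val mu = mean(xs)
    xs.map(x => (x / mu) * math.log(x / mu)).sum / xs.length
  }

  // Theil L index (mean log deviation): mean of ln(mu / x).
  def theilL(xs: Seq[Double]): Double = {
    val mu = mean(xs)
    xs.map(x => math.log(mu / x)).sum / xs.length
  }
}
```

    Under these definitions a perfectly even split such as Seq(0.25, 0.25, 0.25, 0.25) yields 0 for every index, and the values grow as the split becomes more uneven.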

    Annotations
    @Experimental()
  2. trait DataBalanceParams extends Params with HasOutputCol
  3. class DistributionBalanceMeasure extends Transformer with DataBalanceParams with ComplexParamsWritable with Wrappable with BasicLogging

    This transformer computes data balance measures based on a reference distribution. For now, we only support a uniform reference distribution.

    The output is a dataframe that contains two columns:

    • The sensitive feature name.
    • A struct containing measure names and their values showing differences between the observed and reference distributions. The following measures are computed:
      • Kullback-Leibler Divergence - https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence
      • Jensen-Shannon Distance - https://en.wikipedia.org/wiki/Jensen%E2%80%93Shannon_divergence
      • Wasserstein Distance - https://en.wikipedia.org/wiki/Wasserstein_metric
      • Infinity Norm Distance - https://en.wikipedia.org/wiki/Chebyshev_distance
      • Total Variation Distance - https://en.wikipedia.org/wiki/Total_variation_distance_of_probability_measures
      • Chi-Squared Test - https://en.wikipedia.org/wiki/Chi-squared_test

    The output dataframe contains a row per sensitive feature.
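
    To make the comparison against a uniform reference concrete, the sketch below computes four of the listed measures in plain Scala from an observed probability vector. It uses the standard definitions from the linked pages; the object and method names are hypothetical, and the Wasserstein distance and chi-squared test are left out to keep the example short. It is not this transformer's implementation.

```scala
// Hypothetical helper, not part of the exploratory package.
object DistributionMeasuresSketch {
  // Uniform reference distribution over n feature values.
  def uniform(n: Int): Seq[Double] = Seq.fill(n)(1.0 / n)

  // Kullback-Leibler divergence KL(p || q); terms with p_i = 0 contribute 0.
  def klDivergence(p: Seq[Double], q: Seq[Double]): Double =
    p.zip(q).collect { case (pi, qi) if pi > 0.0 => pi * math.log(pi / qi) }.sum

  // Jensen-Shannon distance: square root of the Jensen-Shannon divergence.
  def jsDistance(p: Seq[Double], q: Seq[Double]): Double = {
    val m = p.zip(q).map { case (pi, qi) => (pi + qi) / 2.0 }
    val divergence = 0.5 * klDivergence(p, m) + 0.5 * klDivergence(q, m)
    math.sqrt(math.max(0.0, divergence)) // guard against tiny negative rounding error
  }

  // Infinity norm (Chebyshev) distance: largest absolute difference.
  def infNormDistance(p: Seq[Double], q: Seq[Double]): Double =
    p.zip(q).map { case (pi, qi) => math.abs(pi - qi) }.max

  // Total variation distance: half the sum of absolute differences.
  def totalVariation(p: Seq[Double], q: Seq[Double]): Double =
    0.5 * p.zip(q).map { case (pi, qi) => math.abs(pi - qi) }.sum
}
```

    For example, calling DistributionMeasuresSketch.klDivergence(observed, DistributionMeasuresSketch.uniform(observed.length)) returns 0 when the observed distribution is itself uniform and a positive value otherwise.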

    Annotations
    @Experimental()
  4. class FeatureBalanceMeasure extends Transformer with DataBalanceParams with HasLabelCol with ComplexParamsWritable with Wrappable with BasicLogging

    This transformer computes a set of balance measures from the given dataframe and sensitive features.

    The output is a dataframe that contains four columns:

    • The sensitive feature name.
    • A feature value within the sensitive feature.
    • Another feature value within the sensitive feature.
    • A struct containing measure names and their values showing parities between the two feature values. The following measures are computed:
      • Demographic Parity - https://en.wikipedia.org/wiki/Fairness_(machine_learning)
      • Pointwise Mutual Information - https://en.wikipedia.org/wiki/Pointwise_mutual_information
      • Sorensen-Dice Coefficient - https://en.wikipedia.org/wiki/S%C3%B8rensen%E2%80%93Dice_coefficient
      • Jaccard Index - https://en.wikipedia.org/wiki/Jaccard_index
      • Kendall Rank Correlation - https://en.wikipedia.org/wiki/Kendall_rank_correlation_coefficient
      • Log-Likelihood Ratio - https://en.wikipedia.org/wiki/Likelihood_function#Likelihood_ratio
      • t-test - https://en.wikipedia.org/wiki/Student's_t-test

    The output dataframe contains a row per combination of feature values for each sensitive feature.
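
    The sketch below illustrates, in plain Scala, how pairs of feature values can be enumerated and how two of the listed measures can be derived from simple counts. It assumes a binary label, takes demographic parity as the difference in positive-label rates between two feature values, and takes pointwise mutual information between a feature value and the positive label as ln(P(y | x) / P(y)). These definitions, and every name in the code, are assumptions for illustration and may differ from what this transformer computes.

```scala
// Hypothetical helper, not part of the exploratory package.
object FeatureBalanceSketch {
  // Row count and positive-label count for one feature value.
  final case class GroupCounts(rows: Long, positives: Long) {
    def positiveRate: Double = positives.toDouble / rows
  }

  // Every unordered pair of distinct feature values within one sensitive feature.
  def valuePairs(values: Seq[String]): List[(String, String)] =
    values.combinations(2).map(pair => (pair(0), pair(1))).toList

  // Demographic parity gap: difference in positive-label rates between two values.
  def demographicParityGap(a: GroupCounts, b: GroupCounts): Double =
    a.positiveRate - b.positiveRate

  // PMI between a feature value and the positive label: ln(P(y | x) / P(y)).
  def pmi(group: GroupCounts, totalRows: Long, totalPositives: Long): Double =
    math.log(group.positiveRate / (totalPositives.toDouble / totalRows))
}
```

    A gap of 0 means both feature values see the positive label at the same rate; a positive gap means the first value sees it more often than the second.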

    Annotations
    @Experimental()