Volume 80, Issue 1
Original Article

Kernel‐based tests for joint independence

Niklas Pfister

Corresponding Author

E-mail address: niklas.pfister@stat.math.ethz.ch

Eidgenössische Technische Hochschule, Zürich, Switzerland

Address for correspondence: Niklas Pfister, Seminar für Statistik, Eidgenössische Technische Hochschule Zürich, Rämistrasse 101, Zürich 8092, Switzerland. E‐mail: niklas.pfister@stat.math.ethz.ch
Peter Bühlmann

Eidgenössische Technische Hochschule, Zürich, Switzerland

Bernhard Schölkopf

Max Planck Institute for Intelligent Systems, Tübingen, Germany

Jonas Peters

Max Planck Institute for Intelligent Systems, Tübingen, Germany

University of Copenhagen, Denmark

First published: 21 May 2017
Citations: 23

Summary

We investigate the problem of testing whether d possibly multivariate random variables, which may or may not be continuous, are jointly (or mutually) independent. Our method builds on ideas of the two‐variable Hilbert–Schmidt independence criterion but allows for an arbitrary number of variables. We embed the joint distribution and the product of the marginals in a reproducing kernel Hilbert space and define the d‐variable Hilbert–Schmidt independence criterion dHSIC as the squared distance between the embeddings. In the population case, the value of dHSIC is 0 if and only if the d variables are jointly independent, as long as the kernel is characteristic. On the basis of an empirical estimate of dHSIC, we investigate three non‐parametric hypothesis tests: a permutation test, a bootstrap analogue and a procedure based on a gamma approximation. We apply non‐parametric independence testing to a problem in causal discovery and illustrate the new methods on simulated and real data sets.
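As a rough illustration of the quantities named in the summary, the following Python sketch computes a biased (V‐statistic) empirical estimate of dHSIC from per‐variable Gram matrices and wraps it in a permutation test. It is a minimal sketch and not the authors' implementation: the function names `gaussian_gram`, `dhsic` and `dhsic_permutation_test`, the choice of a Gaussian kernel, the median‐based bandwidth rule and the number of permutations are all assumptions made for this example. The bootstrap analogue and the gamma approximation are not shown.

```python
import numpy as np


def gaussian_gram(x, bandwidth=None):
    """Gaussian-kernel Gram matrix for n observations of one (possibly multivariate) variable."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    # Pairwise squared Euclidean distances, shape (n, n).
    sq = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    if bandwidth is None:
        # Assumption: a median-based bandwidth rule; other choices are possible.
        positive = sq[sq > 0]
        bandwidth = np.sqrt(0.5 * np.median(positive)) if positive.size else 1.0
    return np.exp(-sq / (2.0 * bandwidth ** 2))


def dhsic(grams):
    """Biased (V-statistic) dHSIC estimate from a list of d Gram matrices."""
    n = grams[0].shape[0]
    joint = np.ones((n, n))   # elementwise product of the Gram matrices
    marginals = 1.0           # product of the overall Gram-matrix means
    cross = np.ones(n)        # product of the row means
    for K in grams:
        joint = joint * K
        marginals = marginals * K.mean()
        cross = cross * K.mean(axis=1)
    return joint.mean() + marginals - 2.0 * cross.mean()


def dhsic_permutation_test(samples, n_perm=200, seed=0):
    """Permutation p-value: variables 2..d are permuted independently of each other."""
    rng = np.random.default_rng(seed)
    grams = [gaussian_gram(x) for x in samples]
    n = grams[0].shape[0]
    observed = dhsic(grams)
    exceed = 0
    for _ in range(n_perm):
        permuted = [grams[0]]
        for K in grams[1:]:
            idx = rng.permutation(n)
            # Permuting observations permutes rows and columns of the Gram matrix.
            permuted.append(K[np.ix_(idx, idx)])
        if dhsic(permuted) >= observed:
            exceed += 1
    p_value = (exceed + 1) / (n_perm + 1)
    return observed, p_value


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.normal(size=100)
    y = x + 0.1 * rng.normal(size=100)   # strongly dependent on x
    z = rng.normal(size=100)             # independent of x and y
    stat, p = dhsic_permutation_test([x, y, z])
    print(f"dHSIC estimate: {stat:.4f}, permutation p-value: {p:.4f}")
```

Because the Gram matrices are computed once and only re‐indexed under each permutation, the cost per permutation is dominated by the elementwise products over the d matrices. For d = 2 the statistic coincides with the usual biased estimate of the two‐variable Hilbert–Schmidt independence criterion.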
