Volume 79, Issue 5
Original Article

Convex clustering via ℓ₁ fusion penalization

Peter Radchenko

Corresponding Author

E-mail address: radchenk@marshall.usc.edu

University of Southern California, Los Angeles, USA

University of Sydney Business School, Darlington, Australia

Address for correspondence: Peter Radchenko, University of Sydney Business School, Darlington, NSW 2006, Australia. E‐mail: radchenk@marshall.usc.edu
Gourab Mukherjee

University of Southern California, Los Angeles, USA

First published: 13 February 2017
Citations: 11

Summary

We study the large sample behaviour of a convex clustering framework, which minimizes the sample within‐cluster sum of squares under an ℓ₁ fusion constraint on the cluster centroids. This recently proposed approach has been gaining in popularity; however, its asymptotic properties have remained mostly unknown. Our analysis is based on a novel representation of the sample clustering procedure as a sequence of cluster splits determined by a sequence of maximization problems. We use this representation to provide a simple and intuitive formulation for the population clustering procedure. We then demonstrate that the sample procedure consistently estimates its population analogue and we derive the corresponding rates of convergence. The proof conducts a careful simultaneous analysis of a collection of M‐estimation problems, whose cardinality grows together with the sample size. On the basis of the new perspectives gained from the asymptotic investigation, we propose a key post‐processing modification of the original clustering framework. We show, both theoretically and empirically, that the resulting approach can be successfully used to estimate the number of clusters in the population. Using simulated data, we compare the proposed method with existing number‐of‐clusters and modality assessment approaches and obtain encouraging results. We also demonstrate the applicability of our clustering method to the detection of cellular subpopulations in a single‐cell virology study.
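To make the criterion in the summary concrete, the following is a minimal sketch of a univariate fusion-penalized clustering problem, written in its penalized (Lagrangian) form. It is an illustration under stated assumptions, not the authors' procedure: the function name `convex_clustering_1d`, the unnormalized penalty weight `lam`, and the factor 0.5 on the squared-error term are all choices made here, and the paper's scaling may differ. The sketch relies on a standard reduction: in one dimension the optimal centroids keep the order of the data, so the pairwise absolute-difference penalty becomes linear in the sorted centroids and the problem reduces to an isotonic regression, solved below by pool-adjacent-violators.

```python
import numpy as np


def convex_clustering_1d(x, lam):
    """Sketch: minimize 0.5*sum((x_i - a_i)**2) + lam*sum_{i<j} |a_i - a_j|.

    Returns one centroid per observation and an integer cluster label per
    observation. The scaling of `lam` is an assumption of this sketch.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    order = np.argsort(x, kind="stable")
    ranks = np.arange(1, n + 1)

    # With centroids in the same order as the data, the penalty is linear:
    # sum_{i<j} (a_j - a_i) = sum_i (2*i - n - 1) * a_i  (i = rank, 1-based),
    # so the problem is an isotonic regression of the shifted values y.
    y = x[order] - lam * (2 * ranks - n - 1)

    # Pool-adjacent-violators: merge neighbouring blocks whose means are
    # not strictly increasing; each surviving block is one fused cluster.
    means, sizes = [], []
    for value in y:
        means.append(float(value))
        sizes.append(1)
        while len(means) > 1 and means[-2] >= means[-1]:
            m2, s2 = means.pop(), sizes.pop()
            m1, s1 = means.pop(), sizes.pop()
            means.append((m1 * s1 + m2 * s2) / (s1 + s2))
            sizes.append(s1 + s2)

    # Map block means and block indices back to the original ordering.
    centroids = np.empty(n)
    labels = np.empty(n, dtype=int)
    centroids[order] = np.repeat(means, sizes)
    labels[order] = np.repeat(np.arange(len(sizes)), sizes)
    return centroids, labels


if __name__ == "__main__":
    data = [5.1, 0.0, 5.0, 0.2, 5.2, 0.1]
    for lam in (0.0, 0.1, 10.0):
        centroids, labels = convex_clustering_1d(data, lam)
        print(lam, len(set(labels.tolist())), centroids)
```

In this toy example, `lam = 0` leaves every point as its own cluster, `lam = 0.1` fuses the six points into two groups of three, and a large value such as `lam = 10` fuses everything at the sample mean. The fused centroids are shrunk towards each other relative to the group means, which is one reason a post-processing step can be useful when the fitted path is used to assess the number of clusters.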
