Convex clustering via l1 fusion penalization
Summary
We study the large‐sample behaviour of a convex clustering framework, which minimizes the sample within‐cluster sum of squares under an l1 fusion constraint on the cluster centroids. This recently proposed approach has been gaining in popularity; however, its asymptotic properties have remained mostly unknown. Our analysis is based on a novel representation of the sample clustering procedure as a sequence of cluster splits determined by a sequence of maximization problems. We use this representation to provide a simple and intuitive formulation for the population clustering procedure. We then demonstrate that the sample procedure consistently estimates its population analogue and we derive the corresponding rates of convergence. The proof conducts a careful simultaneous analysis of a collection of M‐estimation problems, whose cardinality grows together with the sample size. On the basis of the new perspectives gained from the asymptotic investigation, we propose a key post‐processing modification of the original clustering framework. We show, both theoretically and empirically, that the resulting approach can be successfully used to estimate the number of clusters in the population. Using simulated data, we compare the proposed method with existing number‐of‐clusters and modality assessment approaches and obtain encouraging results. We also demonstrate the applicability of our clustering method to the detection of cellular subpopulations in a single‐cell virology study.
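To make the fusion idea concrete, the following is a minimal one-dimensional sketch, not the authors' procedure. It assumes the penalized form of the objective, 0.5 * sum_i (x_i - a_i)^2 + lam * sum_{i<j} |a_i - a_j|, with equal weights on all pairs; this scaling and the names pava, l1_convex_clustering_1d and lam are illustrative assumptions. Instead of the sequence-of-splits representation described in the summary, the sketch uses a standard reformulation: once the data are sorted, the penalty is linear in the ordered centroids, so the minimizer is the isotonic regression of the shifted values x_(i) - lam * (2i - n - 1).

```python
import numpy as np


def pava(y):
    """Least-squares nondecreasing fit via pool-adjacent-violators."""
    sums, counts = [], []
    for value in y:
        sums.append(float(value))
        counts.append(1)
        # Merge the last two blocks while their means violate the ordering.
        while len(sums) > 1 and sums[-2] / counts[-2] >= sums[-1] / counts[-1]:
            s, c = sums.pop(), counts.pop()
            sums[-1] += s
            counts[-1] += c
    return np.concatenate([np.full(c, s / c) for s, c in zip(sums, counts)])


def l1_convex_clustering_1d(x, lam):
    """Centroids minimizing the assumed objective
    0.5 * sum_i (x_i - a_i)^2 + lam * sum_{i<j} |a_i - a_j|.

    Hypothetical helper: the objective's scaling is an assumption,
    not taken from the source.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    order = np.argsort(x, kind="stable")
    # For ordered centroids the pairwise penalty equals sum_i (2i - n - 1) * a_(i).
    coef = 2 * np.arange(1, n + 1) - n - 1
    fitted_sorted = pava(x[order] - lam * coef)
    centroids = np.empty(n)
    centroids[order] = fitted_sorted  # undo the sort
    return centroids


if __name__ == "__main__":
    data = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
    for lam in (0.0, 0.1, 10.0):
        fitted = l1_convex_clustering_1d(data, lam)
        n_clusters = len(np.unique(np.round(fitted, 8)))
        print(lam, n_clusters, fitted)
```

Observations sharing a fitted centroid form a cluster. In this toy data set, lam = 0 leaves every point as its own cluster, lam = 0.1 fuses the points into two groups with centroids near 0.4 and 4.8, and a large lam fuses everything at the overall mean. Because both the squared-error term and the l1 penalty separate across coordinates, the same routine could be applied to each coordinate of multivariate data under the assumed objective.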