Asymptotic behaviour of the posterior distribution in overfitted mixture models
Abstract
Summary. We study the asymptotic behaviour of the posterior distribution in a mixture model when the number of components in the mixture is larger than the true number of components, a situation commonly referred to as an overfitted mixture. We prove in particular that, quite generally, the posterior distribution has a stable and interesting behaviour, since it tends to empty the extra components. This stability is achieved under some restriction on the prior, which can be used as a guideline for choosing the prior. Some simulations are presented to illustrate this behaviour.
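As a rough illustration of what "emptying the extra components" can look like in a simulation, the following is a minimal sketch and not the authors' simulation study. Everything in it is an assumption made for the example: the data come from a two-component Gaussian mixture with unit variances, the fitted model has four components with known unit variance, the means have a N(0, 10^2) prior, and the weights have a symmetric Dirichlet prior with a small hyperparameter (alpha = 0.1). The abstract does not state the actual restriction on the prior, so this choice is purely illustrative. The function names are hypothetical.

```python
import math
import random


def simulate_two_component_data(n, seed=1):
    """Draw n points from 0.5*N(-2, 1) + 0.5*N(2, 1) (an assumed toy truth)."""
    rng = random.Random(seed)
    return [rng.gauss(-2.0 if rng.random() < 0.5 else 2.0, 1.0) for _ in range(n)]


def gibbs_overfitted_mixture(data, n_components=4, alpha=0.1, n_iter=1500,
                             burn_in=500, prior_sd=10.0, seed=0):
    """Gibbs sampler for a Gaussian mixture with known unit variance.

    Returns the posterior mean of the weights after sorting them in
    decreasing order at each iteration (sorting sidesteps label switching).
    """
    rng = random.Random(seed)
    K = n_components
    means = [rng.choice(data) for _ in range(K)]  # initialise at data points
    weights = [1.0 / K] * K
    weight_sums = [0.0] * K
    kept = 0

    for it in range(n_iter):
        # 1. Allocate each observation to a component given weights and means.
        counts = [0] * K
        sums = [0.0] * K
        for x in data:
            probs = [weights[k] * math.exp(-0.5 * (x - means[k]) ** 2)
                     for k in range(K)]
            k = rng.choices(range(K), weights=probs)[0]
            counts[k] += 1
            sums[k] += x

        # 2. Weights | allocations ~ Dirichlet(alpha + counts), via gamma draws.
        gammas = [rng.gammavariate(alpha + counts[k], 1.0) for k in range(K)]
        total = sum(gammas)
        weights = [g / total for g in gammas]

        # 3. Means | allocations: conjugate normal update (prior mean 0).
        for k in range(K):
            precision = 1.0 / prior_sd ** 2 + counts[k]
            means[k] = rng.gauss(sums[k] / precision, precision ** -0.5)

        # Accumulate the sorted weights after burn-in.
        if it >= burn_in:
            for j, w in enumerate(sorted(weights, reverse=True)):
                weight_sums[j] += w
            kept += 1

    return [s / kept for s in weight_sums]


if __name__ == "__main__":
    data = simulate_two_component_data(100)
    posterior_weights = gibbs_overfitted_mixture(data)
    print("posterior mean of sorted weights:",
          [round(w, 3) for w in posterior_weights])
```

If the behaviour described in the abstract carries over to this toy setting, the two smallest sorted weights should be close to zero. A single short run like this one is only an illustration and does not check the asymptotic result.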