Volume 67, Issue 2

Regularization and variable selection via the elastic net

First published: 09 March 2005
Citations: 6,063
Trevor Hastie, Department of Statistics, Stanford University, Stanford, CA 94305, USA.
E‐mail: hastie@stanford.edu

Abstract

Summary. We propose the elastic net, a new regularization and variable selection method. Real world data and a simulation study show that the elastic net often outperforms the lasso, while enjoying a similar sparsity of representation. In addition, the elastic net encourages a grouping effect, where strongly correlated predictors tend to be in or out of the model together. The elastic net is particularly useful when the number of predictors (p) is much bigger than the number of observations (n). By contrast, the lasso is not a very satisfactory variable selection method in the p ≫ n case. An algorithm called LARS‐EN is proposed for computing elastic net regularization paths efficiently, much like algorithm LARS does for the lasso.
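
The grouping effect described in the summary can be illustrated with a small sketch. The code below is not the LARS‐EN algorithm; it is a plain coordinate-descent solver written for this illustration, and it assumes one common scaling of the penalized criterion, (1/(2n))·|y − Xβ|² + lam1·|β|₁ + (lam2/2)·|β|², without the rescaling step that the paper applies to the naive estimate. The function names, the simulated data and the penalty values are all hypothetical choices.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator: shrink z towards zero by t."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def elastic_net_cd(X, y, lam1, lam2, n_iter=200):
    """Cyclic coordinate descent for the (assumed) criterion
    (1/(2n))*||y - X b||^2 + lam1*||b||_1 + (lam2/2)*||b||^2.
    Setting lam2 = 0 gives a lasso fit; lam1 = 0 gives a ridge fit.
    """
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n      # (1/n) * x_j' x_j for each column
    resid = y.astype(float).copy()         # residual y - X @ beta (beta = 0)
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * beta[j]     # remove coordinate j from the fit
            rho = X[:, j] @ resid / n      # partial correlation with residual
            beta[j] = soft_threshold(rho, lam1) / (col_sq[j] + lam2)
            resid -= X[:, j] * beta[j]     # put the updated coordinate back
    return beta


# Hypothetical data: the first two predictors are exact copies of each
# other, the third is unrelated noise.
rng = np.random.default_rng(0)
n = 50
z = rng.standard_normal(n)
X = np.column_stack([z, z, rng.standard_normal(n)])
y = 3.0 * z + 0.1 * rng.standard_normal(n)

beta_lasso = elastic_net_cd(X, y, lam1=0.1, lam2=0.0)
beta_enet = elastic_net_cd(X, y, lam1=0.1, lam2=0.5)

print("lasso:      ", beta_lasso)
print("elastic net:", beta_enet)
```

With two identical predictors the lasso criterion has no unique minimizer, so this solver should leave all of the weight on whichever copy it updates first. Once lam2 is positive the criterion is strictly convex, and the two copies should receive equal coefficients, which is the grouping effect in its most extreme form.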
