High-Dimensional LASSO-Based Computational Regression Models: Regularization, Shrinkage, and Selection

Cited by: 85
Authors
Emmert-Streib, Frank [1, 2, 3]
Dehmer, Matthias [4, 5, 6]
Affiliations
[1] Tampere Univ, Predict Soc, Tampere 33720, Finland
[2] Tampere Univ, Fac Informat Technol & Commun Sci, Data Analyt Lab, Tampere 33720, Finland
[3] Inst Biosci & Med Technol, Tampere 33520, Finland
[4] Univ Appl Sci Upper Austria, Inst Intelligent Prod, Fac Management, Steyr Campus, A-4400 Steyr, Austria
[5] UMIT, Dept Mechatron & Biomed Comp Sci, A-6060 Hall In Tirol, Austria
[6] Nankai Univ, Coll Comp & Control Engn, Tianjin 300071, Peoples R China
Source
Machine Learning and Knowledge Extraction | 2019 / Vol. 1 / Issue 1
Funding
Austrian Science Fund
Keywords
machine learning; statistics; regression models; LASSO; regularization; high-dimensional data; data science; shrinkage; feature selection; variable selection; Dantzig selector; statistical estimation; penalized regression; adaptive LASSO; expression; design; larger
DOI
10.3390/make1010021
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Regression models are supervised learning methods that are important for machine learning, statistics, and data science in general. Although classical ordinary least squares (OLS) regression has been known for a long time, recent years have seen many new developments that extend this model significantly. Above all, the least absolute shrinkage and selection operator (LASSO) model has gained considerable interest. In this paper, we review general regression models with a focus on the LASSO and its extensions, including the adaptive LASSO, elastic net, and group LASSO. We discuss the regularization terms responsible for inducing coefficient shrinkage and variable selection, which lead to improved performance metrics for these regression models. This makes these modern, computational regression models valuable tools for analyzing high-dimensional problems.
Pages: 359-383
Page count: 25