Sequential Lasso Cum EBIC for Feature Selection With Ultra-High Dimensional Feature Space

Cited by: 49
Authors
Luo, Shan [1 ]
Chen, Zehua [2 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Dept Math, Shanghai 200030, Peoples R China
[2] Natl Univ Singapore, Dept Stat & Appl Probabil, Singapore 117548, Singapore
Keywords
Extended BIC; Oracle property; Selection consistency; Sparse high-dimensional linear models; NONCONCAVE PENALIZED LIKELIHOOD; ORTHOGONAL MATCHING PURSUIT; VARIABLE SELECTION; MODEL SELECTION; SIGNAL RECOVERY; ORACLE PROPERTIES; ADAPTIVE LASSO; LINEAR-MODELS; REGRESSION; SHRINKAGE;
DOI
10.1080/01621459.2013.877275
Chinese Library Classification
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Subject Classification Codes
020208; 070103; 0714
Abstract
In this article, we propose a method called sequential Lasso (SLasso) for feature selection in sparse high-dimensional linear models. SLasso selects features by sequentially solving partially penalized least squares problems in which the features selected in earlier steps are not penalized. SLasso uses the extended BIC (EBIC) as its stopping rule: the procedure stops when EBIC reaches a minimum. The asymptotic properties of SLasso are considered when the dimension of the feature space is ultra high and the number of relevant features diverges. We show that, with probability converging to 1, SLasso first selects all the relevant features before any irrelevant feature can be selected, and that EBIC decreases until it attains its minimum at the model consisting of exactly the relevant features and then begins to increase. These results establish the selection consistency of SLasso. The SLasso estimators of the final model are ordinary least squares estimators, and the selection consistency implies the oracle property of SLasso. The asymptotic distribution of the SLasso estimators with a diverging number of relevant features is provided. SLasso is compared with other methods in simulation studies, which demonstrate that SLasso has an edge over the other methods. SLasso and the other methods are then applied to microarray data for mapping disease genes. Supplementary materials for this article are available online.
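The procedure described in the abstract can be sketched as a greedy loop: at each step a new feature enters the model, the enlarged selected set is refit by ordinary least squares (SLasso's final estimator), and the loop stops as soon as EBIC stops decreasing. This is a minimal NumPy illustration, not the authors' implementation: the per-step entry rule is simplified to "feature most correlated with the current residual" (the feature that first enters the partially penalized Lasso path), the data are assumed centered with no intercept, and the `ebic`/`sequential_lasso` names and the `k*log(p)` approximation to the EBIC penalty are choices made here for the sketch.

```python
import numpy as np

def ebic(rss, n, k, p, gamma=0.5):
    # Extended BIC: ordinary BIC plus an extra 2*gamma*k*log(p) term,
    # the standard small-k approximation of 2*gamma*log(C(p, k)).
    return n * np.log(rss / n) + k * np.log(n) + 2 * gamma * k * np.log(p)

def sequential_lasso(X, y, gamma=0.5):
    """Greedy sketch of SLasso with an EBIC stopping rule.

    At each step, admit the feature most correlated with the current
    residual, refit OLS on the selected set, and stop when EBIC no
    longer decreases. Assumes centered X and y (no intercept).
    """
    n, p = X.shape
    selected = []
    residual = y.copy()
    best_ebic = np.inf
    while len(selected) < min(n - 1, p):
        # Candidate: feature with largest absolute inner product with
        # the residual; already-selected features are masked out.
        scores = np.abs(X.T @ residual)
        scores[selected] = -np.inf
        j = int(np.argmax(scores))
        trial = selected + [j]
        # OLS refit on the enlarged set (SLasso's estimators are OLS).
        Xs = X[:, trial]
        beta, _, _, _ = np.linalg.lstsq(Xs, y, rcond=None)
        rss = float(np.sum((y - Xs @ beta) ** 2))
        crit = ebic(rss, n, len(trial), p, gamma)
        if crit >= best_ebic:
            break  # EBIC has attained its minimum; stop here
        best_ebic, selected = crit, trial
        residual = y - Xs @ beta
    return selected
```

On data with a few strong relevant features, the loop typically admits exactly those features and then halts, mirroring the paper's claim that EBIC decreases until the true model is reached and increases afterward; larger `gamma` makes the stopping rule more conservative in very high-dimensional settings.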
Pages: 1229-1240 (12 pages)