High-dimensional variable selection in regression and classification with missing data

Cited by: 6
Authors
Gao, Qi [1 ]
Lee, Thomas C. M. [1 ]
Affiliations
[1] Univ Calif Davis, Dept Stat, One Shields Ave, Davis, CA 95616 USA
Funding
U.S. National Science Foundation;
Keywords
Adaptive lasso; Logistic regression; Low rank recovery; Matrix completion;
DOI
10.1016/j.sigpro.2016.07.014
Chinese Library Classification (CLC) codes
TM [Electrical Engineering]; TN [Electronic and Communication Technology];
Discipline classification codes
0808; 0809;
Abstract
Variable selection for high-dimensional data problems, including both regression and classification, has been a subject of intense research activity in recent years, and many promising solutions have been proposed. However, less attention has been given to the case where some of the data are missing. This paper proposes a general approach to high-dimensional variable selection in the presence of missing data when the missing fraction can be relatively large (e.g., 50%). Both regression and classification are considered. The proposed approach iterates between two major steps: the first step uses matrix completion to impute the missing data, and the second step applies the adaptive lasso to the imputed data to select the significant variables. Methods are provided for choosing all of the tuning parameters involved. As fast algorithms and software are widely available for matrix completion and the adaptive lasso, the proposed approach is fast and straightforward to implement. Results from numerical experiments and applications to two real data sets are presented to demonstrate the efficiency and effectiveness of the approach. (C) 2016 Elsevier B.V. All rights reserved.
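The abstract only outlines the two-step cycle (matrix-completion imputation, then adaptive-lasso selection) without implementation detail. The following is a minimal sketch of that idea for the linear regression case, not the authors' actual algorithm: SVD soft-thresholding (soft-impute style) stands in for the matrix-completion step, a ridge-initialised weighted lasso via scikit-learn stands in for the adaptive lasso, the tuning parameters tau, lam, and gamma are fixed rather than chosen data-adaptively as in the paper, and only a single impute-then-select pass is shown rather than the paper's full iteration. The function names soft_impute and adaptive_lasso are illustrative, not from the source.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

def soft_impute(X, mask, tau=1.0, n_iter=50):
    """Fill unobserved entries (mask == False) by iterative SVD soft-thresholding."""
    Z = np.where(mask, X, 0.0)                     # start with zeros in the missing cells
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        Z_low = U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt   # low-rank shrinkage
        Z = np.where(mask, X, Z_low)               # keep observed entries, impute the rest
    return Z

def adaptive_lasso(X, y, lam=0.1, gamma=1.0):
    """Weighted lasso: weights from an initial ridge fit, applied by rescaling columns."""
    beta_init = Ridge(alpha=1.0).fit(X, y).coef_
    w = 1.0 / (np.abs(beta_init) ** gamma + 1e-8)  # adaptive weights
    Xw = X / w                                     # column rescaling turns weighted L1 into plain L1
    fit = Lasso(alpha=lam).fit(Xw, y)
    return fit.coef_ / w                           # map back to the original scale

# Toy usage: sparse linear model with roughly 50% of the design matrix missing.
rng = np.random.default_rng(0)
n, p = 200, 50
X_full = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:5] = 2.0
y = X_full @ beta_true + 0.5 * rng.standard_normal(n)
mask = rng.random((n, p)) > 0.5                    # True = observed entry

X_imp = soft_impute(np.where(mask, X_full, np.nan), mask)
beta_hat = adaptive_lasso(X_imp, y)
print("selected variables:", np.flatnonzero(np.abs(beta_hat) > 1e-3))
```

In the paper's scheme these two steps would be repeated, and the tuning parameters selected automatically; the sketch above only illustrates how the imputation output feeds the selection step.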
Pages: 1-7
Number of pages: 7