How to Make Model-free Feature Screening Approaches for Full Data Applicable to the Case of Missing Response?

Cited by: 14
Authors
Wang, Qihua [1, 2]
Li, Yongjin [1]
Affiliations
[1] Chinese Acad Sci, Acad Math & Syst Sci, Beijing 100190, Peoples R China
[2] Shenzhen Univ, Inst Stat Sci, Shenzhen, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
borrowing missingness information; missing data; ultrahigh dimensionality; variable screening; GENERALIZED LINEAR-MODELS; VARIABLE SELECTION; EMPIRICAL LIKELIHOOD; ORACLE PROPERTIES; ALGORITHM; LASSO;
DOI
10.1111/sjos.12290
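Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Subject classification codes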
020208 ; 070103 ; 0714 ;
Abstract
Developing model-free feature screening approaches for missing response problems is challenging because the existing standard methods of missing data analysis cannot be applied directly to the high-dimensional case. This paper develops novel methods that borrow information from the missingness indicators so that any feature screening procedure designed for ultrahigh-dimensional covariates with full data can be applied to the missing response case. The first method, missing indicator imputation screening, rests on a proof that, under some mild conditions, the set of active predictors for the response is a subset of the set of active predictors for the product of the response and the missingness indicator. As an alternative, a second method, the Venn diagram-based approach, is also developed. The sure screening property is proven for both methods. It is also shown that complete case analysis preserves the sure screening property of any feature screening approach that has this property.
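The idea behind missing indicator imputation screening can be illustrated with a minimal sketch: rank the predictors against the product of the response and the missingness indicator, which is always observed, using whatever full-data screening utility is at hand. Everything below is an assumption made for illustration and is not taken from the paper: the function names (`marginal_utility`, `top_d`, `miis_screen`, `complete_case_screen`) are hypothetical, the utility is the absolute Pearson correlation as a stand-in for a model-free utility, and the simulated data, missingness model, and screening size `d` are arbitrary.

```python
import numpy as np


def marginal_utility(X, y):
    """Absolute Pearson correlation of each column of X with y.

    Stand-in utility (an assumption); any full-data screening
    utility could be substituted here.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    return np.abs(Xc.T @ yc) / denom


def top_d(scores, d):
    """Indices of the d largest scores, returned as a set."""
    return set(np.argsort(scores)[::-1][:d].tolist())


def miis_screen(X, y, delta, d):
    """Screen against delta * y, where delta = 1 if y is observed.

    Missing responses contribute 0 to the product, so the
    surrogate response is fully observed.
    """
    surrogate = np.where(delta == 1, y, 0.0)
    return top_d(marginal_utility(X, surrogate), d)


def complete_case_screen(X, y, delta, d):
    """Screen using only the rows whose response is observed."""
    observed = delta == 1
    return top_d(marginal_utility(X[observed], y[observed]), d)


# Simulated example (all settings are illustrative assumptions).
rng = np.random.default_rng(0)
n, p, d = 400, 1000, 20
X = rng.standard_normal((n, p))
y_full = 3.0 * X[:, 0] - 3.0 * X[:, 1] + rng.standard_normal(n)

# Missingness depends on a covariate only (missing at random).
prob_observed = 1.0 / (1.0 + np.exp(-(1.0 + X[:, 2])))
delta = (rng.uniform(size=n) < prob_observed).astype(int)
y = np.where(delta == 1, y_full, np.nan)  # NaN marks a missing response

selected_miis = miis_screen(X, y, delta, d)
selected_cc = complete_case_screen(X, y, delta, d)
print(sorted(selected_miis))
print(sorted(selected_cc))
```

In this sketch the surrogate response is built with `np.where` rather than by multiplying `delta * y`, because a missing response stored as NaN would otherwise propagate into every score. The complete case variant is included only for comparison with the abstract's final claim.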
Pages: 324-346
Page count: 23