Feature screening for ultrahigh dimensional categorical data with covariates missing at random

Cited by: 9
Authors
Ni, Lyu [1]
Fang, Fang [2]
Shao, Jun [2, 3]
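Affiliations
[1] East China Normal University, School of Data Science and Engineering, Shanghai, People's Republic of China
[2] East China Normal University, School of Statistics, Key Laboratory of Advanced Theory and Application in Statistics and Data Science (MOE), Shanghai, People's Republic of China
[3] University of Wisconsin, Department of Statistics, Madison, WI 53706, USA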
Funding
National Natural Science Foundation of China
Keywords
Feature screening; Missing at random; Missing covariate; Pearson Chi-Square statistic; Sure screening property; VARIABLE SELECTION; KOLMOGOROV FILTER; MODEL SELECTION; REGRESSION
DOI
10.1016/j.csda.2019.106824
Chinese Library Classification
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Most existing feature screening methods assume that data are fully observed. Developing screening methods for incomplete data is challenging because traditional missing data analysis techniques cannot be applied directly in the ultrahigh dimensional case. A two-step, model-free feature screening procedure is developed for ultrahigh dimensional categorical data in which some covariate values are missing at random. For each covariate with missing data, the first step screens for the variables that enter the unspecified propensity function. In the second step, screening statistics such as adjusted Pearson chi-square statistics are calculated by leveraging the variables selected in the first step and the special structure of categorical data. Sure screening properties are established for the proposed method. Finite sample performance is investigated through simulation studies and a real data example. © 2019 Elsevier B.V. All rights reserved.
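The second step ranks covariates by a Pearson chi-square type statistic between the categorical response and each categorical covariate. The sketch below computes a statistic of the PC-SIS form (Huang, Li and Wang, 2014), sum over cells of (p_y p_x - p_yx)^2 / (p_y p_x), from an inverse-probability-weighted contingency table of complete cases. It assumes the propensities P(covariate observed | selected variables) have already been estimated, for instance from the variables kept in the first step. The weighting scheme, the Hájek normalization, and the helper names `ipw_pearson_screening` and `screen_top_d` are hypothetical illustrations, not the adjusted statistic derived in the paper.

```python
import numpy as np


def ipw_pearson_screening(y, x, observed, propensity):
    """Hypothetical IPW-weighted Pearson chi-square screening statistic.

    y          : (n,) integer-coded categorical response, fully observed
    x          : (n,) integer-coded categorical covariate; entries where
                 observed is False are ignored (any placeholder value works)
    observed   : (n,) boolean indicator that x is observed
    propensity : (n,) ASSUMED pre-estimated P(x observed | selected
                 variables); only entries where observed is True are used
    """
    y = np.asarray(y)
    x = np.asarray(x)
    observed = np.asarray(observed, dtype=bool)
    prop = np.asarray(propensity, dtype=float)

    # Inverse-probability weights for complete cases; missing cases get 0.
    w = np.zeros(len(y))
    w[observed] = 1.0 / prop[observed]

    # Levels are taken from complete cases so every margin is positive.
    y_levels = np.unique(y[observed])
    x_levels = np.unique(x[observed])

    # Weighted contingency table of (Y, X) over complete cases.
    table = np.array([[w[observed & (y == k) & (x == r)].sum()
                       for r in x_levels] for k in y_levels])

    # Hajek-normalised joint cell probabilities and their margins.
    joint = table / table.sum()
    p_y = joint.sum(axis=1, keepdims=True)
    p_x = joint.sum(axis=0, keepdims=True)
    expected = p_y * p_x  # cell probabilities under independence

    # sum_k sum_r (p_y[k] p_x[r] - p_yx[k, r])^2 / (p_y[k] p_x[r])
    return float(((expected - joint) ** 2 / expected).sum())


def screen_top_d(y, X, observed, propensity, d):
    """Score every column of X and keep the indices of the d largest.

    X, observed and propensity are (n, p) arrays; column j of each
    belongs to covariate j.
    """
    scores = np.array([
        ipw_pearson_screening(y, X[:, j], observed[:, j], propensity[:, j])
        for j in range(X.shape[1])
    ])
    keep = np.argsort(scores)[::-1][:d]
    return keep, scores


if __name__ == "__main__":
    # Toy data: column 0 copies y, column 1 is independent of y.
    y = np.array([0, 0, 1, 1])
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    obs = np.ones(X.shape, dtype=bool)
    prop = np.ones(X.shape)
    keep, scores = screen_top_d(y, X, obs, prop, d=1)
    print(keep, scores)
```

Weighting complete cases by inverse propensities is one standard correction when missingness depends on other observed variables; in this sketch the first-step variables would enter only through the propensity estimates, which are taken as given.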
Pages: 15