SCORE TESTS WITH INCOMPLETE COVARIATES AND HIGH-DIMENSIONAL AUXILIARY VARIABLES

Cited by: 0
Authors
Wong, Kin Yau [1 ,2 ]
Feng, Jiahui [1 ]
Affiliations
[1] Hong Kong Polytech Univ, Dept Appl Math, Hung Hom, Kowloon, Hong Kong, Peoples R China
[2] Hong Kong Polytech Univ, Dept Appl Math, Kowloon, Hong Kong, Peoples R China
Keywords
Association test; integrative analysis; missing data; post-selection inference; variable selection; POST-SELECTION INFERENCE; CONFIDENCE-INTERVALS; SAMPLING DESIGNS; REGIONS; MODELS;
DOI
10.5705/ss.202021.0253
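Chinese Library Classification number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Discipline classification code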
020208 ; 070103 ; 0714 ;
Abstract
Analyses of modern biomedical data are often complicated by missing values. When variables of interest are missing for some subjects, it is desirable to use observed auxiliary variables, which are sometimes high dimensional, to impute or predict the missing values in order to improve the statistical efficiency. Although many methods have been developed for prediction using high-dimensional variables, it is challenging to perform a valid inference based on such predicted values. In this study, we develop an association test for an outcome variable and a potentially missing covariate, where the covariate can be predicted using variables selected from a set of high-dimensional auxiliary variables. We establish the validity of the test under data-driven model-selection procedures. We also demonstrate the validity of the proposed method and its advantages over existing methods using extensive simulation studies and an application to a major cancer genomics study.
Pages: 1483-1505
Page count: 23