Robust quasi-randomization-based estimation with ensemble learning for missing data

Cited by: 1
Authors
Lee, Danhyang [1 ]
Zhang, Li-Chun [2 ,3 ]
Chen, Sixia [4 ]
Affiliations
[1] Univ Alabama, Dept Informat Syst Stat & Management Sci, Tuscaloosa, AL 35487 USA
[2] Univ Southampton, Dept Social Stat & Demog, Southampton, England
[3] Stat Sentralbyra, Oslo, Norway
[4] Univ Oklahoma, Dept Biostat & Epidemiol, Hlth Sci Ctr, Oklahoma City, OK USA
Keywords
cell mean model; item nonresponse; missing at random; Rao-Blackwell method; variance estimation; IMPUTATION PROCEDURES; NONRESPONSE; INFERENCE;
DOI
10.1111/sjos.12626
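Chinese Library Classification number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714
Abstract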
Missing data analysis requires assumptions about an outcome model or a response probability model to adjust for potential bias due to nonresponse. Doubly robust (DR) estimators are consistent if at least one of the two models is correctly specified. Multiply robust (MR) estimators extend DR estimators by allowing multiple working models for the outcome and/or the response probability, and are consistent if at least one of these models is correctly specified. We propose a robust quasi-randomization-based model approach that offers more protection against model misspecification than the existing DR and MR estimators, where any combination of semiparametric, nonparametric or machine learning models can be used for the outcome variable. Given cell-homogeneous response, the proposed estimator achieves unbiasedness by using a subsampling Rao-Blackwell method, regardless of the working models used for the outcome. An unbiased variance estimation formula is proposed, which does not require replication methods such as the jackknife or bootstrap. A simulation study shows that the proposed method outperforms the existing multiply robust estimators.
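As a rough illustration of the doubly robust idea mentioned in the abstract, the sketch below computes the standard augmented inverse-probability-weighted (AIPW) estimate of a mean: the average of m_i + r_i (y_i - m_i) / p_i, where m_i is an outcome-model prediction, p_i a response probability, and r_i the response indicator. This is the textbook DR form, not the estimator proposed in the paper; the function name `dr_mean` and the toy data are hypothetical.

```python
from typing import Optional, Sequence


def dr_mean(
    y: Sequence[Optional[float]],
    responded: Sequence[int],
    m_hat: Sequence[float],
    p_hat: Sequence[float],
) -> float:
    """AIPW estimate of the population mean of y under item nonresponse."""
    total = 0.0
    for y_i, r_i, m_i, p_i in zip(y, responded, m_hat, p_hat):
        total += m_i  # outcome-model prediction, used for every unit
        if r_i:
            # Residual correction, available for respondents only,
            # weighted by the inverse response probability.
            total += (y_i - m_i) / p_i
    return total / len(m_hat)


# Toy data (assumed): two cells of two units each; None marks a missing outcome.
y = [1.0, 2.0, None, 4.0]
responded = [1, 1, 0, 1]
m_hat = [1.5, 1.5, 3.0, 3.0]  # hypothetical cell-level outcome predictions
p_hat = [1.0, 1.0, 0.5, 0.5]  # hypothetical cell-level response rates

estimate = dr_mean(y, responded, m_hat, p_hat)
print(estimate)  # prints 2.75
```

In the toy data the response rates equal the observed response proportions in each cell, which mirrors the cell-homogeneous response setting the abstract refers to. If the outcome predictions matched the observed values exactly, the correction term would vanish and the estimate would not depend on `p_hat` at all.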
Pages: 1263-1278
Number of pages: 16
Related papers
25 in total
  • [21] A new approach to estimation of the proportional hazards model based on interval-censored data with missing covariates
    Zhou, Ruiwen
    Li, Huiqiong
    Sun, Jianguo
    Tang, Niansheng
    LIFETIME DATA ANALYSIS, 2022, 28 (03) : 335 - 355
  • [22] Oracle-efficient estimation for the mean function of missing covariate data based on noparametrically estimated selection probabilities
    Cai, Li
    Yao, Yao
    Wang, Suojin
    JOURNAL OF NONPARAMETRIC STATISTICS, 2024, 36 (04) : 1018 - 1035
  • [23] AI-based ensemble modeling of landfill leakage employing a lysimeter, climatic data and transfer learning
    Baghanam, Aida H.
    Vakili, Amirreza Tabataba
    Nourani, Vahid
    Dabrowska, Dominika
    Soltysiak, Marek
    JOURNAL OF HYDROLOGY, 2022, 612
  • [24] Quartet Based Gene Tree Imputation Using Deep Learning Improves Phylogenomic Analyses Despite Missing Data
    Mahbub, Sazan
    Sawmya, Shashata
    Saha, Arpita
    Reaz, Rezwana
    Rahman, M. Sohel
    Bayzid, Md. Shamsuzzoha
    JOURNAL OF COMPUTATIONAL BIOLOGY, 2022, 29 (11) : 1156 - 1172
  • [25] The performance of coalescent-based species tree estimation methods under models of missing data (vol 19, 286, 2018)
    Nute, Michael
    Chou, Jed
    Molloy, Erin K.
    Warnow, Tandy
    BMC GENOMICS, 2020, 21 (01)