Model-Based Prediction for Small Domains Using Covariates: A Comparison of Four Methods

Authors
Michal, Victoire [1]
Wakefield, Jon [2]
Schmidt, Alexandra M. [1]
Cavanaugh, Alicia [3]
Robinson, Brian E. [3]
Baumgartner, Jill [1, 4]
Affiliations
[1] McGill Univ, Dept Epidemiol Biostat & Occupat Hlth, Montreal, PQ, Canada
[2] Univ Washington, Dept Biostat, Seattle, WA USA
[3] McGill Univ, Dept Geog, Montreal, PQ, Canada
[4] McGill Univ, Inst Hlth & Social Policy, Montreal, PQ, Canada
Keywords
High-dimensional auxiliary information; Model selection; Prediction intervals; Random forests; Split conformal inference; Confidence intervals; Selection
DOI
10.1093/jssam/smae032
Chinese Library Classification (CLC) number
O1 [Mathematics]; C [Social Sciences, General]
Subject classification codes
03; 0303; 0701; 070101
Abstract
We consider methods for model-based small area estimation when the number of areas with sampled data is a small fraction of the total number of areas for which estimates are required. Abundant auxiliary information is available from the survey for all the sampled areas. Further, through an external source, there is information for all areas. The goal is to use auxiliary variables to predict the outcome of interest for all areas. We compare areal-level random forests and LASSO approaches to a frequentist forward variable selection approach and a Bayesian shrinkage method using a horseshoe prior. Further, to measure the uncertainty of estimates obtained from random forests and the LASSO, we propose a modification of the split conformal procedure that relaxes the assumption of exchangeable data. We show that the proposed method yields intervals with the correct coverage rate and this is confirmed through a simulation study. This work is motivated by Ghanaian data available from the sixth Ghana Living Standards Survey (GLSS) and the 2010 Population and Housing Census, in the Greater Accra Metropolitan Area (GAMA) region, which comprises eight districts that are further divided into enumeration areas (EAs). We estimate the areal mean household log consumption using both datasets. The outcome variable is measured only in the GLSS for 3 percent of all the EAs (136 out of 5019) and 174 potential covariates are available in both datasets. In the application, among the four modeling methods considered, the Bayesian shrinkage method performed best in terms of bias, mean squared error (MSE), and prediction interval coverages and scores, as assessed through a cross-validation study. We find substantial between-area variation with the estimated log consumption showing a 1.3-fold variation across the GAMA region. The western areas are the poorest while the Accra Metropolitan Area district has the richest areas.
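As background for the split conformal procedure mentioned in the abstract, the following is a minimal sketch of the standard version, which assumes exchangeable data. It is not the authors' modified procedure, which relaxes that assumption and is not specified in this record. A one-covariate least-squares line stands in for the LASSO or random forest point predictor, the data are synthetic, and all function names (`simulate`, `fit_line`, `split_conformal`, `interval`) are hypothetical.

```python
import math
import random


def simulate(n, seed):
    """Synthetic data for illustration only: y = 1 + 0.5 * x + Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [1.0 + 0.5 * x + rng.gauss(0, 1) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Least-squares fit of y = a + b * x; a stand-in for any point predictor."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x


def split_conformal(xs, ys, alpha=0.1, seed=0):
    """Return (predict, q): the fitted predictor and the interval half-width q."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    half = len(idx) // 2
    train, calib = idx[:half], idx[half:]
    # Fit the point predictor on the training half only.
    predict = fit_line([xs[i] for i in train], [ys[i] for i in train])
    # Conformity scores: absolute residuals on the held-out calibration half.
    scores = sorted(abs(ys[i] - predict(xs[i])) for i in calib)
    # Finite-sample rank ceil((n_cal + 1) * (1 - alpha)) of the sorted scores.
    k = math.ceil((len(scores) + 1) * (1 - alpha))
    q = scores[k - 1] if k <= len(scores) else math.inf
    return predict, q


def interval(predict, q, x):
    """Symmetric prediction interval around the point prediction at x."""
    centre = predict(x)
    return centre - q, centre + q
```

Under exchangeability, an interval built this way covers a new outcome with probability at least 1 - alpha on average over the calibration split. Calling `split_conformal(*simulate(400, 1))` and then `interval(predict, q, 5.0)` gives one such interval at x = 5.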
Pages: 1489-1514
Page count: 26