Statistical data integration in survey sampling: a review

Cited by: 44
Authors
Yang, Shu [1]
Kim, Jae Kwang [2]
Affiliations
[1] North Carolina State Univ, Dept Stat, Raleigh, NC USA
[2] Iowa State Univ, Dept Stat, Ames, IA 50011 USA
Funding
National Science Foundation (USA)
Keywords
Generalizability; Meta-analysis; Missing at random; Transportability; Propensity score; Combining information; Multiple surveys; Generalizing evidence; Robust estimation; Causal inference; Missing data; Probability; Calibration; Imputation
DOI
10.1007/s42081-020-00093-w
Chinese Library Classification (CLC) numbers
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract
Finite population inference is a central goal in survey sampling. Probability sampling is the main statistical approach to finite population inference. Challenges arise due to high costs and increasing non-response rates. Data integration provides a timely solution by leveraging multiple data sources to provide more robust and efficient inference than using any single data source alone. The technique for data integration varies depending on the types of samples and the available information to be combined. This article provides a systematic review of data integration techniques for combining probability samples, probability and non-probability samples, and probability and big data samples. We discuss a wide range of integration methods such as generalized least squares, calibration weighting, inverse probability weighting, mass imputation, and doubly robust methods. Finally, we highlight important questions for future research.
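The abstract names several integration estimators without defining them. The Python sketch below (numpy only) illustrates textbook forms of four of them for estimating a population mean. It is an illustrative assumption, not the review's own formulation or notation. It assumes a probability sample A with covariates `x_a` and design weights `d_a` (outcome not observed), a non-probability sample B with covariates `x_b` and outcome `y_b`, a linear outcome model fitted by ordinary least squares, and, for the IPW and doubly robust forms, participation probabilities `pi_b` for B that are taken as already known or estimated elsewhere. Generalized least squares is shown only in its simplest case: inverse-variance weighting of independent estimates of one scalar. All function names are hypothetical.

```python
import numpy as np


def gls_combine(estimates, variances):
    """Simplest GLS case (assumption): inverse-variance combination of
    independent unbiased estimates of the same scalar parameter.
    Returns (combined estimate, its variance)."""
    est = np.asarray(estimates, dtype=float)
    w = 1.0 / np.asarray(variances, dtype=float)
    return float(np.sum(w * est) / np.sum(w)), float(1.0 / np.sum(w))


def fit_ols(x, y):
    """Assumed working model: least-squares fit of y on [1, x].
    Returns a function that predicts y for new covariate values."""
    X = np.column_stack([np.ones(len(x)), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return lambda xnew: np.column_stack([np.ones(len(xnew)), xnew]) @ beta


def mass_imputation_mean(x_a, d_a, x_b, y_b):
    """Mass imputation: fit y ~ x in non-probability sample B, impute y for
    every unit of probability sample A, then take A's design-weighted mean."""
    m = fit_ols(x_b, y_b)
    return float(np.sum(d_a * m(x_a)) / np.sum(d_a))


def ipw_mean(y_b, pi_b, n_pop):
    """Inverse probability weighting for sample B, with participation
    probabilities pi_b and population size n_pop assumed known."""
    return float(np.sum(y_b / pi_b) / n_pop)


def doubly_robust_mean(x_a, d_a, x_b, y_b, pi_b):
    """Doubly robust form: IPW-weighted model residuals from B plus
    design-weighted model predictions over A."""
    m = fit_ols(x_b, y_b)
    n_hat = np.sum(d_a)  # population size estimated from A's design weights
    residual_term = np.sum((y_b - m(x_b)) / pi_b)
    prediction_term = np.sum(d_a * m(x_a))
    return float((residual_term + prediction_term) / n_hat)
```

For example, if `y_b` is exactly linear in `x_b`, the OLS residuals in B are zero, so the doubly robust estimate coincides with the mass imputation estimate. In general the doubly robust form is designed to remain consistent if either the outcome model or the participation probabilities are correctly specified.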
Pages: 625-650
Number of pages: 26