Statistical data integration in survey sampling: a review

Cited by: 44
Authors
Yang, Shu [1]
Kim, Jae Kwang [2]
Affiliations
[1] North Carolina State Univ, Dept Stat, Raleigh, NC USA
[2] Iowa State Univ, Dept Stat, Ames, IA 50011 USA
Funding
National Science Foundation (USA)
Keywords
Generalizability; Meta-analysis; Missing at random; Transportability; PROPENSITY SCORE; COMBINING INFORMATION; MULTIPLE SURVEYS; GENERALIZING EVIDENCE; ROBUST ESTIMATION; CAUSAL INFERENCE; MISSING DATA; PROBABILITY; CALIBRATION; IMPUTATION;
DOI
10.1007/s42081-020-00093-w
Chinese Library Classification (CLC)
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract
Finite population inference is a central goal in survey sampling. Probability sampling is the main statistical approach to finite population inference. Challenges arise due to high cost and increasing non-response rates. Data integration provides a timely solution by leveraging multiple data sources to provide more robust and efficient inference than using any single data source alone. The technique for data integration varies depending on types of samples and available information to be combined. This article provides a systematic review of data integration techniques for combining probability samples, probability and non-probability samples, and probability and big data samples. We discuss a wide range of integration methods such as generalized least squares, calibration weighting, inverse probability weighting, mass imputation, and doubly robust methods. Finally, we highlight important questions for future research.
Pages: 625-650
Page count: 26
Related papers
50 in total
  • [41] Three controversies in the history of survey sampling
    Brewer, Ken
    SURVEY METHODOLOGY, 2013, 39 (02) : 249 - 262
  • [42] Handling survey nonresponse in cluster sampling
    Shao, Jun
    SURVEY METHODOLOGY, 2007, 33 (01) : 81 - 85
  • [43] Statistical analysis and handling of missing data in cluster randomized trials: a systematic review
    Fiero, Mallorie H.
    Huang, Shuang
    Oren, Eyal
    Bell, Melanie L.
    TRIALS, 2016, 17
  • [44] Statistical analysis and handling of missing data in cluster randomized trials: a systematic review
    Mallorie H. Fiero
    Shuang Huang
    Eyal Oren
    Melanie L. Bell
    Trials, 17
  • [45] GENERALIZED RAKING PROCEDURES IN SURVEY SAMPLING
    DEVILLE, JC
    SARNDAL, CE
    SAUTORY, O
    JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 1993, 88 (423) : 1013 - 1020
  • [46] A survey on missing data in machine learning
    Emmanuel, Tlamelo
    Maupong, Thabiso
    Mpoeleng, Dimane
    Semong, Thabo
    Mphago, Banyatsang
    Tabona, Oteng
    JOURNAL OF BIG DATA, 2021, 8 (01)
  • [47] Potentially missing data are considerably more frequent than definitely missing data: a methodological survey of 638 randomized controlled trials
    Kahale, Lara A.
    Diab, Batoul
    Khamis, Assem M.
    Chang, Yaping
    Lopes, Luciane Cruz
    Agarwal, Arnav
    Li, Ling
    Mustafa, Reem A.
    Koujanian, Serge
    Waziry, Reem
    Busse, Jason W.
    Dakik, Abeer
    Guyatt, Gordon
    Akl, Elie A.
    JOURNAL OF CLINICAL EPIDEMIOLOGY, 2019, 106 : 18 - 31
  • [48] Data confidentiality: A review of methods for statistical disclosure limitation and methods for assessing privacy
    Matthews, Gregory J.
    Harel, Ofer
    STATISTICS SURVEYS, 2011, 5 : 1 - 29
  • [49] A survey on missing data in machine learning
    Tlamelo Emmanuel
    Thabiso Maupong
    Dimane Mpoeleng
    Thabo Semong
    Banyatsang Mphago
    Oteng Tabona
    Journal of Big Data, 8
  • [50] Generalizing randomized trial findings to a target population using complex survey population data
    Ackerman, Benjamin
    Lesko, Catherine R.
    Siddique, Juned
    Susukida, Ryoko
    Stuart, Elizabeth A.
    STATISTICS IN MEDICINE, 2021, 40 (05) : 1101 - 1120