Sampling Techniques for Big Data Analysis

Cited by: 45
Authors
Kim, Jae Kwang [1]
Wang, Zhonglei [2]
Affiliations
[1] Iowa State Univ, Dept Stat, Ames, IA 50011 USA
[2] Xiamen Univ, Sch Econ, Wang Yanan Inst Studies Econ WISE, Xiamen 361005, Fujian, Peoples R China
Funding
US National Science Foundation;
Keywords
Data integration; inverse sampling; non-probability sample; selection bias; variance estimation; missing data; inference; nonresponse; imputation;
DOI
10.1111/insr.12290
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Subject classification codes
020208 ; 070103 ; 0714 ;
Abstract
In analysing big data for finite population inference, it is critical to adjust for the selection bias in the big data. In this paper, we propose two methods of reducing the selection bias associated with the big data sample. The first method uses a version of inverse sampling by incorporating auxiliary information from external sources, and the second one borrows the idea of data integration by combining the big data sample with an independent probability sample. Two simulation studies show that the proposed methods are unbiased and have better coverage rates than their alternatives. In addition, the proposed methods are easy to implement in practice.
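The selection bias that the abstract refers to can be illustrated with a small simulation. The sketch below is not the inverse sampling or data integration method of the paper; it substitutes plain post-stratification on a single binary auxiliary variable, assuming the population counts of that variable are known from an external source and that selection depends only on it. The population model, the inclusion probabilities, and the function name `simulate` are all hypothetical choices made for illustration.

```python
import random

def simulate(seed=0, N=100_000):
    """Compare a naive big-data mean with a post-stratified mean.

    All numeric settings below are illustrative assumptions,
    not values taken from the paper.
    """
    rng = random.Random(seed)

    # Finite population: binary auxiliary x, study variable y depends on x.
    pop = []
    for _ in range(N):
        x = 1 if rng.random() < 0.4 else 0
        y = 2.0 + 3.0 * x + rng.gauss(0.0, 1.0)
        pop.append((x, y))
    true_mean = sum(y for _, y in pop) / N

    # Self-selected "big data" sample: units with x = 1 are far more
    # likely to be included, so the sample over-represents them.
    incl_prob = {0: 0.2, 1: 0.8}
    big = [(x, y) for x, y in pop if rng.random() < incl_prob[x]]

    # Naive estimator: unweighted mean of the big data sample.
    naive = sum(y for _, y in big) / len(big)

    # Post-stratified estimator: weight each group's sample mean by the
    # known population share of that group (the external auxiliary info).
    pop_counts = {g: sum(1 for x, _ in pop if x == g) for g in (0, 1)}
    adjusted = 0.0
    for g in (0, 1):
        ys = [y for x, y in big if x == g]
        adjusted += (pop_counts[g] / N) * (sum(ys) / len(ys))

    return true_mean, naive, adjusted

if __name__ == "__main__":
    true_mean, naive, adjusted = simulate()
    print(f"true mean     : {true_mean:.3f}")
    print(f"naive mean    : {naive:.3f}")
    print(f"post-strat.   : {adjusted:.3f}")
```

Under these assumptions the naive mean is pulled toward the over-represented group, while the reweighted mean stays close to the population mean. If selection also depended on the study variable itself, this simple adjustment would no longer remove the bias.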
Pages: S177-S191
Page count: 15