Big Data and the danger of being precisely inaccurate

Times cited: 46
Authors
McFarland, Daniel A. [1, 2]
McFarland, H. Richard [1, 2]
Affiliations
[1] Stanford Univ, 520 Galvez Mall, Stanford, CA 94305 USA
[2] Hearst Corp, New York, NY USA
Keywords
Big Data; bias; segmentation; sociology; statistics; inaccuracy
DOI
10.1177/2053951715602495
Chinese Library Classification
C [Social Sciences (General)]
Subject classification codes
03; 0303
Abstract
Social scientists and data analysts are increasingly making use of Big Data in their analyses. These data sets are often "found data" arising from purely observational sources rather than data collected under the strict rules of a statistically designed experiment. Because these large data sets easily meet the sample-size requirements of most statistical procedures, they give analysts a false sense of security as they go on to apply traditional statistical methods. We explain how most analyses performed on Big Data today lead to "precisely inaccurate" results: biases hidden in the data are easily overlooked because the sheer size of the data makes the results appear highly significant. Before any analyses are performed on large data sets, we recommend employing a simple data segmentation technique to control for some major components of observational data bias. These segments help to improve the accuracy of the results.
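The abstract's central claim, that a large but biased sample yields a tight yet wrong estimate and that segmenting the data can reduce the bias, can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' procedure: the two segments "A" and "B", their population shares and mean outcomes, the 90% over-representation of segment A in the "found" data, and the helper names `found_data_sample`, `naive_estimate` and `segmented_estimate` are all hypothetical. The segmentation step shown is one common choice, post-stratification: estimate within each segment, then reweight by known population shares.

```python
import math
import random
import statistics

# Hypothetical population (illustrative assumption, not from the paper):
# two segments with known population shares and different mean outcomes.
POP_SHARES = {"A": 0.5, "B": 0.5}
SEGMENT_MEANS = {"A": 10.0, "B": 20.0}
TRUE_MEAN = sum(POP_SHARES[s] * SEGMENT_MEANS[s] for s in POP_SHARES)  # 15.0


def found_data_sample(n, share_a=0.9, noise_sd=5.0, seed=0):
    """Simulate observational 'found data' that over-represents segment A."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        seg = "A" if rng.random() < share_a else "B"
        rows.append((seg, rng.gauss(SEGMENT_MEANS[seg], noise_sd)))
    return rows


def naive_estimate(rows):
    """Pooled mean and its standard error, ignoring how the data arose."""
    ys = [y for _, y in rows]
    se = statistics.stdev(ys) / math.sqrt(len(ys))
    return statistics.fmean(ys), se


def segmented_estimate(rows, pop_shares):
    """Post-stratification: mean within each segment, weighted by population share.

    Assumes every segment in pop_shares appears at least once in rows.
    """
    by_segment = {}
    for seg, y in rows:
        by_segment.setdefault(seg, []).append(y)
    return sum(share * statistics.fmean(by_segment[seg])
               for seg, share in pop_shares.items())


if __name__ == "__main__":
    rows = found_data_sample(200_000)
    mean, se = naive_estimate(rows)
    print(f"true mean:          {TRUE_MEAN:.2f}")
    print(f"naive mean:         {mean:.2f} +/- {1.96 * se:.3f} (95% margin)")
    print(f"segmented estimate: {segmented_estimate(rows, POP_SHARES):.2f}")
```

Run as a script, the pooled mean comes out near 11 with a 95% margin of only a few hundredths, far from the true value of 15, while the segment-weighted estimate lands close to 15. The correction works here only because the population shares are assumed known and the bias in this toy setup runs entirely through the segment variable; real found data can carry biases that a single segmentation does not capture.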
Pages: 4