On Building Better Mousetraps and Understanding the Human Condition: Reflections on Big Data in the Social Sciences

Cited by: 14
Authors
Lin, Jimmy [1, 2]
Affiliations
[1] Univ Maryland, College Pk, MD 20742 USA
[2] Univ Maryland, Inst Adv Comp Studies UMIACS, College Pk, MD USA
Funding
National Science Foundation (USA);
Keywords
big data; computational social science; machine learning; data mining; log analysis; CHOCOLATE CONSUMPTION; NOBEL;
DOI
10.1177/0002716215569174
Chinese Library Classification number
D0 [Political science, political theory];
Discipline classification code
0302 ; 030201 ;
Abstract
Over the past few years, we have seen the emergence of big data: disruptive technologies that have transformed commerce, science, and many aspects of society. Despite the tremendous enthusiasm for big data, there is no shortage of detractors. This article argues that many criticisms stem from a fundamental confusion over goals: whether the desired outcome of big data use is better science or better engineering. Critics point to the rejection of traditional data collection and analysis methods, confusion between correlation and causation, and an indifference to models with explanatory power. From the perspective of advancing social science, these are valid reservations. I contend, however, that if the end goal of big data use is to engineer computational artifacts that are more effective according to well-defined metrics, then whatever improves those metrics should be exploited without prejudice. Sound scientific reasoning, while helpful, is not necessary to improve engineering. Understanding the distinction between science and engineering resolves many of the apparent controversies surrounding big data and helps to clarify the criteria by which contributions should be assessed.
Pages: 33-47
Page count: 15