Crowd-sourced Text Analysis: Reproducible and Agile Production of Political Data

Times cited: 130
Authors
Benoit, Kenneth [1, 2]
Conway, Drew [3]
Lauderdale, Benjamin E. [4]
Laver, Michael [3]
Mikhaylov, Slava [5]
Affiliations
[1] London Sch Econ, London, England
[2] Trinity Coll Dublin, Dublin, Ireland
[3] NYU, New York, NY 10003 USA
[4] London Sch Econ & Polit Sci, London, England
[5] UCL, London WC1E 6BT, England
Funding
European Research Council
Keywords
PARTY; RELIABILITY
DOI
10.1017/S0003055416000058
Chinese Library Classification
D0 [Political science, political theory]
Discipline classification codes
0302; 030201
Abstract
Empirical social science often relies on data that are not observed in the field, but are transformed into quantitative variables by expert researchers who analyze and interpret qualitative raw sources. While generally considered the most valid way to produce data, this expert-driven process is inherently difficult to replicate or to assess on grounds of reliability. Using crowd-sourcing to distribute text for reading and interpretation by massive numbers of nonexperts, we generate results comparable to those using experts to read and interpret the same texts, but do so far more quickly and flexibly. Crucially, the data we collect can be reproduced and extended transparently, making crowd-sourced datasets intrinsically reproducible. This focuses researchers' attention on the fundamental scientific objective of specifying reliable and replicable methods for collecting the data needed, rather than on the content of any particular dataset. We also show that our approach works straightforwardly with different types of political text, written in different languages. While findings reported here concern text analysis, they have far-reaching implications for expert-generated data in the social sciences.
Pages: 278-295
Number of pages: 18