Can nonprobability samples be used for social science research? A cautionary tale

Cited by: 84
Authors
Zack, Elizabeth S. [1]
Kennedy, John M. [1]
Long, J. Scott [1]
Affiliations
[1] Indiana Univ, Bloomington, IN 47405 USA
Keywords
Nonprobability Samples; Online Panels; MTurk; GSS; Mechanical Turk; Probit Coefficients; Logit
DOI
10.18148/srm/2019.v13i2.7262
Chinese Library Classification (CLC)
O1 [Mathematics]; C [Social Sciences, General]
Discipline classification codes
03; 0303; 0701; 070101
Abstract
Survey researchers and social scientists are trying to understand the appropriate use of nonprobability samples as substitutes for probability samples in social science research. While cognizant of the challenges presented by nonprobability samples, scholars increasingly rely on these samples due to their low cost and speed of data collection. This paper contributes to the growing literature on the appropriate use of nonprobability samples by comparing two online nonprobability samples, Amazon's Mechanical Turk (MTurk) and a Qualtrics Panel, with a gold standard nationally representative probability sample, the General Social Survey (GSS). Most research in this area focuses on determining the best techniques to improve point estimates from nonprobability samples, often using gold standard surveys or census data to determine the accuracy of the point estimates. This paper differs from that line of research in that we examine how probability and nonprobability samples differ when used in multivariate analysis, the research technique used by many social scientists. Additionally, we examine whether restricting each sample to a population well-represented in MTurk (Americans age 45 and under) improves MTurk's estimates. We find that, while Qualtrics and MTurk differ somewhat from the GSS, Qualtrics outperforms MTurk in both univariate and multivariate analysis. Further, restricting the samples substantially improves MTurk's estimates, but not enough to close the gap with Qualtrics. With both Qualtrics and MTurk, we find a risk of false positives. Our findings suggest that these online nonprobability samples may sometimes be "fit for purpose," but should be used with caution.
Pages: 215-227
Number of pages: 13