Realistic precision and accuracy of online experiment platforms, web browsers, and devices

Cited by: 194
Authors
Anwyl-Irvine, Alexander [1 ,2 ]
Dalmaijer, Edwin S. [1 ]
Hodges, Nick [2 ]
Evershed, Jo K. [2 ]
Affiliations
[1] MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, England
[2] Cauldron Science, St John's Innovation Centre, Cambridge, England
Funding
UK Medical Research Council;
Keywords
Accuracy; Experiment builder; Big data; Reaction time; MTurk; Online testing; System testing; Automated hardware testing; Psychophysics; TIMING ACCURACY; ADOBE FLASH; JAVASCRIPT; FAILURE;
DOI
10.3758/s13428-020-01501-5
Chinese Library Classification (CLC)
B841 [Psychological research methods];
Subject classification code
040201;
Abstract
Due to increasing ease of use and the ability to quickly collect large samples, online behavioural research is currently booming. With this popularity, it is important that researchers are aware of who online participants are, and what devices and software they use to access experiments. While it is somewhat obvious that these factors can impact data quality, the magnitude of the problem remains unclear. To understand how these characteristics impact experiment presentation and data quality, we performed a battery of automated tests on a number of realistic set-ups. We investigated how different web-building platforms (Gorilla v.20190828, jsPsych v6.0.5, Lab.js v19.1.0, and psychoJS/PsychoPy3 v3.1.5), browsers (Chrome, Edge, Firefox, and Safari), and operating systems (macOS and Windows 10) impact display time across 30 different frame durations for each software combination. We then employed a robot actuator in realistic set-ups to measure response recording across the aforementioned platforms, and between different keyboard types (desktop and integrated laptop). Finally, we analysed data from over 200,000 participants on their demographics, technology, and software to provide context to our findings. We found that modern web platforms provide reasonable accuracy and precision for display duration and manual response time, and that no single platform stands out as the best in all features and conditions. In addition, our analysis of online participants shows what equipment they are likely to use.
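To make the "frame duration" manipulation in the abstract concrete, the sketch below shows the general requestAnimationFrame-based presentation loop that browser experiment platforms rely on for frame-accurate stimulus timing. It is a minimal illustration, not the authors' test harness or any platform's actual code; the element id, the helper name presentForFrames, and the 30-frame request are illustrative assumptions.

```typescript
// Minimal sketch (browser TypeScript) of frame-based stimulus presentation.
// Assumes a page element with id "stimulus" that starts out hidden.

/**
 * Show an element for a target number of refresh frames and resolve with the
 * measured on-screen duration, so requested vs. achieved time can be compared.
 */
function presentForFrames(el: HTMLElement, targetFrames: number): Promise<number> {
  return new Promise((resolve) => {
    let framesShown = 0;
    let onsetTime = 0;

    const onFrame = (now: DOMHighResTimeStamp) => {
      if (framesShown === 0) {
        el.style.visibility = "visible"; // stimulus onset on this frame
        onsetTime = now;
      }
      framesShown += 1;
      if (framesShown < targetFrames) {
        requestAnimationFrame(onFrame);
      } else {
        // hide on the frame after the last requested one and report the duration
        requestAnimationFrame((offsetTime) => {
          el.style.visibility = "hidden";
          resolve(offsetTime - onsetTime);
        });
      }
    };
    requestAnimationFrame(onFrame);
  });
}

// Example: request 30 frames (nominally ~500 ms at 60 Hz) and log the result.
const stimulus = document.getElementById("stimulus");
if (stimulus) {
  presentForFrames(stimulus, 30).then((ms) =>
    console.log(`requested 30 frames, measured ${ms.toFixed(1)} ms`)
  );
}
```

Comparing the measured duration against the nominal frame count is the kind of requested-versus-achieved display-time comparison the study reports across platforms, browsers, and operating systems.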
Pages: 1407-1425
Page count: 19
Related papers
36 records in total
[1] Anwyl-Irvine, A. L., Massonnie, J., Flitton, A., Kirkham, N., & Evershed, J. K. (2020). Gorilla in our midst: An online behavioral experiment builder. Behavior Research Methods, 52(1), 388-407.
[2] Baker, J. D. (2013). Online instruments, data collection, and electronic measurements: Organizational advancements (p. 328). DOI: 10.4018/978-1-4666-2172-5.ch019
[3] Barnhoorn, J. S., Haasnoot, E., Bocanegra, B. R., & van Steenbergen, H. (2015). QRTEngine: An easy solution for running online reaction time experiments using Qualtrics. Behavior Research Methods, 47(4), 918-929.
[4] Biederman, I., & Cooper, E. E. (1992). Size invariance in visual object priming. Journal of Experimental Psychology: Human Perception and Performance, 18(1), 121-133.
[5] Birnbaum, M. H., & Wakcher, S. V. (2002). Web-based experiments controlled by JavaScript: An example from probability learning. Behavior Research Methods, Instruments, & Computers, 34(2), 189-199.
[6] Birnbaum, M. H. (2000). Psychological experiments on the internet.
[7] Bohannon, J. (2016). Mechanical Turk upends social sciences. Science, 352(6291), 1263-1264.
[8] Bridges, D., Pitiot, A., MacAskill, M. R., & Peirce, J. W. (2020). The timing mega-study: comparing a range of experiment generators, both lab-based and online. PeerJ, 8.
[9] Button, K. S., Ioannidis, J. P. A., Mokrysz, C., Nosek, B. A., Flint, J., Robinson, E. S. J., & Munafo, M. R. (2013). Power failure: why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience, 14(5), 365-376.
[10] Clifford, S. (2014). Journal of Experimental Political Science, 1, 120. DOI: 10.1017/xps.2014.5