Ensuring survey research data integrity in the era of internet bots

Cited by: 2
Authors
Griffin M. [1, 2]
Martino R.J. [2]
LoSchiavo C. [1, 2]
Comer-Carruthers C. [1, 2]
Krause K.D. [1, 2]
Stults C.B. [2, 3]
Halkitis P.N. [2, 4, 5, 6, 7]
Affiliations
[1] Department of Health Behavior, Society and Policy, Rutgers School of Public Health, Rutgers University, 683 Hoes Lane West, Piscataway, NJ 08854
[2] Center for Health, Identity, Behavior and Prevention Studies, Rutgers University, Piscataway, NJ
[3] Psychology Department, Baruch College, City University of New York, New York, NY
[4] Department of Biostatistics and Social and Behavioral Health Sciences, Rutgers School of Public Health, Rutgers University, Piscataway, NJ
[5] Rutgers Robert Wood Johnson Medical School, Rutgers University, Piscataway, NJ
[6] Graduate School of Applied and Professional Psychology, Rutgers University, Piscataway, NJ
[7] School of Public Affairs and Administration, Rutgers University, Piscataway, NJ
Keywords
Internet-based research; LGBTQ research; Survey research
DOI
10.1007/s11135-021-01252-1
Abstract
We used an internet-based survey platform to conduct a cross-sectional survey regarding the impact of COVID-19 on the LGBTQ+ population in the United States. While this method of data collection was quick and inexpensive, the data collected required extensive cleaning due to the infiltration of bots. Based on this experience, we provide recommendations for ensuring data integrity. Recruitment conducted between May 7 and 8, 2020 resulted in an initial sample of 1251 responses. The Qualtrics survey was disseminated via social media and professional association listservs. After noticing data discrepancies, research staff developed a rigorous data cleaning protocol. A second wave of recruitment was conducted on June 11–12, 2020 using the original recruitment methods. The five-step data cleaning protocol led to the removal of 773 (61.8%) surveys from the initial dataset, resulting in a sample of 478 participants in the first wave of data collection. The protocol led to the removal of 46 (31.9%) surveys from the second two-day wave of data collection, resulting in a sample of 98 participants in the second wave of data collection. After verifying the two-day pilot process was effective at screening for bots, the survey was reopened for a third wave of data collection resulting in a total of 709 responses, which were identified as an additional 514 (72.5%) valid participants and led to the removal of an additional 194 (27.4%) possible bots. The final analytic sample consists of 1090 participants. Although a useful and efficient research tool, especially among hard-to-reach populations, internet-based research is vulnerable to bots and mischievous responders, despite survey platforms’ built-in protections. Beyond the depletion of research funds, bot infiltration threatens data integrity and may disproportionately harm research with marginalized populations. Based on our experience, we recommend the use of strategies such as qualitative questions, duplicate demographic questions, and incentive raffles to reduce the likelihood of mischievous respondents. These protections can be undertaken to ensure data integrity and facilitate research on vulnerable populations. © 2021, The Author(s), under exclusive licence to Springer Nature B.V.
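The wave-by-wave counts reported in the abstract can be tallied with a short script. The following is a minimal sketch, not the authors' analysis code: the variable names and the `pct` helper are illustrative assumptions, and the second-wave total is not stated in the abstract, so it is inferred here as removed plus retained responses.

```python
def pct(part: int, total: int) -> float:
    """Return part/total as a percentage rounded to one decimal place."""
    return round(100 * part / total, 1)


# Wave 1 (May 7-8, 2020): total and removed counts are reported.
wave1_total, wave1_removed = 1251, 773
wave1_valid = wave1_total - wave1_removed

# Wave 2 (June 11-12, 2020): removed and retained counts are reported;
# the total is an inference (assumption), not a figure from the abstract.
wave2_removed, wave2_valid = 46, 98
wave2_total = wave2_removed + wave2_valid

# Wave 3: total, valid, and removed counts are all reported.
wave3_total, wave3_valid, wave3_removed = 709, 514, 194

# Final analytic sample: valid participants summed across the three waves.
final_sample = wave1_valid + wave2_valid + wave3_valid

print("Wave 1 removed %:", pct(wave1_removed, wave1_total))
print("Wave 2 removed %:", pct(wave2_removed, wave2_total))
print("Wave 3 valid %:", pct(wave3_valid, wave3_total))
print("Wave 3 removed %:", pct(wave3_removed, wave3_total))
print("Final analytic sample:", final_sample)
```

Percentages are computed against each wave's total responses, which appears to be the denominator used for the figures reported in the abstract.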
Pages: 2841-2852
Number of pages: 11