Assessing and Improving Data Integrity in Web-Based Surveys: Comparison of Fraud Detection Systems in a COVID-19 Study

Cited by: 15
Authors
Bonett, Stephen [1,3]
Lin, Willey [1]
Topper, Patrina Sexton [1]
Wolfe, James [1]
Golinkoff, Jesse [1]
Deshpande, Aayushi [2]
Villarruel, Antonia [1]
Bauermeister, Jose [1]
Affiliations
[1] Univ Penn, Sch Nursing, Philadelphia, PA USA
[2] Ashoka Univ, Dept Psychol, Sonepat, India
[3] Univ Penn, Sch Nursing, 418 Curie Blvd, Philadelphia, PA 19104 USA
Funding
US National Institutes of Health;
Keywords
web-based survey; data quality; fraud; survey methodology; COVID-19; survey; fraud detection; Philadelphia; data privacy; data protection; privacy; security; data; information security; data validation; cross-sectional; web-based; ONLINE;
DOI
10.2196/47091
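Chinese Library Classification number
R19 [Health care organization and services (health services administration)];
Discipline classification code
Abstract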
Background: Web-based surveys increase access to study participation and improve opportunities to reach diverse populations. However, web-based surveys are vulnerable to data quality threats, including fraudulent entries from automated bots and duplicative submissions. Widely used proprietary tools to identify fraud offer little transparency about the methods used, their effectiveness, or the representativeness of the resulting data sets. Robust, reproducible, and context-specific methods of accurately detecting fraudulent responses are needed to ensure integrity and maximize the value of web-based survey research.

Objective: This study aims to describe a multilayered fraud detection system implemented in a large web-based survey about COVID-19 attitudes, beliefs, and behaviors; examine the agreement between this fraud detection system and a proprietary fraud detection system; and compare the resulting study samples from each of the 2 fraud detection methods.

Methods: The PhillyCEAL Common Survey is a cross-sectional web-based survey that remotely enrolled residents aged 13 years and older to assess how the COVID-19 pandemic impacted individuals, neighborhoods, and communities in Philadelphia, Pennsylvania. Two fraud detection methods are described and compared: (1) a multilayer fraud detection strategy developed by the research team that combined automated validation of response data and real-time verification of study entries by study personnel and (2) the proprietary fraud detection system used by the Qualtrics survey platform. Descriptive statistics were computed for the full sample and for the responses classified as valid by each of the 2 fraud detection methods, and classification tables were created to assess agreement between the methods. The impact of fraud detection methods on the distribution of vaccine confidence by racial or ethnic group was assessed.

Results: Of 7950 completed surveys, our multilayer fraud detection system identified 3228 (40.60%) cases as valid, while the Qualtrics fraud detection system identified 4389 (55.21%) cases as valid. The 2 methods showed only "fair" or "minimal" agreement in their classifications (kappa=0.25; 95% CI 0.23-0.27). The choice of fraud detection method impacted the distribution of vaccine confidence by racial or ethnic group.

Conclusions: The selection of a fraud detection method can affect the study's sample composition. The findings of this study, while not conclusive, suggest that a multilayered approach to fraud detection may be warranted for future survey research. Such an approach would combine conservative use of automated fraud detection with human review of entries tailored to the study's specific context and its participants.
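
The agreement statistic reported in the Results can be illustrated with a minimal sketch of Cohen's kappa computed from a 2x2 classification table. The abstract does not state which kappa variant or software the authors used, and it does not report the cell counts of their classification table, so the function name `cohen_kappa` and the counts in the usage lines are hypothetical and chosen only for illustration.

```python
def cohen_kappa(both_valid, only_a_valid, only_b_valid, both_invalid):
    """Cohen's kappa for two binary classifiers from 2x2 cell counts.

    Cell layout (method A in rows, method B in columns):
        both_valid    only_a_valid
        only_b_valid  both_invalid
    """
    n = both_valid + only_a_valid + only_b_valid + both_invalid
    if n == 0:
        raise ValueError("classification table is empty")

    # Observed agreement: share of cases where both methods gave the same label.
    p_observed = (both_valid + both_invalid) / n

    # Marginal totals for each method.
    a_valid = both_valid + only_a_valid
    a_invalid = only_b_valid + both_invalid
    b_valid = both_valid + only_b_valid
    b_invalid = only_a_valid + both_invalid

    # Agreement expected by chance, given the marginals.
    p_expected = (a_valid * b_valid + a_invalid * b_invalid) / (n * n)

    if p_expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical counts, NOT taken from the study.
kappa = cohen_kappa(both_valid=40, only_a_valid=10,
                    only_b_valid=20, both_invalid=30)
print(round(kappa, 2))
```

Kappa corrects raw agreement for the agreement expected by chance, which is why two methods that each retain roughly half of the sample can still show low agreement.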
Pages: 14