Detecting Careless Responding in Survey Data Using Stochastic Gradient Boosting

Cited by: 52
Authors
Schroeders, Ulrich [1]
Schmidt, Christoph [2]
Gnambs, Timo [3]
Affiliations
[1] Univ Kassel, Kassel, Germany
[2] Eoda GmbH, Kassel, Germany
[3] Leibniz Inst Educ Trajectories, Bamberg, Germany
Keywords
careless responding; gradient boosted trees; data cleaning; response times; outlier detection; item preknowledge; validity; models; indicators; responses
DOI
10.1177/00131644211004708
Chinese Library Classification
G44 [Educational Psychology]
Subject Classification Code
0402; 040202
Abstract
Careless responding is a bias in survey responses that disregards the actual item content, constituting a threat to the factor structure, reliability, and validity of psychological measurements. Different approaches have been proposed to detect aberrant responses, such as probing questions that directly assess test-taking behavior (e.g., bogus items), auxiliary data or paradata (e.g., response times), and data-driven statistical techniques (e.g., Mahalanobis distance). In the present study, gradient boosted trees, a state-of-the-art machine learning technique, are introduced to identify careless respondents. The performance of the approach was compared with established techniques previously described in the literature (e.g., statistical outlier methods, consistency analyses, and response pattern functions) using simulated data and empirical data from a web-based study in which diligent versus careless response behavior was experimentally induced. In the simulation study, gradient boosting machines outperformed traditional detection mechanisms in flagging aberrant responses. However, this advantage did not transfer to the empirical study. In terms of precision, the results of both the traditional and the novel detection mechanisms were unsatisfactory, even though the novel approach incorporated response times as additional information. A comparison of the simulation and online study results showed that responses in real-world settings seem to be much more erratic than simulation studies would suggest. We critically discuss the generalizability of currently available detection methods and provide an outlook on future research on the detection of aberrant response patterns in survey research.
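The abstract names the Mahalanobis distance as one of the data-driven statistical techniques for flagging aberrant responses. Below is a minimal pure-Python sketch of that idea, not the authors' implementation: each respondent's squared distance D² = (x − x̄)ᵀ S⁻¹ (x − x̄) from the sample mean vector is computed with the sample covariance matrix S, and respondents whose D² exceeds a cutoff are flagged. The function names (`mahalanobis_d2`, `flag_outliers`), the toy response matrix, and the cutoff are hypothetical assumptions for illustration. A common convention (assumed here, not taken from the paper) uses the 95% chi-square quantile with k degrees of freedom as the cutoff, about 5.991 for k = 2 items.

```python
"""Minimal sketch: flag survey respondents by squared Mahalanobis distance.

Illustrative only; function names, data, and cutoff are hypothetical assumptions.
"""


def mean_and_covariance(data):
    """Return column means and the sample covariance matrix (n - 1 denominator)."""
    n, k = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(k)]
    cov = [[0.0] * k for _ in range(k)]
    for row in data:
        dev = [row[j] - means[j] for j in range(k)]
        for a in range(k):
            for b in range(k):
                cov[a][b] += dev[a] * dev[b]
    return means, [[value / (n - 1) for value in row] for row in cov]


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        # Pick the row with the largest pivot to keep the elimination stable.
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def mahalanobis_d2(data):
    """Squared Mahalanobis distance of every row from the sample mean vector."""
    means, cov = mean_and_covariance(data)
    inv = invert(cov)
    k = len(means)
    distances = []
    for row in data:
        dev = [row[j] - means[j] for j in range(k)]
        distances.append(sum(dev[a] * inv[a][b] * dev[b]
                             for a in range(k) for b in range(k)))
    return distances


def flag_outliers(data, cutoff):
    """Indices of respondents whose D^2 exceeds the (user-chosen) cutoff."""
    return [i for i, d2 in enumerate(mahalanobis_d2(data)) if d2 > cutoff]


# Hypothetical 5-point responses to two positively related items;
# the last respondent (index 8) answers them inconsistently.
responses = [
    [1, 1], [2, 2], [3, 3], [4, 4], [5, 5],
    [2, 1], [4, 5], [3, 2], [1, 5],
]

if __name__ == "__main__":
    # 5.991 is approximately the 95% chi-square quantile for 2 degrees of freedom.
    print(flag_outliers(responses, cutoff=5.991))  # → [8]
```

In the invented data, respondent 8 chooses the scale minimum on one item and the maximum on the other, contradicting the positive association shown by everyone else, so it is the only row flagged (D² ≈ 6.25). Real questionnaires have many more items, and the chi-square cutoff assumes roughly multivariate normal responses, which Likert-type data satisfy only approximately.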
Pages: 29-56
Page count: 28