How Robust Are Multirater Interrater Reliability Indices to Changes in Frequency Distribution?

Cited by: 74
Authors
Quarfoot, David [1]
Levine, Richard A. [2]
Affiliations
[1] San Diego State Univ, Ctr Res Math & Sci Educ, San Diego, CA 92182 USA
[2] San Diego State Univ, Dept Stat, San Diego, CA 92182 USA
Keywords
Agreement; Conger; Fleiss; Gwet; Krippendorff; Paradox; HIGH AGREEMENT; LOW KAPPA;
DOI
10.1080/00031305.2016.1141708
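Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714
Abstract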
Interrater reliability studies are used in a diverse set of fields. Often, these investigations involve three or more raters and thus require the use of indices such as Fleiss's kappa, Conger's kappa, or Krippendorff's alpha. Through two motivating examples, one theoretical and one from practice, this article exposes limitations of these indices when the units to be rated are not well distributed across the rating categories. Then, using a Monte Carlo simulation and information visualizations, we argue for the use of two alternative indices, the Brennan-Prediger coefficient and Gwet's AC2, because the agreement levels reported by these indices are more robust to variation in the distribution of units that raters encounter. The article concludes by exploring the complex, interwoven relationship between the number of levels in a rating instrument, the agreement level present among raters, and the distribution of units that are to be scored. Supplementary materials for this article are available online.
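The sensitivity to the frequency distribution that the abstract describes can be illustrated with a minimal sketch. It is not the authors' simulation: the two toy data sets (`skewed`, `balanced`) and the function names are assumptions made up for illustration. The formulas are the standard unweighted (nominal) forms of Fleiss's kappa and the Brennan-Prediger coefficient, assuming every unit is rated by the same number of raters. Gwet's AC2 is left out to keep the sketch short.

```python
from typing import List

Counts = List[List[int]]  # counts[i][k] = number of raters placing unit i in category k


def observed_agreement(counts: Counts) -> float:
    """Average proportion of agreeing rater pairs per unit."""
    total = 0.0
    for row in counts:
        r = sum(row)  # raters for this unit (assumed >= 2)
        total += sum(n * (n - 1) for n in row) / (r * (r - 1))
    return total / len(counts)


def fleiss_kappa(counts: Counts) -> float:
    """Fleiss's kappa: chance agreement comes from the observed category shares."""
    p_o = observed_agreement(counts)
    n_ratings = sum(sum(row) for row in counts)
    q = len(counts[0])
    shares = [sum(row[k] for row in counts) / n_ratings for k in range(q)]
    p_e = sum(p * p for p in shares)
    return (p_o - p_e) / (1 - p_e)


def brennan_prediger(counts: Counts) -> float:
    """Brennan-Prediger coefficient: chance agreement is fixed at 1/q."""
    p_o = observed_agreement(counts)
    p_e = 1 / len(counts[0])
    return (p_o - p_e) / (1 - p_e)


# Hypothetical data: 10 units, 3 raters, 2 categories.
# Both sets have identical per-unit agreement; only the category mix differs.
skewed = [[3, 0]] * 8 + [[2, 1]] * 2
balanced = [[3, 0]] * 4 + [[0, 3]] * 4 + [[2, 1]] * 2

for name, data in (("skewed", skewed), ("balanced", balanced)):
    print(name, round(fleiss_kappa(data), 3), round(brennan_prediger(data), 3))
```

In this toy case the observed agreement is 13/15 for both data sets. Fleiss's kappa falls from about 0.732 on the balanced set to about -0.071 on the skewed set, while the Brennan-Prediger coefficient stays at about 0.733, which is the kind of contrast the abstract refers to.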
Pages: 373-384
Page count: 12
Related papers
23 in total
[1]  
[Anonymous], 2012, Content analysis
[2]  
[Anonymous], 2014, HDB INTERRATER RELIA
[3]  
[Anonymous], 1991, PROF STAND TEACH MAT
[4]  
[Anonymous], 2000, PRINC STAND SCH MATH
[5]  
[Anonymous], 2014, R LANG ENV STAT COMP
[6]   Substantial Agreement of Referee Recommendations at a General Medical Journal - A Peer Review Evaluation at Deutsches Arzteblatt International [J].
Baethge, Christopher ;
Franklin, Jeremy ;
Mertens, Stephan .
PLOS ONE, 2013, 8 (05)
[7]   COEFFICIENT KAPPA - SOME USES, MISUSES, AND ALTERNATIVES [J].
BRENNAN, RL ;
PREDIGER, DJ .
EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT, 1981, 41 (03) :687-699
[8]  
Carifio J., 2007, J Soc Sci, V3, P106, DOI 10.3844/jssp.2007.106.116
[9]   HIGH AGREEMENT BUT LOW KAPPA .2. RESOLVING THE PARADOXES [J].
CICCHETTI, DV ;
FEINSTEIN, AR .
JOURNAL OF CLINICAL EPIDEMIOLOGY, 1990, 43 (06) :551-558
[10]   INTEGRATION AND GENERALIZATION OF KAPPAS FOR MULTIPLE RATERS [J].
CONGER, AJ .
PSYCHOLOGICAL BULLETIN, 1980, 88 (02) :322-328