Interrater agreement and interrater reliability: Key concepts, approaches, and applications

Times Cited: 543
Authors
Gisev, Natasa [1]
Bell, J. Simon [1,2,3]
Chen, Timothy F. [1]
Affiliations
[1] Univ Sydney, Fac Pharm, Sydney, NSW 2006, Australia
[2] Univ S Australia, Sch Pharm & Med Sci, Sansom Inst, Qual Use Med & Pharm Res Ctr, Adelaide, SA 5001, Australia
[3] Univ Eastern Finland, Fac Hlth Sci, Sch Pharm, Kuopio 7011, Finland
Keywords
Health services research; Research design; Reproducibility of results; Observer variation; Weighted kappa; Pharmacy; Coefficient; Validation; Criteria
DOI
10.1016/j.sapharm.2012.04.004
Chinese Library Classification
R1 [Preventive Medicine, Hygiene]
Subject Classification Codes
1004; 120402
Abstract
Evaluations of interrater agreement and interrater reliability can be applied to a number of different contexts and are frequently encountered in social and administrative pharmacy research. The objectives of this study were to highlight key differences between interrater agreement and interrater reliability; describe the key concepts and approaches to evaluating interrater agreement and interrater reliability; and provide examples of their applications to research in the field of social and administrative pharmacy. This is a descriptive review of interrater agreement and interrater reliability indices. It outlines the practical applications and interpretation of these indices in social and administrative pharmacy research. Interrater agreement indices assess the extent to which the responses of 2 or more independent raters are concordant. Interrater reliability indices assess the extent to which raters consistently distinguish between different responses. A number of indices exist, and some common examples include Kappa, the Kendall coefficient of concordance, Bland-Altman plots, and the intraclass correlation coefficient. Guidance on the selection of an appropriate index is provided. In conclusion, selection of an appropriate index to evaluate interrater agreement or interrater reliability is dependent on a number of factors including the context in which the study is being undertaken, the type of variable under consideration, and the number of raters making assessments. © 2013 Elsevier Inc. All rights reserved.
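As an illustration of one of the agreement indices named in the abstract, the following is a minimal sketch (not taken from the article) of Cohen's kappa for two raters assigning nominal categories. The function name cohens_kappa and the prescription-rating data are hypothetical, chosen only to show how observed agreement is corrected for chance agreement.

```python
# Minimal sketch of Cohen's kappa for two raters on nominal ratings.
# The rating data below are invented for illustration only.

from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two raters on the same items."""
    assert len(ratings_a) == len(ratings_b), "raters must score the same items"
    n = len(ratings_a)

    # Observed agreement: proportion of items on which the raters concur.
    p_observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Chance agreement expected from each rater's marginal category frequencies.
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    categories = set(freq_a) | set(freq_b)
    p_expected = sum((freq_a[c] / n) * (freq_b[c] / n) for c in categories)

    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical example: two pharmacists classifying 10 prescriptions as
# "appropriate" (A) or "inappropriate" (I).
rater_1 = ["A", "A", "I", "A", "I", "A", "A", "I", "A", "A"]
rater_2 = ["A", "I", "I", "A", "I", "A", "A", "A", "A", "A"]

print(f"kappa = {cohens_kappa(rater_1, rater_2):.2f}")  # 0.8 raw agreement, ~0.47 kappa
```

In this toy example the raters agree on 8 of 10 prescriptions (raw agreement 0.80), but because both raters use the "appropriate" category heavily, the chance-corrected kappa is only about 0.47, which is the kind of distinction between raw concordance and chance-corrected indices the review discusses.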
Pages: 330-338
Number of pages: 9