Substantial Agreement of Referee Recommendations at a General Medical Journal - A Peer Review Evaluation at Deutsches Arzteblatt International

Cited: 22
Authors
Baethge, Christopher [1 ,2 ]
Franklin, Jeremy [3 ]
Mertens, Stephan [1 ]
Affiliations
[1] Deutsch Arzteblatt Int, Editorial Off, Cologne, Germany
[2] Univ Cologne, Sch Med, Dept Psychiat & Psychotherapy, D-50931 Cologne, Germany
[3] Univ Cologne, Sch Med, Inst Med Stat, D-50931 Cologne, Germany
Keywords
Low kappa; Paradoxes
DOI
10.1371/journal.pone.0061401
Chinese Library Classification
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Subject Classification Codes
07; 0710; 09
Abstract
Background: Peer review is the mainstay of editorial decision making at medical journals. Evaluations of journal peer review with regard to reliability and validity are scarce, particularly in light of the wide variety of medical journals. Studies carried out so far indicate low agreement among reviewers. We present an analysis of the peer review process at a general medical journal, Deutsches Arzteblatt International.

Methodology/Principal Findings: 554 reviewer recommendations on 206 manuscripts submitted between 7/2008 and 12/2009 were analyzed: 7% recommended acceptance, 74% revision, and 19% rejection. Concerning acceptance (with or without revision) versus rejection, agreement among reviewers was substantial (74.3% of pairs of recommendations), but this was not reflected by Fleiss' or Cohen's kappa (<0.2). The agreement rate amounted to 84% for acceptance but only 31% for rejection. An alternative kappa statistic, however, Gwet's kappa (AC1), indicated substantial agreement (0.63). Concordance between reviewer recommendation and editorial decision was almost perfect when reviewer recommendations were unanimous. The correlation of reviewer recommendations with citations as counted by Web of Science was low (partial correlation adjusted for year of publication: -0.03, n.s.).

Conclusions/Significance: Although our figures are similar to those reported in the literature, our conclusion differs from the widely held view that reviewer agreement is low: based on overall agreement, we consider the concordance among reviewers sufficient for the purposes of editorial decision making. We believe that measures such as positive and negative agreement or alternative kappa statistics are superior to Cohen's or Fleiss' kappa in the analysis of nominal or ordinal data on reviewer agreement. Also, reviewer recommendations seem to be a poor proxy for citations because, for example, manuscripts are changed considerably during the revision process.
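The "high agreement but low kappa" paradox described in the abstract, and the contrasting behavior of Gwet's AC1, can be reproduced with a short calculation. The counts below are hypothetical, chosen only to mimic a skewed accept/reject distribution like the one the study reports; they are not the paper's data.

```python
# Hypothetical 2x2 table: two reviewers rate 100 manuscripts accept/reject.
# a: both accept, d: both reject, b and c: the reviewers disagree.
a, b, c, d = 90, 4, 4, 2
n = a + b + c + d

po = (a + d) / n                          # observed agreement

# Cohen's kappa: chance agreement from the product of the marginal rates
p1 = (a + b) / n                          # reviewer 1's "accept" rate
p2 = (a + c) / n                          # reviewer 2's "accept" rate
pe_cohen = p1 * p2 + (1 - p1) * (1 - p2)
kappa = (po - pe_cohen) / (1 - pe_cohen)

# Gwet's AC1: chance agreement based on the mean marginal proportion
pi = (p1 + p2) / 2
pe_ac1 = 2 * pi * (1 - pi)
ac1 = (po - pe_ac1) / (1 - pe_ac1)

# Positive and negative agreement (Cicchetti & Feinstein)
p_pos = 2 * a / (2 * a + b + c)           # agreement on "accept"
p_neg = 2 * d / (2 * d + b + c)           # agreement on "reject"

print(f"observed agreement {po:.2f}, Cohen kappa {kappa:.2f}, "
      f"Gwet AC1 {ac1:.2f}, p_pos {p_pos:.2f}, p_neg {p_neg:.2f}")
# observed agreement 0.92, Cohen kappa 0.29, Gwet AC1 0.91, p_pos 0.96, p_neg 0.33
```

With 92% raw agreement, Cohen's kappa is only about 0.29 because the skewed marginals inflate its chance-agreement term, while Gwet's AC1 remains above 0.9; the low negative agreement mirrors the study's finding that reviewers agree far more often on acceptance than on rejection.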
Pages: 7