Explaining the unsuitability of the kappa coefficient in the assessment and comparison of the accuracy of thematic maps obtained by image classification

Citations: 356
Author
Foody, Giles M. [1 ]
Affiliation
[1] Univ Nottingham, Sch Geog, Nottingham NG7 2RD, England
Keywords
Accuracy; Kappa coefficient; Chance; Prevalence; Bias; Standard errors; High agreement; Models; Reliability; Index; Area
DOI
10.1016/j.rse.2019.111630
CLC Number
X [Environmental Science, Safety Science];
Discipline Code
08; 0830;
Abstract
The kappa coefficient is not an index of accuracy; indeed, it is not an index of overall agreement but one of agreement beyond chance. Chance agreement is, however, irrelevant in an accuracy assessment and is in any case inappropriately modelled in the calculation of a kappa coefficient for typical remote sensing applications. The magnitude of a kappa coefficient is also difficult to interpret. Values that span the full range of widely used interpretation scales, running from the level of agreement estimated to arise from chance alone all the way through to almost perfect agreement, can be obtained from classifications that satisfy demanding accuracy targets (e.g., for a classification with an overall accuracy of 95%, the range of possible values of the kappa coefficient is -0.026 to 0.900). Comparisons of kappa coefficients are particularly challenging if the classes vary in their abundance (i.e. prevalence), as the magnitude of a kappa coefficient reflects not only agreement in labelling but also properties of the populations under study. It is shown that all of the arguments put forward for the use of the kappa coefficient in accuracy assessment are flawed and/or irrelevant, as they apply equally to other, sometimes easier to calculate, measures of accuracy. Calls for the kappa coefficient to be abandoned in accuracy assessments should finally be heeded, and researchers are encouraged to provide a set of simple measures and associated outputs, such as estimates of per-class accuracy and the confusion matrix, when assessing and comparing classification accuracy.
Pages: 11
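
The range quoted in the abstract follows directly from the definition of the kappa coefficient, kappa = (p_o - p_e) / (1 - p_e), where p_o is the overall accuracy and p_e is the chance agreement computed from the row and column marginals of the confusion matrix (Cohen, 1960). Below is a minimal Python sketch of this calculation; the two example matrices are illustrative constructions chosen to reproduce the extremes quoted above, not data taken from the paper.

import numpy as np

def kappa(cm):
    # Cohen's kappa from a confusion matrix of counts or proportions:
    # kappa = (p_o - p_e) / (1 - p_e)
    cm = cm / cm.sum()                 # normalise to proportions
    p_o = np.trace(cm)                 # observed agreement (overall accuracy)
    p_e = float(np.sum(cm.sum(axis=0) * cm.sum(axis=1)))  # chance agreement from marginals
    return (p_o - p_e) / (1 - p_e)

# Two binary classifications, both with overall accuracy 0.95.
balanced = np.array([[0.475, 0.025],
                     [0.025, 0.475]])  # classes equally prevalent
skewed   = np.array([[0.950, 0.025],
                     [0.025, 0.000]])  # one class dominates

print(round(kappa(balanced), 3))  # 0.9
print(round(kappa(skewed), 3))    # -0.026

Both matrices have identical overall accuracy; the spread in kappa from 0.900 down to -0.026 is driven entirely by the class prevalences encoded in the marginals, which is precisely the dependence on population properties that the abstract describes.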