Plotting receiver operating characteristic and precision-recall curves from presence and background data

Cited by: 19
Authors
Li, Wenkai [1]
Guo, Qinghua [2]
Affiliations
[1] Sun Yat Sen Univ, Guangdong Prov Engn Res Ctr Remote Sensing & Moni, Sch Geog & Planning, Guangzhou 510275, Peoples R China
[2] Peking Univ, Coll Urban & Environm Sci, Inst Ecol, Beijing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
area under the curve; model evaluation; precision-recall curve; presence and background data; receiver operating characteristic curve; species distribution modeling; HABITAT-SUITABILITY MODELS; SPECIES DISTRIBUTION MODELS; PRESENCE-ONLY DATA; PRESENCE-ABSENCE; PREDICTION; DISTRIBUTIONS; PERFORMANCE; ACCURACY; PROBABILITY; THRESHOLDS;
DOI
10.1002/ece3.7826
Chinese Library Classification
Q14 [Ecology (Bioecology)];
Discipline classification code
071012 ; 0713 ;
Abstract
The receiver operating characteristic (ROC) and precision-recall (PR) plots have been widely used to evaluate the performance of species distribution models. Plotting the ROC/PR curves requires a traditional test set with both presence and absence data (the PA approach), but species absence data are usually unavailable in practice. Plotting the ROC/PR curves from presence-only data while treating background data as pseudo-absence data (the PO approach) may give misleading results. In this study, we propose a new approach, the PB approach, that calibrates the ROC/PR curves from presence and background data using user-provided information on a constant c. Here, c is the probability that a species occurrence is detected (labeled); an estimate of c can also be derived from the PB-based ROC/PR plots, provided that a model with good discrimination ability is available. We used five virtual species and a real aerial photograph to test the effectiveness of the proposed PB-based ROC/PR plots. Different models (or classifiers) were trained from presence and background data with various sample sizes. The ROC/PR curves plotted by the PA approach were used to benchmark the curves plotted by the PO and PB approaches. Experimental results show that the curves and the areas under the curves from the PB approach are more similar to those from the PA approach than are those from the PO approach. The PB-based ROC/PR plots also provided highly accurate estimates of c in our experiments. We conclude that the proposed PB-based ROC/PR plots can be a valuable complement to existing model assessment methods, and that they offer an additional way to estimate the constant c (or species prevalence) from presence and background data.
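The following Python sketch illustrates why treating background points as absences can distort an ROC curve, and how knowledge of prevalence can correct it. It is not the authors' implementation: the abstract does not give their formulas or the mapping between c and prevalence, so the sketch takes a prevalence value directly. The function name `pb_roc`, the parameter `prevalence`, and the toy scores are hypothetical. It assumes that background points are a random sample of the landscape, so the fraction of background predicted positive equals prevalence × TPR + (1 − prevalence) × FPR, and that presence points are a random sample of true presences.

```python
import numpy as np

def pb_roc(presence_scores, background_scores, prevalence):
    """Prevalence-corrected ROC from presence and background scores.

    Illustrative sketch only (hypothetical helper, not the paper's method).
    Setting prevalence=0 reproduces the uncorrected curve in which every
    background point is treated as an absence.
    """
    p = np.asarray(presence_scores, dtype=float)
    b = np.asarray(background_scores, dtype=float)
    if not 0.0 <= prevalence < 1.0:
        raise ValueError("prevalence must be in [0, 1)")

    # Sweep thresholds from the highest score to the lowest.
    thresholds = np.unique(np.concatenate([p, b]))[::-1]
    tpr = np.array([(p >= t).mean() for t in thresholds])
    bg_rate = np.array([(b >= t).mean() for t in thresholds])

    # Solve bg_rate = prevalence * TPR + (1 - prevalence) * FPR for FPR.
    fpr = (bg_rate - prevalence * tpr) / (1.0 - prevalence)
    fpr = np.clip(fpr, 0.0, 1.0)

    # Start the curve at (0, 0) and keep FPR non-decreasing.
    tpr = np.concatenate([[0.0], tpr])
    fpr = np.maximum.accumulate(np.concatenate([[0.0], fpr]))

    # Trapezoidal area under the curve.
    auc = float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))
    return fpr, tpr, auc

# Toy data: a perfect model, with half of the background being true presences.
presence = [0.9, 0.8]
background = [0.9, 0.8, 0.1, 0.2]
_, _, auc_uncorrected = pb_roc(presence, background, prevalence=0.0)
_, _, auc_corrected = pb_roc(presence, background, prevalence=0.5)
print(auc_uncorrected, auc_corrected)
```

On this toy data the uncorrected curve penalizes the model for scoring true presences in the background highly, while the corrected curve does not. With real data the prevalence is rarely known exactly, so the result should be read as sensitive to that input.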
Pages: 10192-10206
Number of pages: 15