A comparison of AUC estimators in small-sample studies

Cited by: 0
Authors
Airola, Antti [1 ]
Pahikkala, Tapio [1 ]
Waegeman, Willem [2 ]
De Baets, Bernard [2 ]
Salakoski, Tapio [1 ]
Affiliations
[1] Univ Turku, Dept Informat Technol, Turku Ctr Comp Sci TUCS, Joukahaisenkatu 3-5 B, Turku, Finland
[2] Univ Ghent, KERMIT, Dept Appl Math Biometr & Control, Ghent, Belgium
Source
PROCEEDINGS OF THE THIRD INTERNATIONAL WORKSHOP ON MACHINE LEARNING IN SYSTEMS BIOLOGY | 2010, Vol. 8
Keywords
Area Under the ROC Curve; Classifier Performance Estimation; Conditional AUC Estimation; Cross-validation; Leave-pair-out Cross-validation; CLASSIFICATION; ACCURACY; MACHINE; KERNEL; AREA;
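DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract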
Reliable estimation of the classification performance of learned predictive models is difficult in the small-sample setting. With biological data, it is often the case that separate test data cannot be afforded, and cross-validation is then a typical strategy for estimating performance. Recent results, further supported by experimental evidence presented in this article, show that many standard approaches to cross-validation suffer from extensive bias or variance when the area under the ROC curve (AUC) is used as the performance measure. We advocate the use of leave-pair-out cross-validation (LPOCV) for performance estimation, as it avoids many of these problems. A method previously proposed by us can be used to efficiently calculate this estimate for regularized least-squares (RLS) based learners.
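As a rough illustration of the leave-pair-out idea named in the abstract, the sketch below holds out every positive-negative pair in turn, trains on the remaining examples, and counts how often the held-out positive is scored above the held-out negative. It is a naive version that retrains for each pair, not the efficient RLS-specific method the abstract refers to. The function names `fit_mean_difference` and `lpocv_auc`, the +1 / -1 label encoding, and the toy mean-difference learner are assumptions made for this example only.

```python
from itertools import product


def fit_mean_difference(X, y):
    """Toy stand-in learner (an assumption, not RLS): score(x) = w . x,
    where w = mean of positive examples - mean of negative examples."""
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label == -1]
    dim = len(X[0])
    w = [
        sum(x[k] for x in pos) / len(pos) - sum(x[k] for x in neg) / len(neg)
        for k in range(dim)
    ]
    return lambda x: sum(wk * xk for wk, xk in zip(w, x))


def lpocv_auc(X, y, fit=fit_mean_difference):
    """Naive leave-pair-out cross-validation estimate of AUC.

    X is a list of feature lists, y a list of labels in {+1, -1}.
    Needs at least two examples per class so that every training
    set still contains both classes after a pair is held out.
    """
    pos_idx = [i for i, label in enumerate(y) if label == 1]
    neg_idx = [j for j, label in enumerate(y) if label == -1]
    total = 0.0
    for i, j in product(pos_idx, neg_idx):
        # Train on everything except the held-out pair (i, j).
        X_train = [x for k, x in enumerate(X) if k not in (i, j)]
        y_train = [label for k, label in enumerate(y) if k not in (i, j)]
        score = fit(X_train, y_train)
        diff = score(X[i]) - score(X[j])
        # Correct ordering counts 1, a tie counts 0.5, a wrong ordering 0.
        total += 1.0 if diff > 0 else 0.5 if diff == 0 else 0.0
    return total / (len(pos_idx) * len(neg_idx))


# Small made-up data set: the first feature separates the classes.
X_demo = [[2.0, 0.1], [1.5, -0.3], [1.8, 0.4],
          [-1.7, 0.2], [-2.1, -0.1], [-1.6, 0.3]]
y_demo = [1, 1, 1, -1, -1, -1]
print(lpocv_auc(X_demo, y_demo))
```

Any learner can be passed as `fit`, provided it takes training data and returns a scoring function. The cost is one training run per positive-negative pair, which is why an efficient shortcut matters for larger samples.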
Pages: 3-13
Number of pages: 11