Design and Analysis of Classifier Learning Experiments in Bioinformatics: Survey and Case Studies

Cited by: 13
Authors
Irsoy, Ozan [1]
Yildiz, Olcay Taner [2]
Alpaydin, Ethem [1]
Affiliations
[1] Bogazici Univ, Dept Comp Engn, TR-34342 Istanbul, Turkey
[2] Isik Univ, Dept Comp Engn, TR-34980 Istanbul, Turkey
Keywords
Statistical tests; classification; model selection; STATISTICAL COMPARISONS; PRECISION; CURVES; RECALL; AREA
DOI
10.1109/TCBB.2012.117
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
In many bioinformatics applications, it is important to assess and compare the performances of algorithms trained from data, so that the conclusions drawn are unaffected by chance and are therefore significant. Both the design of such experiments and the analysis of the resulting data using statistical tests should be done carefully for the results to carry significance. In this paper, we first review the performance measures used in classification, the basics of experiment design, and statistical tests. We then give the results of our survey over 1,500 papers published in the last two years in three bioinformatics journals (including this one). Although the basics of experiment design, such as resampling instead of using a single training set and using performance metrics other than error, are well understood, only 21 percent of the papers use any statistical test for comparison. In the third part, we analyze four different scenarios that we encounter frequently in the bioinformatics literature, discussing the proper statistical methodology and showing an example case study for each. We hope that the guidelines we discuss, together with the supplementary software, will play an important role in future studies.
Pages: 1663-1675
Page count: 13