Permutation Tests for Studying Classifier Performance

Cited by: 89

Authors:

Ojala, Markus [1]

Garriga, Gemma C. [1]

Affiliation:

[1] Aalto Univ, Dept Informat & Comp Sci, HIIT, FIN-02150 Espoo, Finland

Source:

2009 9th IEEE International Conference on Data Mining | 2009

Keywords:

classification; labeled data; permutation tests; restricted randomization; significance testing; cross-validation

DOI:

10.1109/ICDM.2009.108

Chinese Library Classification:

TP [Automation Technology, Computer Technology]

Discipline classification code:

0812

Abstract:

We explore the framework of permutation-based p-values for assessing the behavior of the classification error. In this paper we study two simple permutation tests. The first test estimates the null distribution by permuting the labels in the data; this has been used extensively in classification problems in computational biology. The second test produces permutations of the features within classes, inspired by restricted randomization techniques traditionally used in statistics. We study the properties of these tests and present an extensive empirical evaluation on real and synthetic data. Our analysis shows that studying the classification error via permutation tests is effective; in particular, the restricted permutation test clearly reveals whether the classifier exploits the interdependency between the features in the data.
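The two tests described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The nearest-centroid classifier, the 5-fold cross-validated error, the synthetic data, and all function names are assumptions chosen for illustration, and the p-value uses the common empirical estimate with a +1 correction.

```python
import numpy as np

def nearest_centroid_error(X, y, n_folds=5, seed=0):
    """Cross-validated error of a nearest-centroid classifier (stand-in classifier)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    wrong = 0
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        classes = np.unique(y[train])
        # One centroid per class, computed on the training fold only.
        centroids = np.stack([X[train][y[train] == c].mean(axis=0) for c in classes])
        dists = ((X[test][:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        pred = classes[dists.argmin(axis=1)]
        wrong += int((pred != y[test]).sum())
    return wrong / len(y)

def permute_labels(X, y, rng):
    """First test: shuffle the labels, breaking the link between features and classes."""
    return X, rng.permutation(y)

def permute_features_within_class(X, y, rng):
    """Second test: shuffle each feature column independently inside each class.
    Class-wise marginals are kept; dependency between features is destroyed."""
    Xp = X.copy()
    for c in np.unique(y):
        rows = np.flatnonzero(y == c)
        for j in range(X.shape[1]):
            Xp[rows, j] = X[rng.permutation(rows), j]
    return Xp, y

def permutation_p_value(X, y, permute, n_perm=200, seed=0):
    """Fraction of permuted datasets whose error is at most the observed error."""
    rng = np.random.default_rng(seed)
    observed = nearest_centroid_error(X, y)
    null = np.array([nearest_centroid_error(*permute(X, y, rng))
                     for _ in range(n_perm)])
    return (np.sum(null <= observed) + 1) / (n_perm + 1)

# Synthetic two-class data with independent features (illustrative assumption).
data_rng = np.random.default_rng(42)
X = np.vstack([data_rng.normal(0.0, 1.0, size=(30, 3)),
               data_rng.normal(5.0, 1.0, size=(30, 3))])
y = np.repeat([0, 1], 30)

p_labels = permutation_p_value(X, y, permute_labels)
p_features = permutation_p_value(X, y, permute_features_within_class)
print("label permutation p-value:", p_labels)
print("within-class feature permutation p-value:", p_features)
```

On data like this, a small p-value from the label permutation indicates that the classifier finds a real connection between features and labels. A large p-value from the within-class feature permutation is expected here, because the features are generated independently and a nearest-centroid classifier does not use dependency between features.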

Pages: 908-913

Page count: 6

