The peaking phenomenon in the presence of feature-selection

Cited by: 60

Authors:

Sima, Chao [1]

Dougherty, Edward R. [1,2]

Affiliations:

[1] Translat Genom Res Inst, Computat Biol Div, Phoenix, AZ 85004 USA

[2] Texas A&M Univ, Dept Elect & Comp Engn, College Stn, TX 77843 USA

Source:

PATTERN RECOGNITION LETTERS | 2008 / Vol. 29 / Issue 11

Funding:

U.S. National Science Foundation

Keywords:

classification; feature-selection; peaking phenomenon

DOI:

10.1016/j.patrec.2008.04.010

Chinese Library Classification:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

For a fixed sample size, a common phenomenon is that the error of a designed classifier decreases and then increases as the number of features grows. This peaking phenomenon has been recognized for forty years and depends on the classification rule and feature-label distribution. Historically, the peaking phenomenon has been treated by assuming a fixed ordering of the features, usually beginning with the strongest individual feature and proceeding with features of decreasing individual classification capability. This does not take into account feature-selection, which is commonplace in high-dimensional and small-sample settings. This paper revisits the peaking phenomenon in the presence of feature-selection. Using massive simulation in a high-performance computing environment, the paper considers various combinations of feature-label models, feature-selection algorithms, and classifier models to produce a large library of error versus feature size curves. Owing to the prevalence of feature-selection in genomic classification, we also consider gene-expression-based classification of breast-cancer patient prognosis. Results vary widely and are strongly dependent on the combination. The error curves tend to fall into three categories: peaking, settling into a plateau, or falling very slowly over a long range of feature set sizes. It can be concluded that one should be wary of applying peaking results found in the absence of feature-selection to settings in which feature-selection is employed. © 2008 Elsevier B.V. All rights reserved.
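As a rough illustration of the classical setting the abstract describes (fixed sample size, fixed feature ordering from strongest to weakest, no feature-selection), the following sketch estimates an error-versus-feature-count curve by simulation. It is not the authors' simulation: the two-class Gaussian model with identity covariance, the 1/k decay of feature strength, the plug-in linear discriminant, and every name and default value (`error_curve`, `n_train`, `max_features`, and so on) are assumptions chosen only for illustration.

```python
import numpy as np


def error_curve(n_train=20, n_test=2000, max_features=15, n_trials=50, seed=0):
    """Mean test error of a plug-in linear discriminant vs. number of features.

    Assumed model (hypothetical, for illustration only): two Gaussian classes
    with identity covariance whose means differ by delta[k] in feature k, with
    delta decaying as 1/k so the features are ordered strongest first.
    """
    rng = np.random.default_rng(seed)
    delta = 1.0 / np.arange(1, max_features + 1)
    y_tr = np.repeat([0, 1], n_train // 2)
    y_te = np.repeat([0, 1], n_test // 2)
    errors = np.zeros(max_features)

    for _ in range(n_trials):
        # Draw a small training set and a large test set from the same model.
        x_tr = rng.standard_normal((y_tr.size, max_features)) + np.outer(y_tr - 0.5, delta)
        x_te = rng.standard_normal((y_te.size, max_features)) + np.outer(y_te - 0.5, delta)

        for d in range(1, max_features + 1):
            a, b = x_tr[:, :d], x_te[:, :d]  # use only the first d features
            m0 = a[y_tr == 0].mean(axis=0)
            m1 = a[y_tr == 1].mean(axis=0)
            # Pooled sample covariance estimated from the training data.
            centered = np.vstack([a[y_tr == 0] - m0, a[y_tr == 1] - m1])
            cov = centered.T @ centered / (len(a) - 2)
            # Pseudo-inverse guards against an ill-conditioned estimate.
            w = np.linalg.pinv(cov) @ (m1 - m0)
            pred = ((b - (m0 + m1) / 2) @ w > 0).astype(int)
            errors[d - 1] += np.mean(pred != y_te)

    return errors / n_trials


if __name__ == "__main__":
    for d, err in enumerate(error_curve(), start=1):
        print(f"{d:2d} features: estimated error {err:.3f}")
```

Under these assumptions one can inspect whether the printed curve falls and then rises as features are added. Replacing the fixed slice `[:, :d]` with a data-driven choice of d features would move the sketch toward the feature-selection setting the paper studies, where, per the abstract, the curves may peak, plateau, or decline slowly.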

Pages: 1667-1674

Page count: 8
