Feature selection for multi-class problems by using pairwise-class and all-class techniques

Cited by: 13
Authors
You, Mingyu [1, 2]
Li, Guo-Zheng [1]
Affiliations
[1] Tongji Univ, Dept Control Sci & Engn, Shanghai 201804, Peoples R China
[2] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing 210093, Peoples R China
Keywords
feature selection; pairwise-class; all-class; round-robin; microarray data analysis; gene selection; classification
DOI
10.1080/03081079.2010.530027
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Feature selection is a key technology in processing massive data, e.g., in microarray data analysis, where samples are few but gene dimensionality is high. A common problem in multi-class microarray data analysis is that recognition or prediction accuracy is unbalanced among classes, which usually leads to poor system performance. One of the main reasons is a feature (gene) selection method that treats classes unfairly. In this paper, a novel feature selection framework using pairwise-class and all-class techniques (FrPA) is proposed to balance performance among classes and improve average accuracy. Feature selection takes into account both the feature (gene) rank list computed on all classes and the rank lists computed on each pair of classes. A round-robin strategy is embedded in the framework to select the final features from the different rank lists. Experimental results on several microarray data sets show that FrPA helps achieve higher classification accuracy and more balanced performance among classes.
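The round-robin step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the ranking criterion, the order in which lists are visited, or how duplicates are handled, so the function name `round_robin_select`, the list order (all-class first, then class pairs), the skip-duplicates rule, and the toy gene names are all assumptions made for illustration.

```python
from itertools import combinations


def round_robin_select(rank_lists, k):
    """Pick up to k distinct features by cycling over the rank lists in turn.

    Each rank list is ordered best-first. On its turn, a list contributes its
    highest-ranked feature that has not been selected yet (an assumed rule).
    """
    selected = []
    seen = set()
    positions = [0] * len(rank_lists)  # next unread index in each list
    while len(selected) < k:
        progressed = False
        for i, ranks in enumerate(rank_lists):
            # Skip features that another list has already contributed.
            while positions[i] < len(ranks) and ranks[positions[i]] in seen:
                positions[i] += 1
            if positions[i] < len(ranks):
                feature = ranks[positions[i]]
                selected.append(feature)
                seen.add(feature)
                positions[i] += 1
                progressed = True
                if len(selected) == k:
                    break
        if not progressed:
            break  # every list is exhausted before reaching k features
    return selected


# Hypothetical rank lists (best feature first) for a 3-class problem.
all_class = ["g1", "g2", "g3", "g4"]
pairwise = {
    (0, 1): ["g2", "g5", "g1", "g3"],
    (0, 2): ["g1", "g6", "g2", "g4"],
    (1, 2): ["g7", "g2", "g1", "g5"],
}
# One all-class list followed by one list per pair of classes.
rank_lists = [all_class] + [pairwise[p] for p in combinations(range(3), 2)]
chosen = round_robin_select(rank_lists, 5)
print(chosen)
```

Because every pairwise list gets a turn in each round, a gene that separates only one pair of classes (such as `g7` in the toy data) can be selected even when it ranks low on the all-class list, which is the balancing effect the abstract attributes to combining the two kinds of rank lists.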
Pages: 381-394
Number of pages: 14