Incorporating feature ranking and evolutionary methods for the classification of high-dimensional DNA microarray gene expression data

Citations: 6
Authors
Abedini, Mani [1,2]
Kirley, Michael [1]
Chiong, Raymond [3,4]
Affiliations
[1] Univ Melbourne, Dept Comp & Informat Syst, Melbourne, Vic 3010, Australia
[2] IBM Res Australia, Level 5-204,Lygon St, Carlton, Vic 3053, Australia
[3] Univ Newcastle, Fac Sci & IT, Sch DCIT, Callaghan, NSW 2308, Australia
[4] Swinburne Univ Technol, Fac Higher Educ, Lilydale, Vic 3140, Australia
Keywords
Classification; high-dimensional data; feature ranking; microarray gene expression profiling; eXtended Classifier System; XCS; GRD-XCS; guided rule discovery XCS; evolutionary algorithms
DOI
10.4066/AMJ.2013.1641
Chinese Library Classification (CLC) number
R5 [Internal Medicine]
Discipline classification codes
1002; 100201
Abstract
Background: DNA microarray gene expression classification is a challenging task for machine learning. The dimensionality of gene expression data sets typically ranges from several thousand to over 10,000 genes. A potential solution is to use feature selection to reduce the dimensionality.
Aim: The aim of this paper is to investigate how feature quality information can be used to improve the precision of microarray gene expression classification tasks.
Method: We propose two evolutionary machine learning models based on the eXtended Classifier System (XCS) and a typical feature selection methodology. The first, which we call FS-XCS, uses feature selection for feature reduction. The second, GRD-XCS, uses feature ranking to bias the rule discovery process of XCS.
Results: The results indicate that feature selection/ranking methods are essential for tackling high-dimensional classification tasks such as microarray gene expression classification. However, the results also suggest that using feature ranking to bias the rule discovery process performs significantly better than using the feature reduction method. In other words, using feature quality information to develop a smarter learning procedure is more effective than reducing the feature set.
Conclusion: Our findings show that extracting feature quality information can assist the learning process and improve classification accuracy. On the other hand, relying exclusively on feature quality information (e.g., through feature reduction) may decrease classification performance. We therefore recommend a hybrid approach that uses feature quality information to direct the learning process by highlighting the more informative features, without preventing the learning process from exploring other features.
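The contrast the abstract draws between feature reduction (FS-XCS) and ranking-biased rule discovery (GRD-XCS) can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration and is not taken from the paper: the signal-to-noise style filter score, the linear mapping from scores to probabilities, the `floor` and `ceiling` parameters, and the function names are all hypothetical. The sketch only shows the general idea that a hard top-k cut discards low-ranked genes, whereas a probability mask with a non-zero floor keeps every gene reachable when a rule condition is built.

```python
import random
from statistics import mean, pstdev


def rank_scores(X, y):
    """Score each feature with a simple two-class filter (assumed, not the
    paper's method): |mean_0 - mean_1| / (sd_0 + sd_1)."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, label in zip(X, y) if label == 0]
        b = [row[j] for row, label in zip(X, y) if label == 1]
        spread = pstdev(a) + pstdev(b)
        scores.append(abs(mean(a) - mean(b)) / (spread + 1e-9))
    return scores


def select_top_k(scores, k):
    """Feature reduction: keep only the indices of the k best-scoring features."""
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(order[:k])


def specify_probabilities(scores, floor=0.05, ceiling=0.9):
    """Ranking as a bias: map scores linearly onto [floor, ceiling].
    A floor above zero means no feature is ever excluded outright."""
    lo, hi = min(scores), max(scores)
    span = (hi - lo) or 1.0
    return [floor + (ceiling - floor) * (s - lo) / span for s in scores]


def cover(sample, probs, rng):
    """Build a rule condition for one sample: feature j is specified with
    probability probs[j], otherwise replaced by the wildcard '#'."""
    return [v if rng.random() < p else "#" for v, p in zip(sample, probs)]


# Toy data: 4 samples, 4 "genes"; gene 0 separates the two classes best.
X = [[0.1, 5.0, 3.0, 1.0],
     [0.2, 4.0, 3.1, 2.0],
     [0.9, 5.1, 2.9, 1.1],
     [1.0, 4.1, 3.0, 2.1]]
y = [0, 0, 1, 1]

scores = rank_scores(X, y)
kept = select_top_k(scores, 2)          # hard cut: other genes are gone
probs = specify_probabilities(scores)   # soft bias: all genes stay available
rule = cover(X[0], probs, random.Random(0))
```

In this sketch the reduction path hands the learner only the columns in `kept`, while the biased path hands it every column but makes informative genes far more likely to appear in a rule condition.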
Pages: 272-279
Number of pages: 8