Gene selection and classification using Taguchi chaotic binary particle swarm optimization

Cited by: 64
Authors
Chuang, Li-Yeh [2]
Yang, Cheng-San [1]
Wu, Kuo-Chuan [3]
Yang, Cheng-Hong [4, 5]
Affiliations
[1] Chia Yi Christian Hosp, Dept Plast Surg, Chiayi 60002, Taiwan
[2] I Shou Univ, Inst Biotechnol & Chem Engn, Kaohsiung 80041, Taiwan
[3] Natl Kaohsiung Univ Appl Sci, Dept Comp Sci & Informat Engn, Kaohsiung 80708, Taiwan
[4] Toko Univ, Dept Network Syst, Chiayi 61363, Taiwan
[5] Natl Kaohsiung Univ Appl Sci, Dept Elect Engn, Kaohsiung 80708, Taiwan
Keywords
Microarray data; Correlation-based feature selection; Taguchi-binary particle swarm optimization; K-nearest neighbor; MICROARRAY DATA; CANCER CLASSIFICATION; ALGORITHM; PREDICTION; COMBINATION; PATTERNS; CHOICE; FILTER; TUMOR
DOI
10.1016/j.eswa.2011.04.165
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The purpose of gene expression analysis is to discriminate between classes of samples and to predict the relative importance of each gene for sample classification. Microarray gene expression profiles have yielded valuable results for a variety of problems and have contributed to advances in clinical medicine. Microarray data characteristically have a high dimension and a small sample size, which makes it difficult for a general classification method to classify samples correctly. Moreover, not every gene is relevant for distinguishing the sample class. Feature (gene) selection is therefore crucial for analyzing gene expression profiles correctly, and an effective gene selection method is needed to eliminate irrelevant genes and decrease the classification error rate. In this paper, correlation-based feature selection (CFS) and Taguchi chaotic binary particle swarm optimization (TCBPSO) were combined into a hybrid method. The K-nearest neighbor (K-NN) method with leave-one-out cross-validation (LOOCV) served as the classifier for ten gene expression profiles. Experimental results show that this hybrid method effectively simplifies feature selection by reducing the number of features needed. The proposed method obtained the lowest classification error rate on all ten gene expression data sets tested, and it reached a classification error rate of zero on six of them. It outperformed five other methods from the literature in terms of classification error rate and could thus constitute a valuable tool for gene expression analysis in future studies. (C) 2011 Elsevier Ltd. All rights reserved.
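As a rough illustration of the evaluation step named in the abstract, the sketch below scores a binary gene mask by the leave-one-out error rate of a K-NN classifier. It is not the authors' implementation: the function name `loocv_knn_error`, the Euclidean distance, the majority vote, the worst-case score for an empty mask, and the toy data are all assumptions made for this example. The CFS filter and the TCBPSO search that would propose the masks are not shown.

```python
from collections import Counter
from math import dist


def loocv_knn_error(samples, labels, mask, k=1):
    """Leave-one-out error rate of K-NN using only genes where mask is 1."""
    selected = [i for i, bit in enumerate(mask) if bit]
    if not selected:
        return 1.0  # assumption: an empty gene subset counts as worst case
    reduced = [[row[i] for i in selected] for row in samples]
    errors = 0
    for held_out, query in enumerate(reduced):
        # Rank every other sample by Euclidean distance to the held-out one.
        others = [
            (dist(query, other), labels[j])
            for j, other in enumerate(reduced)
            if j != held_out
        ]
        nearest = sorted(others, key=lambda pair: pair[0])[:k]
        # Majority vote among the k nearest neighbours.
        predicted = Counter(label for _, label in nearest).most_common(1)[0][0]
        errors += predicted != labels[held_out]
    return errors / len(samples)


# Hypothetical toy data: gene 0 separates the classes, gene 1 is noise.
samples = [[1.0, 9.0], [1.2, 2.0], [5.0, 9.5], [5.1, 1.5]]
labels = ["A", "A", "B", "B"]

for mask in ([1, 0], [0, 1], [1, 1]):
    print(mask, loocv_knn_error(samples, labels, mask))
```

In a wrapper-style search, an error rate like this would typically serve as the fitness of each candidate mask, so that masks selecting fewer, more informative genes are preferred.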
Pages: 13367-13377
Page count: 11