Tabu Search and Binary Particle Swarm Optimization for Feature Selection Using Microarray Data

Cited by: 47
Authors
Chuang, Li-Yeh [2]
Yang, Cheng-Huei [3]
Yang, Cheng-Hong [1, 4]
Affiliations
[1] Natl Kaohsiung Univ Appl Sci, Dept Elect Engn, 415 Chien Kung Rd, Kaohsiung 807, Taiwan
[2] I Shou Univ, Dept Chem Engn, Kaohsiung, Taiwan
[3] Natl Kaohsiung Marine Univ, Dept Elect Commun Engn, Kaohsiung, Taiwan
[4] Toko Univ, Dept Network Syst, Chiayi, Taiwan
Keywords
feature selection; K-nearest neighbor; leave-one-out cross-validation; particle swarm optimization; support vector machines; tabu search; GENE SELECTION; MOLECULAR CLASSIFICATION; CANCER; PREDICTION; ALGORITHMS; CARCINOMAS
DOI
10.1089/cmb.2007.0211
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Gene expression profiles have great potential as a medical diagnostic tool because they represent the state of a cell at the molecular level. In cancer classification research, the available training datasets generally have a fairly small sample size compared to the number of genes involved. This poses an unprecedented challenge to some classification methodologies because of the limited training data. A good method for selecting the genes relevant to sample classification is therefore needed to improve predictive accuracy and to avoid the incomprehensibility that comes with the large number of genes investigated. In this article, we propose combining tabu search (TS) and binary particle swarm optimization (BPSO) for feature selection. BPSO acts as a local optimizer each time the TS has been run for a single generation. The K-nearest neighbor method with leave-one-out cross-validation and the support vector machine with the one-versus-rest strategy serve as evaluators of the TS and BPSO. The proposed method is applied to 11 classification problems taken from the literature and compared with other feature selection methods. Experimental results show that our method reduces the number of features effectively and either obtains higher classification accuracy or uses fewer features than the other feature selection methods.
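The abstract names two building blocks without giving details: a nearest-neighbor evaluator scored by leave-one-out cross-validation, and a binary PSO update over feature masks. The following is a minimal sketch of both, not the authors' implementation. It assumes K = 1, squared Euclidean distance, the standard sigmoid-based binary PSO rule, and illustrative parameter values (w, c1, c2, vmax); the function names are hypothetical, and the tabu search step and the support vector machine evaluator are omitted.

```python
import math
import random


def loocv_1nn_accuracy(X, y, mask):
    """Leave-one-out accuracy of a 1-nearest-neighbor classifier,
    using only the features whose bit in `mask` is 1."""
    selected = [j for j, bit in enumerate(mask) if bit]
    if not selected:
        return 0.0  # assumption: an empty feature subset gets the worst score
    correct = 0
    for i in range(len(X)):
        best_dist, best_label = math.inf, None
        for k in range(len(X)):
            if k == i:
                continue  # hold out sample i
            dist = sum((X[i][j] - X[k][j]) ** 2 for j in selected)
            if dist < best_dist:
                best_dist, best_label = dist, y[k]
        correct += best_label == y[i]
    return correct / len(X)


def bpso_update(position, velocity, pbest, gbest, rng,
                w=1.0, c1=2.0, c2=2.0, vmax=6.0):
    """One binary PSO step for a single particle (parameter values are
    assumptions). Returns the new bit vector and the new velocity."""
    new_pos, new_vel = [], []
    for x, v, p, g in zip(position, velocity, pbest, gbest):
        v = w * v + c1 * rng.random() * (p - x) + c2 * rng.random() * (g - x)
        v = max(-vmax, min(vmax, v))       # clamp the velocity
        prob = 1.0 / (1.0 + math.exp(-v))  # sigmoid maps velocity to P(bit = 1)
        new_vel.append(v)
        new_pos.append(1 if rng.random() < prob else 0)
    return new_pos, new_vel


# Toy data (hypothetical): feature 0 separates the classes, feature 1 misleads.
X = [[0.0, 5.0], [0.1, -5.0], [1.0, 5.1], [1.1, -5.1]]
y = [0, 0, 1, 1]
acc_informative = loocv_1nn_accuracy(X, y, [1, 0])
acc_misleading = loocv_1nn_accuracy(X, y, [0, 1])
next_pos, next_vel = bpso_update([0, 1], [0.0, 0.0], [1, 0], [1, 0],
                                 random.Random(0))
```

In a full search, the accuracy returned for each mask would serve as the fitness that ranks candidate feature subsets, with ties plausibly broken in favor of the mask that selects fewer features.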
Pages: 1689-1703
Page count: 15