Gene selection and sample classification on microarray data based on adaptive genetic algorithm/k-nearest neighbor method

Cited by: 38
Authors
Lee, Chien-Pang [1]
Lin, Wen-Shin [1]
Chen, Yuh-Min [2]
Kuo, Bo-Jein [1]
Affiliations
[1] Natl Chung Hsing Univ, Dept Agron, Div Biometry, Taichung 40227, Taiwan
[2] China Med Univ, Sch Nursing, Taichung 40402, Taiwan
Keywords
Gene selection; Sample classification; Adaptive genetic algorithm; k-Nearest neighbor; Microarray data; EXPRESSION DATA; PREDICTION; PARAMETER; CROSSOVER; CHOICE;
DOI
10.1016/j.eswa.2010.07.053
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Microarray technology has recently been widely used to study gene expression in cancer diagnosis. Its main distinguishing feature is that it can measure the expression of thousands of genes at the same time. In the past, researchers commonly used parametric statistical methods to find significant genes. However, microarray data often violate some of the assumptions of parametric statistical methods, or the type I error rate may be inflated. Therefore, our aim is to establish a gene selection method that is free of such assumptions and reduces the dimension of the data set. In our study, an adaptive genetic algorithm/k-nearest neighbor method (AGA/KNN) was used to evolve gene subsets. We find that AGA/KNN can reduce the dimension of the data set while classifying all test samples correctly. In addition, the accuracy of AGA/KNN is higher than that of GA/KNN, and it takes only half the CPU time of GA/KNN. With the proposed method, biologists can efficiently identify the relevant genes from the selected gene subset and classify the test samples correctly. (C) 2010 Elsevier Ltd. All rights reserved.
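As a rough illustration of the idea in the abstract, the sketch below wraps a toy genetic algorithm around a k-nearest neighbor fitness function: each chromosome is a small set of gene indices, and its fitness is the leave-one-out KNN accuracy on the training samples using only those genes. Everything specific here is an assumption, not the authors' procedure: the function names (`knn_predict`, `loo_accuracy`, `adaptive_rate`, `evolve`), the parameter values, the use of tournament selection and elitism, and in particular the adaptation rule, which simply lowers the mutation rate for fitter individuals. The abstract does not state which adaptive rule AGA/KNN uses, and crossover is omitted for brevity.

```python
import random

def knn_predict(train_X, train_y, x, genes, k=3):
    """Majority vote among the k nearest training samples, using only `genes`."""
    dists = sorted(
        (sum((row[g] - x[g]) ** 2 for g in genes), label)
        for row, label in zip(train_X, train_y)
    )
    votes = [label for _, label in dists[:k]]
    # sorted() makes tie-breaking between classes deterministic
    return max(sorted(set(votes)), key=votes.count)

def loo_accuracy(X, y, genes, k=3):
    """Leave-one-out KNN accuracy restricted to the chosen gene subset."""
    hits = 0
    for i in range(len(X)):
        rest_X, rest_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        hits += knn_predict(rest_X, rest_y, X[i], genes, k) == y[i]
    return hits / len(X)

def adaptive_rate(f, f_avg, f_max, rate_max):
    """Assumed rule: below-average individuals mutate at rate_max,
    above-average ones at a rate shrinking to 0 for the current best."""
    if f < f_avg or f_max == f_avg:
        return rate_max
    return rate_max * (f_max - f) / (f_max - f_avg)

def evolve(X, y, subset_size=2, pop_size=12, generations=15,
           k=3, rate_max=0.5, seed=0):
    rng = random.Random(seed)
    n_genes = len(X[0])
    pop = [rng.sample(range(n_genes), subset_size) for _ in range(pop_size)]
    best, best_fit = None, -1.0
    for _ in range(generations):
        fits = [loo_accuracy(X, y, c, k) for c in pop]
        f_avg, f_max = sum(fits) / len(fits), max(fits)
        if f_max > best_fit:
            best, best_fit = list(pop[fits.index(f_max)]), f_max
        nxt = [list(best)]  # elitism: carry the best subset forward
        while len(nxt) < pop_size:
            a, b = rng.sample(range(pop_size), 2)  # binary tournament
            p = a if fits[a] >= fits[b] else b
            child = list(pop[p])
            rate = adaptive_rate(fits[p], f_avg, f_max, rate_max)
            for j in range(subset_size):
                if rng.random() < rate:
                    # swap in a gene not already present in the subset
                    unused = [g for g in range(n_genes) if g not in child]
                    if unused:
                        child[j] = rng.choice(unused)
            nxt.append(child)
        pop = nxt
    return sorted(best), best_fit
```

Calling `evolve(X, y)` with `X` as a list of expression profiles (one list of floats per sample) and `y` as the class labels returns the best gene subset found and its leave-one-out accuracy. A real microarray run would need far larger populations and an independent test set, which this sketch leaves out.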
Pages: 4661-4667
Page count: 7