Gene selection for cancer classification using support vector machines

Times cited: 6372
Authors
Guyon, I [1]
Weston, J
Barnhill, S
Vapnik, V
Affiliations
[1] Barnhill Bioinformatics, Savannah, GA, USA
[2] AT&T Labs Research, Red Bank, NJ 07701, USA
Keywords
diagnosis; diagnostic tests; drug discovery; RNA expression; genomics; gene selection; DNA micro-array; proteomics; cancer classification; feature selection; support vector machines; recursive feature elimination;
DOI
10.1023/A:1012487302797
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
DNA micro-arrays now allow scientists to screen thousands of genes simultaneously and determine whether those genes are active, hyperactive, or silent in normal or cancerous tissue. Because these micro-array devices generate bewildering amounts of raw data, new analytical methods are needed to determine whether cancer tissues have gene expression signatures that distinguish them from normal tissues or from other types of cancer tissue. In this paper, we address the problem of selecting a small subset of genes from broad patterns of gene expression data recorded on DNA micro-arrays. Using available training examples from cancer and normal patients, we build a classifier suitable for genetic diagnosis as well as drug discovery. Previous attempts to address this problem selected genes using correlation techniques. We propose a new gene selection method that uses Support Vector Machines (SVMs) with Recursive Feature Elimination (RFE). We demonstrate experimentally that the genes selected by our techniques yield better classification performance and are biologically relevant to cancer. In contrast with the baseline method, our method eliminates gene redundancy automatically and yields better and more compact gene subsets. In patients with leukemia, our method discovered 2 genes that yield zero leave-one-out error, while the baseline method needs 64 genes to reach its best result (one leave-one-out error). On the colon cancer database, our method is 98% accurate using only 4 genes, while the baseline method is only 86% accurate.
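The abstract names SVM-based Recursive Feature Elimination without spelling out the loop. The following is a minimal sketch, not the authors' code, assuming: a linear SVM whose squared weights w_i^2 serve as the ranking criterion (the criterion commonly associated with SVM-RFE), exactly one feature removed per round, and a simple full-batch subgradient-descent trainer standing in for a proper quadratic-programming solver. The function names `train_linear_svm` and `svm_rfe_ranking`, the hyperparameters, and the toy data are hypothetical illustrations.

```python
# Minimal SVM-RFE sketch (pure Python, standard library only).
# Assumption: a linear soft-margin SVM trained by full-batch subgradient
# descent on the L2-regularized hinge loss stands in for a QP solver,
# and exactly one feature is eliminated per round.
from typing import List, Sequence, Tuple


def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                     lam: float = 0.01, epochs: int = 200,
                     lr: float = 0.1) -> Tuple[List[float], float]:
    """Approximately minimize lam/2*||w||^2 + mean(max(0, 1 - y*(w.x + b)))."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for t in range(1, epochs + 1):
        grad_w = [lam * wi for wi in w]  # gradient of the L2 penalty
        grad_b = 0.0                     # the bias is not regularized
        for xk, yk in zip(X, y):
            margin = yk * (sum(wi * xi for wi, xi in zip(w, xk)) + b)
            if margin < 1:               # hinge loss is active for this sample
                for i in range(d):
                    grad_w[i] -= yk * xk[i] / n
                grad_b -= yk / n
        step = lr / t ** 0.5             # decaying step size
        w = [wi - step * gi for wi, gi in zip(w, grad_w)]
        b -= step * grad_b
    return w, b


def svm_rfe_ranking(X: Sequence[Sequence[float]], y: Sequence[int],
                    **svm_kwargs) -> List[int]:
    """Rank feature indices from most to least useful by recursive elimination.

    Each round retrains the SVM on the surviving features and drops the one
    with the smallest ranking criterion w_i ** 2. Because the surviving sets
    are nested, ranking[:r] is the subset of size r the procedure keeps.
    """
    surviving = list(range(len(X[0])))
    eliminated: List[int] = []
    while surviving:
        X_sub = [[row[i] for i in surviving] for row in X]
        w, _ = train_linear_svm(X_sub, y, **svm_kwargs)
        worst = min(range(len(surviving)), key=lambda j: w[j] ** 2)
        eliminated.append(surviving.pop(worst))
    return eliminated[::-1]  # last eliminated (most useful) first


# Hypothetical toy data: feature 0 separates the classes, feature 1 is
# small noise, feature 2 is constant and carries no information.
X_toy = [[2.0, 0.3, 0.0], [1.5, -0.2, 0.0], [1.8, 0.1, 0.0],
         [-2.0, 0.2, 0.0], [-1.7, -0.3, 0.0], [-1.6, 0.1, 0.0]]
y_toy = [1, 1, 1, -1, -1, -1]

if __name__ == "__main__":
    print(svm_rfe_ranking(X_toy, y_toy))
```

With thousands of genes, retraining after every single removal is slow; eliminating a chunk of the lowest-ranked features per round is a common speed-up, at the cost of a coarser ranking.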
Pages: 389-422
Number of pages: 34
Related papers (50 in total)
  • [1] Gene Selection for Cancer Classification using Support Vector Machines
    Isabelle Guyon
    Jason Weston
    Stephen Barnhill
    Vladimir Vapnik
    Machine Learning, 2002, 46: 389-422
  • [2] Gene selection for cancer classification using bootstrapped genetic algorithms and support vector machines
    Chen, XW
    Proceedings of the 2003 IEEE Bioinformatics Conference, 2003: 504-505
  • [3] Gene selection and prediction for cancer classification using support vector machines with a reject option
    Choi, Hosik
    Yeo, Donghwa
    Kwon, Sunghoon
    Kim, Yongdai
    Computational Statistics & Data Analysis, 2011, 55 (5): 1897-1908
  • [4] Hybrid Firefly based Simultaneous Gene Selection and Cancer Classification using Support Vector Machines and Random Forests
    Srivastava, Atulji
    Chakrabarti, Saurabh
    Das, Subrata
    Ghosh, Shameek
    Jayaraman, V. K.
    Proceedings of the Seventh International Conference on Bio-Inspired Computing: Theories and Applications (BIC-TA 2012), Vol. 1, 2013, 201: 485 ff.
  • [5] Saliency analysis of support vector machines for gene selection in tissue classification
    Cao, L
    Seng, CK
    Gu, Q
    Lee, HP
    Neural Computing & Applications, 2003, 11 (3-4): 244-249
  • [6] Saliency Analysis of Support Vector Machines for Gene Selection in Tissue Classification
    L. Cao
    H.P. Lee
    C.K. Seng
    Q. Gu
    Neural Computing & Applications, 2003, 11: 244-249
  • [7] Hybrid huberized support vector machines for microarray classification and gene selection
    Wang, Li
    Zhu, Ji
    Zou, Hui
    Bioinformatics, 2008, 24 (3): 412-419
  • [8] Gene Classification Using Codon Usage and Support Vector Machines
    Ma, Jianmin
    Nguyen, Minh N.
    Rajapakse, Jagath C.
    IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2009, 6 (1): 134-143
  • [9] Simultaneous Support Vector Selection and Parameter Optimization Using Support Vector Machines for Sentiment Classification
    Fei, Ye
    Proceedings of the 2016 IEEE 7th International Conference on Software Engineering and Service Science (ICSESS 2016), 2016: 59-62
  • [10] Feature selection algorithm in classification learning using support vector machines
    Yu. V. Goncharov
    I. B. Muchnik
    L. V. Shvartser
    Computational Mathematics and Mathematical Physics, 2008, 48: 1243-1260