Dimension reduction with redundant gene elimination for tumor classification

Cited by: 15
Authors
Zeng, Xue-Qiang [1]
Li, Guo-Zheng [1, 2]
Yang, Jack Y. [3]
Yang, Mary Qu [4]
Wu, Geng-Feng [1]
Affiliations
[1] Shanghai Univ, Sch Comp Engn & Sci, Shanghai 200072, Peoples R China
[2] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing 210093, Peoples R China
[3] Harvard Univ, Harvard Med Sch, Cambridge, MA 02140 USA
[4] US Dept HHS, NHGRI, NIH, Bethesda, MD 20852 USA
Funding
National Natural Science Foundation of China
DOI
10.1186/1471-2105-9-S6-S8
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Analysis of gene expression data for tumor classification is an important application of bioinformatics methods. However, gene expression data from DNA microarray experiments are hard to analyse with commonly used classifiers, because the data sets contain only a few observations but thousands of measured genes. Dimension reduction is often used to handle such high-dimensional problems, but its effectiveness is hampered by the large number of redundant features in microarray data sets.
Results: Dimension reduction is performed by combining feature extraction with redundant gene elimination for tumor classification. A novel redundancy metric based on DIScriminative Contribution (DISC) is proposed, which estimates feature similarity by explicitly building a linear classifier on each gene. Compared with the standard linear correlation metric, DISC takes the label information into account and directly estimates the redundancy of the discriminative ability of two given features. Based on the DISC metric, a novel algorithm named REDISC (Redundancy Elimination based on Discriminative Contribution) is proposed, which eliminates redundant genes before feature extraction and improves the performance of dimension reduction. Experimental results on two microarray data sets show that the REDISC algorithm effectively and reliably improves the generalization performance of dimension reduction and hence of the classifier used.
Conclusion: For tumor classification, dimension reduction that performs redundant gene elimination before feature extraction is better than dimension reduction with feature extraction alone, and supervised redundant gene elimination is superior to commonly used unsupervised methods such as linear correlation coefficients.
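The idea summarized in the abstract (score redundancy between genes through per-gene linear classifiers, drop redundant genes, then extract features) can be illustrated with a minimal Python sketch. The abstract does not give the DISC formula or the REDISC steps, so everything below is an assumption made for illustration: the nearest-class-mean rule as the per-gene linear classifier, the Jaccard overlap of correctly classified samples as a DISC-like score, the greedy filter with a `threshold` parameter, and PCA as the feature extraction step. All function names are hypothetical and this is not the authors' implementation.

```python
import numpy as np

def single_gene_correctness(X, y):
    """Fit a 1-D nearest-class-mean classifier per gene (an assumed stand-in
    for the per-gene linear classifier). Returns a boolean matrix where
    entry [i, j] is True when gene j alone classifies sample i correctly."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    mu0 = X[y == 0].mean(axis=0)
    mu1 = X[y == 1].mean(axis=0)
    # Predict class 1 when the value is strictly closer to the class-1 mean.
    pred = (np.abs(X - mu1) < np.abs(X - mu0)).astype(int)
    return pred == y[:, None]

def disc_like_redundancy(correct_a, correct_b):
    """Hypothetical label-aware redundancy score: Jaccard overlap of the
    sample sets that the two single-gene classifiers get right."""
    union = np.logical_or(correct_a, correct_b).sum()
    if union == 0:
        return 1.0  # neither gene classifies anything correctly
    return float(np.logical_and(correct_a, correct_b).sum() / union)

def eliminate_redundant_genes(X, y, threshold=0.9):
    """Greedy filter (assumed): visit genes from most to least accurate and
    keep a gene only if its redundancy with every kept gene is below threshold."""
    correct = single_gene_correctness(X, y)
    order = np.argsort(-correct.mean(axis=0), kind="stable")
    kept = []
    for j in order:
        if all(disc_like_redundancy(correct[:, k], correct[:, j]) < threshold
               for k in kept):
            kept.append(int(j))
    return kept

def pca_extract(X, n_components=2):
    """Feature extraction by PCA through SVD of the centered data."""
    Xc = np.asarray(X, dtype=float)
    Xc = Xc - Xc.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T

# Toy data: gene 1 is a rescaled copy of gene 0, gene 2 errs on two samples.
y_toy = np.array([0, 0, 0, 1, 1, 1])
X_toy = np.array([[0, 0, 0],
                  [0, 0, 0],
                  [0, 0, 1],
                  [1, 2, 1],
                  [1, 2, 1],
                  [1, 2, 0]], dtype=float)
kept_genes = eliminate_redundant_genes(X_toy, y_toy)
features = pca_extract(X_toy[:, kept_genes], n_components=1)
```

On the toy data, gene 1 is dropped because its classifier is right on exactly the same samples as gene 0, even though the two columns differ in scale. A score built on classification outcomes uses the labels, which is the contrast the abstract draws with unsupervised linear correlation.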
Pages: 13
Related papers
23 in total
  • [1] Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays
    Alon, U
    Barkai, N
    Notterman, DA
    Gish, K
    Ybarra, S
    Mack, D
    Levine, AJ
    [J]. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 1999, 96 (12) : 6745 - 6750
  • [2] [Anonymous], 2002, Principal Component Analysis
  • [3] [Anonymous], 2002, KENT RIDGE BIOMEDICA
  • [4] Effective dimension reduction methods for tumor classification using gene expression data
    Antoniadis, A
    Lambert-Lacroix, S
    Leblanc, F
    [J]. BIOINFORMATICS, 2003, 19 (05) : 563 - 570
  • [5] Bhavani S, 2006, J CHEM INF MODEL, V46, P2478, DOI 10.1021/ci0601281
  • [6] BOULESTEIX AL, 2006, BRIEFINGS BIOINFORMA
  • [7] Cristianini N., 2000, Intelligent Data Analysis: An Introduction
  • [8] Dai JJ, 2006, STAT APPL GENET MOL, V5
  • [9] Approximate statistical tests for comparing supervised classification learning algorithms
    Dietterich, TG
    [J]. NEURAL COMPUTATION, 1998, 10 (07) : 1895 - 1923
  • [10] Forman G., 2003, Journal of Machine Learning Research, V3, P1289, DOI 10.1162/153244303322753670