Feature Selection for Gene Expression Using Model-Based Entropy

Times cited: 62
Authors
Zhu, Shenghuo [1]
Wang, Dingding [2]
Yu, Kai [1]
Li, Tao [2]
Gong, Yihong [1]
Affiliations
[1] NEC Labs Amer, Cupertino, CA 95014 USA
[2] Florida Int Univ, Sch Comp Sci, Miami, FL 33199 USA
Funding
National Science Foundation (USA)
Keywords
Feature selection; multivariate Gaussian generative model; entropy; classification; information; prediction
DOI
10.1109/TCBB.2008.35
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Gene expression data usually contain a large number of genes but only a small number of samples. Feature selection for gene expression data aims to find a set of genes that best discriminates biological samples of different types. Traditional machine-learning approaches to gene selection based on empirical mutual information suffer from data sparseness because of the small number of samples. To overcome this problem, we propose a model-based approach that estimates the entropy of the class variables on a fitted model rather than on the data themselves. We use multivariate normal distributions to fit the data, because the multivariate normal distribution has maximum entropy among all real-valued distributions with a specified mean and covariance, and it is widely used to approximate other distributions. If the data follow a multivariate normal distribution, the conditional distribution of the class variables given the selected features is also normal, so its entropy can be computed from the log-determinant of its covariance matrix. Because the number of genes is large, computing all the log-determinants needed to evaluate candidate genes is expensive, and we propose several algorithms that substantially reduce this cost. Experiments on seven gene expression data sets, together with comparisons against five other approaches, demonstrate the accuracy of the multivariate Gaussian generative model for feature selection and the efficiency of our algorithms.
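The abstract's central computational step, reducing the entropy of a Gaussian conditional distribution to log-determinants, can be illustrated with the minimal, dependency-free Python sketch below. It is not the authors' algorithm. It assumes the class variables are encoded as a real-valued vector (for example, 0/1 class indicators) that is jointly Gaussian with the genes. It adds a hypothetical ridge term `ridge` to keep the small-sample covariance positive definite. It also uses plain greedy forward selection, which recomputes every log-determinant from scratch instead of applying the cost-reducing algorithms the paper describes. All function names (`cholesky_logdet`, `covariance`, `conditional_entropy`, `greedy_select`) are illustrative. The sketch relies on the identity log det(Sigma_Y|S) = log det(Sigma_(S,Y)) - log det(Sigma_SS), so for a k-dimensional class encoding, H(Y | X_S) = 0.5 * (k * log(2*pi*e) + log det(Sigma_(S,Y)) - log det(Sigma_SS)).

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def cholesky_logdet(a: Matrix) -> float:
    """Return log det(A) for a symmetric positive-definite A via Cholesky (A = L L^T)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    logdet = 0.0
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                low[i][i] = math.sqrt(s)
                # det(A) = prod(L_ii)^2, so log det(A) = 2 * sum(log L_ii)
                logdet += 2.0 * math.log(low[i][i])
            else:
                low[i][j] = s / low[j][j]
    return logdet  # an empty (0 x 0) matrix has determinant 1, i.e. log det 0


def covariance(data: Sequence[Sequence[float]], ridge: float) -> Matrix:
    """Maximum-likelihood covariance of the columns of `data`, plus ridge * I."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    cov = [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in data) / n
            for j in range(d)] for i in range(d)]
    for i in range(d):
        cov[i][i] += ridge  # hypothetical regularizer for the few-samples setting
    return cov


def submatrix(cov: Matrix, idx: Sequence[int]) -> Matrix:
    """Rows and columns of `cov` restricted to the indices in `idx`."""
    return [[cov[i][j] for j in idx] for i in idx]


def conditional_entropy(cov: Matrix, selected: List[int], targets: List[int]) -> float:
    """Differential entropy H(Y | X_S) of a jointly Gaussian (X_S, Y).

    Uses log det(Sigma_{Y|S}) = log det(Sigma_{(S,Y)}) - log det(Sigma_SS),
    which follows from the Schur-complement determinant identity.
    """
    k = len(targets)
    joint = cholesky_logdet(submatrix(cov, selected + targets))
    marginal = cholesky_logdet(submatrix(cov, selected))
    return 0.5 * (k * math.log(2.0 * math.pi * math.e) + joint - marginal)


def greedy_select(features: Sequence[Sequence[float]],
                  labels: Sequence[Sequence[float]],
                  num_select: int, ridge: float = 1e-3) -> List[int]:
    """Greedy forward selection of genes that minimize H(Y | X_S).

    features: n x p expression matrix; labels: n x k real-valued class encoding
    (e.g. 0/1 indicators). Requires num_select <= p.
    """
    p = len(features[0])
    data = [list(f) + list(y) for f, y in zip(features, labels)]
    cov = covariance(data, ridge)
    targets = list(range(p, p + len(labels[0])))
    selected: List[int] = []
    for _ in range(num_select):
        # Naive step: recompute every candidate's log-determinants from scratch.
        candidates = [j for j in range(p) if j not in selected]
        best = min(candidates,
                   key=lambda j: conditional_entropy(cov, selected + [j], targets))
        selected.append(best)
    return selected
```

Consider a toy set of six samples in which gene 1 equals the 0/1 label, gene 0 is only weakly correlated with it, and gene 2 is uncorrelated. On this set, `greedy_select(X, Y, 1)` returns `[1]`, because conditioning on gene 1 leaves almost no residual class variance. Replacing the from-scratch recomputation with incremental (rank-one) Cholesky updates is one standard way to cut the per-step cost, though it is not necessarily one of the algorithms the authors propose.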
Pages: 25-36
Number of pages: 12