An efficient gene selection technique for cancer recognition based on neighborhood mutual information

Cited by: 70
Authors
Hu, Qinghua [1]
Pan, Wei [1]
An, Shuang [1]
Ma, Peijun [1]
Wei, Jinmao [2]
Affiliations
[1] Harbin Inst Technol, Harbin 150006, Peoples R China
[2] Nankai Univ, Tianjin 300071, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Cancer recognition; Gene selection; Neighborhood mutual information; Maximal relevancy; Minimal redundancy; SUBSET-SELECTION; CLASSIFICATION; PREDICTION; IDENTIFICATION; MICROARRAY; RELEVANCE; SCHEME;
DOI
10.1007/s13042-010-0008-6
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Gene selection is a key problem in gene-expression-based cancer recognition and related tasks. This work introduces a measure, called neighborhood mutual information (NMI), to evaluate the relevance between genes and the related decision. The measure is then combined with the minimal-redundancy, maximal-relevancy (mRMR) search strategy to construct an NMI-based mRMR gene selection algorithm (NMI_mRMR). It is also found that the first k best genes with respect to NMI are usually enough for cancer classification, so mRMR can be run on these genes alone and the rest removed in a preprocessing step, which reduces computation time. Based on this observation, an efficient gene selection algorithm, denoted NMI_EmRMR, is proposed. The technique is tested on several cancer recognition tasks, and the experimental results show that NMI_EmRMR is effective and efficient.
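The pipeline described in the abstract (rank genes by NMI, keep the first k, then run mRMR on them) can be illustrated with a minimal Python sketch. The abstract gives no formulas, so everything concrete here is an assumption: NMI is taken as the usual neighborhood form, -(1/n) Σ log2(|δ_A(x_i)|·|δ_B(x_i)| / (n·|δ_A(x_i) ∩ δ_B(x_i)|)), the neighborhood radius `delta` and its default are hypothetical, the class label is treated as a crisp equivalence relation, and the mRMR step uses the difference criterion (relevance minus mean redundancy). The function names are illustrative and are not taken from the paper.

```python
import numpy as np

def neighborhood_mask(x, delta):
    """n-by-n boolean matrix: (i, j) is True when |x[i] - x[j]| <= delta."""
    x = np.asarray(x, dtype=float)
    return np.abs(x[:, None] - x[None, :]) <= delta

def label_mask(y):
    """Neighborhood relation for a class label: samples sharing the same label."""
    y = np.asarray(y)
    return y[:, None] == y[None, :]

def nmi(mask_a, mask_b):
    """Assumed NMI form, computed from two neighborhood relations (in bits)."""
    n = mask_a.shape[0]
    size_a = mask_a.sum(axis=1)
    size_b = mask_b.sum(axis=1)
    # Each sample belongs to its own neighborhoods, so size_ab >= 1.
    size_ab = (mask_a & mask_b).sum(axis=1)
    return -np.mean(np.log2(size_a * size_b / (n * size_ab)))

def nmi_emrmr(X, y, k=50, n_select=5, delta=0.15):
    """Prefilter to the k most relevant genes, then greedy mRMR on that pool.

    X is (samples x genes), assumed scaled so that delta is meaningful.
    Returns the selected column indices in selection order.
    """
    X = np.asarray(X, dtype=float)
    class_mask = label_mask(y)

    # Relevance of every gene to the class label.
    relevance = np.array(
        [nmi(neighborhood_mask(X[:, j], delta), class_mask)
         for j in range(X.shape[1])]
    )

    # Preprocessing step: keep only the k best genes (stable sort for ties).
    candidates = np.argsort(-relevance, kind="stable")[:k].tolist()
    masks = {j: neighborhood_mask(X[:, j], delta) for j in candidates}

    # Greedy mRMR: start from the most relevant gene, then repeatedly add the
    # candidate with the best relevance-minus-mean-redundancy score.
    selected = [candidates.pop(0)]
    while candidates and len(selected) < n_select:
        scores = [
            relevance[j] - np.mean([nmi(masks[j], masks[s]) for s in selected])
            for j in candidates
        ]
        selected.append(candidates.pop(int(np.argmax(scores))))
    return selected
```

In this sketch the pairwise redundancy terms are only computed among the k retained genes, which is where the time saving described in the abstract would come from. A call such as `nmi_emrmr(X, y, k=50, n_select=5)` returns column indices of `X`; both `k` and `n_select` are illustrative defaults, not values reported by the authors.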
Pages: 63-74
Number of pages: 12