An efficient statistical feature selection approach for classification of gene expression data

Cited by: 110
Authors
Chandra, B. [1]
Gupta, Manish [1]
Affiliations
[1] Indian Inst Technol Delhi, Dept Math, New Delhi 110016, India
Keywords
Cancer diagnosis and prediction; Gene selection; Classification; Feature selection; CANCER CLASSIFICATION; T-TEST; PREDICTION; TUMOR;
DOI
10.1016/j.jbi.2011.01.001
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Classification of gene expression data plays a significant role in prediction and diagnosis of diseases. Gene expression data has a special characteristic that there is a mismatch in gene dimension as opposed to sample dimension. All genes do not contribute for efficient classification of samples. A robust feature selection algorithm is required to identify the important genes which help in classifying the samples efficiently. In order to select informative genes (features) based on relevance and redundancy characteristics, many feature selection algorithms have been introduced in the past. Most of the earlier algorithms require computationally expensive search strategy to find an optimal feature subset. Existing feature selection methods are also sensitive to the evaluation measures. The paper introduces a novel and efficient feature selection approach based on statistically defined effective range of features for every class termed as ERGS (Effective Range based Gene Selection). The basic principle behind ERGS is that higher weight is given to the feature that discriminates the classes clearly. Experimental results on well-known gene expression datasets illustrate the effectiveness of the proposed approach. Two popular classifiers viz. Naive Bayes Classifier (NBC) and Support Vector Machine (SVM) have been used for classification. The proposed feature selection algorithm can be helpful in ranking the genes and also is capable of identifying the most relevant genes responsible for diseases like leukemia, colon tumor, lung cancer, diffuse large B-cell lymphoma (DLBCL), prostate cancer. (C) 2011 Elsevier Inc. All rights reserved.
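The principle stated in the abstract (features whose per-class ranges separate the classes clearly receive higher weight) can be illustrated with a rough sketch. The abstract does not give the formulas, so everything below is an assumption made for illustration: the function name `effective_range_weights`, the range definition mean ± (1 − prior) · k · std, the default `k`, and the pairwise-overlap weighting. It should not be read as the authors' exact ERGS definition.

```python
import numpy as np

def effective_range_weights(X, y, k=1.732):
    """Weight each feature by how little its per-class ranges overlap.

    Assumed, not taken from the abstract: a class's range for a feature is
    mean +/- (1 - prior) * k * std, and overlap is summed over class pairs.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)

    lows, highs = [], []
    for c in classes:
        Xc = X[y == c]
        prior = Xc.shape[0] / X.shape[0]
        half_width = (1.0 - prior) * k * Xc.std(axis=0)
        mean = Xc.mean(axis=0)
        lows.append(mean - half_width)
        highs.append(mean + half_width)
    lows = np.array(lows)    # shape: (n_classes, n_features)
    highs = np.array(highs)

    # Sum the overlapping length of the ranges over every pair of classes.
    overlap = np.zeros(X.shape[1])
    for a in range(len(classes)):
        for b in range(a + 1, len(classes)):
            inter = np.minimum(highs[a], highs[b]) - np.maximum(lows[a], lows[b])
            overlap += np.clip(inter, 0.0, None)

    # Scale overlap by the total span covered, then normalise to [0, 1].
    span = highs.max(axis=0) - lows.min(axis=0)
    coeff = np.divide(overlap, span, out=np.zeros_like(overlap), where=span > 0)
    peak = coeff.max()
    normalized = coeff / peak if peak > 0 else coeff
    return 1.0 - normalized  # less overlap gives a higher weight

# Toy data: feature 0 separates the two classes, feature 1 does not.
X_toy = [[0.0, 1.0], [0.1, 5.0], [0.2, 3.0],
         [10.0, 1.0], [10.1, 5.0], [10.2, 3.0]]
y_toy = [0, 0, 0, 1, 1, 1]
weights = effective_range_weights(X_toy, y_toy)
ranking = np.argsort(-weights)  # feature indices, best first
```

A ranking such as `ranking` could then be truncated to the top few features before training a classifier such as Naive Bayes or an SVM, which is the general workflow the abstract describes.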
Pages: 529 - 535
Number of pages: 7
Related papers
50 in total
  • [41] Data mining for feature selection in gene expression autism data
    Latkowski, Tomasz
    Osowski, Stanislaw
    EXPERT SYSTEMS WITH APPLICATIONS, 2015, 42 (02) : 864 - 872
  • [42] Efficient Feature Selection and Classification of Protein Sequence Data in Bioinformatics
    Iqbal, Muhammad Javed
    Faye, Ibrahima
    Samir, Brahim Belhaouari
    Said, Abas Md
    SCIENTIFIC WORLD JOURNAL, 2014,
  • [43] Naïve bayes text classification with statistical data feature selection
    Janaki Meena, M.
    Chandran, K.R.
    Advances in Modelling and Analysis B, 2009, 52 (1-2): 83 - 99
  • [44] Feature selection and classification approaches in gene expression of breast cancer
    Ghosh, Sarada
    Samanta, Guruprasad
    De la Sen, Manuel
    AIMS BIOPHYSICS, 2021, 8 (04): 372 - 384
  • [45] Informative Feature Clustering and Selection for Gene Expression Data
    Yang, Yuqi
    Yin, Pengshuai
    Luo, Zhihang
    Gu, Wenwen
    Chen, Renjie
    Wu, Qingyao
    IEEE ACCESS, 2019, 7 : 169174 - 169184
  • [46] A Review on Feature Selection Techniques for Gene Expression Data
    Vanjimalar, S.
    Ramyachitra, D.
    Manikandan, P.
    2018 IEEE INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND COMPUTING RESEARCH (IEEE ICCIC 2018), 2018, : 26 - 29
  • [47] Evolutionary rough feature selection in gene expression data
    Banerjee, Mohua
    Mitra, Sushmita
    Banka, Haider
    IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART C-APPLICATIONS AND REVIEWS, 2007, 37 (04): 622 - 632
  • [48] The painter's feature selection for gene expression data
    Apiletti, Daniele
    Baralis, Elena
    Bruno, Giulia
    Fiori, Alessandro
    2007 ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY, VOLS 1-16, 2007, : 4227 - 4230
  • [49] Feature selection using neighborhood uncertainty measures and Fisher score for gene expression data classification
    Xu, Jiucheng
    Qu, Kanglin
    Qu, Kangjian
    Hou, Qincheng
    Meng, Xiangru
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2023, 14 (12) : 4011 - 4028
  • [50] Variance-based Feature Selection for Classification of Cancer Subtypes Using Gene Expression Data
    Roberts, Aedan G. K.
    Catchpoole, Daniel R.
    Kennedy, Paul J.
    2018 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2018,