An efficient statistical feature selection approach for classification of gene expression data

Cited by: 110
Authors
Chandra, B. [1]
Gupta, Manish [1]
Affiliations
[1] Indian Inst Technol Delhi, Dept Math, New Delhi 110016, India
Keywords
Cancer diagnosis and prediction; Gene selection; Classification; Feature selection; CANCER CLASSIFICATION; T-TEST; PREDICTION; TUMOR
DOI
10.1016/j.jbi.2011.01.001
Chinese Library Classification number
TP39 [Computer applications]
Subject classification code
081203; 0835
Abstract
Classification of gene expression data plays a significant role in the prediction and diagnosis of diseases. Gene expression data have a distinctive characteristic: the number of genes far exceeds the number of samples. Not all genes contribute to efficient classification of samples, so a robust feature selection algorithm is required to identify the important genes that help classify the samples efficiently. Many feature selection algorithms have been introduced in the past to select informative genes (features) based on relevance and redundancy characteristics. Most of the earlier algorithms require a computationally expensive search strategy to find an optimal feature subset, and existing feature selection methods are also sensitive to the evaluation measures. The paper introduces a novel and efficient feature selection approach, termed ERGS (Effective Range based Gene Selection), based on a statistically defined effective range of each feature for every class. The basic principle behind ERGS is that a higher weight is given to a feature that discriminates the classes clearly. Experimental results on well-known gene expression datasets illustrate the effectiveness of the proposed approach. Two popular classifiers, the Naive Bayes Classifier (NBC) and the Support Vector Machine (SVM), have been used for classification. The proposed feature selection algorithm can be helpful in ranking the genes and is also capable of identifying the most relevant genes responsible for diseases such as leukemia, colon tumor, lung cancer, diffuse large B-cell lymphoma (DLBCL), and prostate cancer. (C) 2011 Elsevier Inc. All rights reserved.
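The abstract states only the principle behind ERGS, so the following is a minimal sketch of that principle rather than the authors' algorithm. It assumes that a feature's effective range for a class is the class mean plus or minus a scaled standard deviation, that the scaling depends on the class prior and a factor `GAMMA`, and that the weight falls as the class ranges overlap. All function names, the value of `GAMMA`, and the exact overlap formula are illustrative assumptions not given in this record.

```python
from itertools import combinations
from statistics import mean, pstdev

GAMMA = 1.732  # assumed range-width factor; not stated in the abstract


def effective_ranges(values, labels, gamma=GAMMA):
    """Return {class: (low, high)} for one feature.

    Assumption: range = class mean +/- (1 - class prior) * gamma * class std.
    """
    n = len(labels)
    ranges = {}
    for c in sorted(set(labels)):
        xs = [v for v, y in zip(values, labels) if y == c]
        prior = len(xs) / n
        half_width = (1.0 - prior) * gamma * pstdev(xs)
        mu = mean(xs)
        ranges[c] = (mu - half_width, mu + half_width)
    return ranges


def overlap_score(ranges):
    """Sum of pairwise overlaps between class ranges, divided by the total span."""
    total = 0.0
    for (lo_a, hi_a), (lo_b, hi_b) in combinations(ranges.values(), 2):
        total += max(0.0, min(hi_a, hi_b) - max(lo_a, lo_b))
    span = max(hi for _, hi in ranges.values()) - min(lo for lo, _ in ranges.values())
    # A zero span means every class collapses to the same point:
    # treat the feature as fully overlapping (non-discriminative).
    return total / span if span > 0 else 1.0


def rank_features(X, labels):
    """X is a list of samples (rows). Returns (indices best-first, weights)."""
    columns = list(zip(*X))
    scores = [overlap_score(effective_ranges(col, labels)) for col in columns]
    top = max(scores)
    # Less overlap between class ranges gives a higher weight.
    weights = [1.0 - (s / top if top > 0 else 0.0) for s in scores]
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    return order, weights


if __name__ == "__main__":
    # Toy data: feature 0 separates the two classes, feature 1 does not.
    X = [[1.0, 5.0], [1.2, 6.0], [0.8, 4.0],
         [5.0, 5.5], [5.2, 4.5], [4.8, 6.5]]
    y = [0, 0, 0, 1, 1, 1]
    order, weights = rank_features(X, y)
    print(order, weights)
```

In a pipeline like the one the abstract describes, the top-ranked features would then be passed to a classifier such as NBC or SVM. The ranking needs only one pass over each feature, which is consistent with the abstract's contrast against search-based subset selection.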
Pages: 529 - 535
Number of pages: 7
Related papers
50 in total
  • [31] A Discriminative Feature Extraction Approach for Tumor Classification Using Gene Expression Data
    Mei, Qinglin
    Zhang, Huaxiang
    Liang, Cheng
    CURRENT BIOINFORMATICS, 2016, 11 (05) : 561 - 570
  • [32] A Statistical Approach to Set Classification by Feature Selection with Applications to Classification of Histopathology Images
    Jung, Sungkyu
    Qiao, Xingye
    BIOMETRICS, 2014, 70 (03) : 536 - 545
  • [33] Feature Selection for Cancer Classification on Microarray Expression Data
    Hsu, Hui-Huang
    Lu, Ming-Da
    ISDA 2008: EIGHTH INTERNATIONAL CONFERENCE ON INTELLIGENT SYSTEMS DESIGN AND APPLICATIONS, VOL 3, PROCEEDINGS, 2008, : 153 - 158
  • [34] A model for gene selection and classification of gene expression data
    Mohamad M.S.
    Omatu S.
    Deris S.
    Hashim S.Z.M.
    Artificial Life and Robotics, 2007, 11 (2) : 219 - 222
  • [35] A Wrapper Feature Selection Approach to Classification with Missing Data
    Cao Truong Tran
    Zhang, Mengjie
    Andreae, Peter
    Xue, Bing
    APPLICATIONS OF EVOLUTIONARY COMPUTATION, EVOAPPLICATIONS 2016, PT I, 2016, 9597 : 685 - 700
  • [36] An online approach for feature selection for classification in big data
    Nazar, Nasrin Banu
    Senthilkumar, Radha
    TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES, 2017, 25 (01) : 163 - 171
  • [37] A novel feature selection approach for biomedical data classification
    Peng, Yonghong
    Wu, Zhiqing
    Jiang, Jianmin
    JOURNAL OF BIOMEDICAL INFORMATICS, 2010, 43 (01) : 15 - 23
  • [38] Feature (gene) selection in gene expression-based tumor classification
    Xiong, MM
    Li, WJ
    Zhao, JY
    Jin, L
    Boerwinkle, E
    MOLECULAR GENETICS AND METABOLISM, 2001, 73 (03) : 239 - 247
  • [39] Feature selection and gene clustering from gene expression data
    Mitra, P
    Majumder, DD
    PROCEEDINGS OF THE 17TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOL 2, 2004, : 343 - 346
  • [40] Statistical Class Prediction Method for Efficient Microarray Gene Expression Data Sample Classification
    Sheela, T.
    Rangarajan, Lalitha
    2017 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATIONS AND INFORMATICS (ICACCI), 2017, : 73 - 78