IDENTIFICATION OF SIGNIFICANT FEATURES USING RANDOM FOREST FOR HIGH DIMENSIONAL MICROARRAY DATA

Authors
Nagpal, Arpita [1]
Singh, Vijendra [1]
Affiliations
[1] NorthCap Univ, Sch Engn & Technol, Dept Comp Sci & Engn, Gurugram, Haryana, India
Source
JOURNAL OF ENGINEERING SCIENCE AND TECHNOLOGY | 2018 / Vol. 13 / No. 08
Keywords
Classification; Feature selection; High dimensional data; Microarray data; Random forest;
DOI
Not available
Chinese Library Classification number
T [Industrial Technology];
Subject classification code
08;
Abstract
Feature subset selection for microarray data aims to reduce the number of genes so that useful information can be extracted from the samples. At the same time, selecting the relevant genes (features) from high dimensional data can improve the classification accuracy of the learning algorithm. This paper proposes a feature selection algorithm suited to high dimensional, small sample size microarray data. Feature selection is performed in two phases. In the first phase, Random Forest is used to identify the importance of each feature, so that features with high relevance are given priority over less relevant ones. In the second phase, feature clustering is performed around the relevant features to yield the reduced feature set. A statistical method is used to create the clusters, which help identify the genes that specifically represent the disease. The proposed algorithm is compared with three state-of-the-art feature selection algorithms, namely the Fast Correlation-Based Filter (FCBF), the Fast Clustering-Based Feature Selection Algorithm (FAST), and Random Forest (RF), on nine real-world cancer microarray datasets. The algorithms are evaluated empirically with three well-known classifiers: the probability-based Naive Bayes, the tree-based C4.5, and the instance-based IB1. The reported results show that the proposed algorithm can help find a smaller set of features for cancer microarray datasets with better classification accuracy.
Pages: 2446-2463
Page count: 18
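
The two-phase procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not name the statistical method used for clustering, so absolute Pearson correlation with a fixed threshold is assumed here purely for illustration. The importance scores are assumed to come from an already trained random forest and are hard-coded, and the names `pearson`, `cluster_around_relevant`, and `threshold` are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def cluster_around_relevant(columns, importances, threshold=0.8):
    """Greedy clustering around features ranked by importance.

    columns     -- one list of sample values per feature (gene)
    importances -- one relevance score per feature (assumed: random forest output)
    threshold   -- assumed cut-off on |Pearson r| for joining a cluster

    Returns a dict mapping each representative feature index to the
    indices of all features in its cluster (representative first).
    """
    # Phase 1 (assumed done elsewhere): rank features by importance, highest first.
    order = sorted(range(len(columns)), key=lambda j: importances[j], reverse=True)

    # Phase 2: each unassigned feature, taken in rank order, seeds a cluster
    # and absorbs the remaining features that are strongly correlated with it.
    clusters = {}
    assigned = set()
    for j in order:
        if j in assigned:
            continue
        clusters[j] = [j]
        assigned.add(j)
        for k in order:
            if k not in assigned and abs(pearson(columns[j], columns[k])) >= threshold:
                clusters[j].append(k)
                assigned.add(k)
    return clusters


# Toy data: 4 genes measured on 6 samples; gene 1 is nearly a multiple of gene 0.
genes = [
    [1, 2, 3, 4, 5, 6],
    [2, 4, 6, 8, 10, 12.5],
    [5, 1, 4, 2, 6, 3],
    [4, 1, 6, 3, 2, 5],
]
scores = [0.5, 0.2, 0.25, 0.05]  # made-up importance scores

clusters = cluster_around_relevant(genes, scores)
selected = list(clusters)  # reduced feature set: one representative per cluster
print(selected)  # [0, 2, 3]
```

In this toy run, gene 1 is absorbed into the cluster of the higher-ranked gene 0, so only genes 0, 2, and 3 remain as the reduced feature set. Ranking before clustering means that the most relevant gene in each correlated group is the one kept as its representative.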