IDENTIFICATION OF SIGNIFICANT FEATURES USING RANDOM FOREST FOR HIGH DIMENSIONAL MICROARRAY DATA

Times cited: 0
Authors
Nagpal, Arpita [1]
Singh, Vijendra [1]
Affiliation
[1] NorthCap Univ, Sch Engn & Technol, Dept Comp Sci & Engn, Gurugram, Haryana, India
Source
JOURNAL OF ENGINEERING SCIENCE AND TECHNOLOGY | 2018 / Vol. 13 / No. 08
Keywords
Classification; Feature selection; High dimensional data; Microarray data; Random forest;
DOI
Not available
Chinese Library Classification (CLC) number
T [Industrial Technology]
Subject classification code
08
Abstract
Feature subset selection for microarray data aims to reduce the number of genes so that useful information can be extracted from the samples. At the same time, selecting the relevant genes (features) from high-dimensional data can improve the classification accuracy of the learning algorithm. This paper proposes a feature selection algorithm suited to high-dimensional, small-sample-size microarray data. Feature selection is performed in two phases. In the first phase, Random Forest is used to estimate the importance of each feature, so that highly relevant features are given priority over less relevant ones. In the second phase, features are clustered around the relevant features to yield the reduced feature set. A statistical method is used to form the clusters, which helps identify the genes that specifically represent the disease. The proposed algorithm has been compared with three state-of-the-art feature selection algorithms, namely the Fast Correlation-Based Filter (FCBF), the Fast Clustering-Based Feature Selection algorithm (FAST), and Random Forest (RF), on nine real-world cancer microarray datasets. The algorithms have been evaluated empirically with three well-known classifiers: the probability-based Naive Bayes, the tree-based C4.5, and the instance-based IB1. The results show that the proposed algorithm can help find smaller feature subsets for cancer microarray datasets with better classification accuracy.
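The two-phase idea in the abstract can be illustrated with a small sketch. It is not the authors' algorithm: the abstract does not name the statistical method used to build the clusters, so this sketch assumes absolute Pearson correlation with a fixed threshold. Phase 1 is not reimplemented here; the sketch assumes per-gene importance scores have already been produced by some Random Forest implementation (for example, the `feature_importances_` attribute of scikit-learn's `RandomForestClassifier`). The names `cluster_around_relevant`, `min_importance` and `redundancy_threshold` are hypothetical.

```python
import math
from typing import Dict, List, Mapping, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0.0 or sd_y == 0.0:
        return 0.0
    return cov / (sd_x * sd_y)


def cluster_around_relevant(
    columns: Mapping[str, Sequence[float]],
    importances: Mapping[str, float],
    min_importance: float,
    redundancy_threshold: float,
) -> Dict[str, List[str]]:
    """Hypothetical phase-2 sketch: group genes around the most relevant ones.

    columns      -- gene name -> expression values, one per sample
    importances  -- gene name -> Random Forest importance (assumed phase-1 output)
    Returns an insertion-ordered dict: representative gene -> genes in its cluster.
    Assumption: |Pearson r| stands in for the paper's unspecified statistic.
    """
    # Keep only genes whose RF importance exceeds the cutoff, most relevant first.
    ranked = sorted(
        (g for g in importances if importances[g] > min_importance),
        key=lambda g: importances[g],
        reverse=True,
    )
    clusters: Dict[str, List[str]] = {}
    for gene in ranked:
        # Find the existing representative this gene is most strongly correlated with.
        best_rep, best_r = None, 0.0
        for rep in clusters:
            r = abs(pearson(columns[gene], columns[rep]))
            if r > best_r:
                best_rep, best_r = rep, r
        if best_rep is not None and best_r >= redundancy_threshold:
            # Redundant with a more important gene: join that gene's cluster.
            clusters[best_rep].append(gene)
        else:
            # Not redundant with any representative: start a new cluster.
            clusters[gene] = [gene]
    return clusters


if __name__ == "__main__":
    # Toy data (4 samples); the importances stand in for a Random Forest's output.
    columns = {
        "g1": [1.0, 2.0, 3.0, 4.0],
        "g2": [2.0, 4.0, 6.0, 8.1],  # almost a scaled copy of g1
        "g3": [4.0, 1.0, 3.0, 2.0],
        "g4": [1.0, 1.0, 1.0, 1.0],  # constant, hence uninformative
    }
    importances = {"g1": 0.5, "g2": 0.3, "g3": 0.15, "g4": 0.0}
    clusters = cluster_around_relevant(columns, importances, 0.01, 0.9)
    print("clusters:", clusters)
    print("selected genes:", list(clusters))
```

On the toy data, `g4` is dropped by the importance cutoff, `g2` joins the cluster of the more important `g1` (|r| is about 0.9999), and `g3` (|r| = 0.4 with `g1`) starts its own cluster, so the reduced set is `['g1', 'g3']`. Keeping one representative per cluster is what shrinks the feature set; the threshold trades subset size against how much correlated information is discarded.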
Pages: 2446-2463
Number of pages: 18