IDENTIFICATION OF SIGNIFICANT FEATURES USING RANDOM FOREST FOR HIGH DIMENSIONAL MICROARRAY DATA

Cited by: 0

Authors:

Nagpal, Arpita [1]

Singh, Vijendra [1]

Affiliations:

[1] NorthCap Univ, Sch Engn & Technol, Dept Comp Sci & Engn, Gurugram, Haryana, India

Source:

JOURNAL OF ENGINEERING SCIENCE AND TECHNOLOGY | 2018 / Vol. 13 / No. 08

Keywords:

Classification; Feature selection; High dimensional data; Microarray data; Random forest;

DOI:

Not available

Chinese Library Classification (CLC) number:

T [Industrial Technology];

Discipline classification code:

08 ;

Abstract:

Feature subset selection for microarray data aims at reducing the number of genes so that useful information can be extracted from the samples. At the same time, selecting the relevant genes (features) from high-dimensional data can improve the classification accuracy of the learning algorithm. This paper proposes a feature selection algorithm suited to high-dimensional, small-sample-size microarray data. Feature selection is performed in two phases. In the first phase, Random Forest is used to identify the importance of each feature, so that highly relevant features are given priority over less relevant ones. In the second phase, feature clustering is performed around the relevant features to yield the reduced feature set. A statistical method is used to create the clusters, which helps identify the genes that specifically represent the disease. The proposed algorithm is compared with three state-of-the-art feature selection algorithms, viz. Fast Correlation-Based Filter (FCBF), a Fast Clustering-Based Feature Selection Algorithm (FAST), and Random Forest (RF), on nine real-world cancer microarray datasets. The algorithms are evaluated empirically with three well-known classifiers, viz. the probability-based Naive Bayes, the tree-based C4.5, and the instance-based IB1. The reported results show that the proposed algorithm can help find a smaller set of features for cancer microarray datasets with better classification accuracy.
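The two-phase idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the statistical method used for clustering, so the Pearson-correlation rule below is an assumption, and a forest of bootstrapped depth-1 trees (stumps) stands in for a full Random Forest to keep the example dependency-free. The function names (`forest_importance`, `select_features`), the thresholds, and the synthetic data are all hypothetical.

```python
import math
import random


def gini(labels):
    """Gini impurity of a list of binary (0/1) class labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split_gain(values, labels):
    """Largest impurity decrease achievable by one threshold on one feature."""
    pairs = sorted(zip(values, labels))
    n, parent, best = len(pairs), gini(labels), 0.0
    for i in range(1, n):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold fits between equal values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - child)
    return best


def forest_importance(X, y, n_trees=200, seed=0):
    """Phase 1 (simplified): normalised impurity-decrease importance per feature."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    m = max(1, math.isqrt(d))  # number of candidate features per stump
    importance = [0.0] * d
    for _ in range(n_trees):
        rows = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        yb = [y[r] for r in rows]
        gains = {f: best_split_gain([X[r][f] for r in rows], yb)
                 for f in rng.sample(range(d), m)}
        winner = max(gains, key=gains.get)  # the stump splits on the best candidate
        importance[winner] += gains[winner]
    total = sum(importance) or 1.0
    return [v / total for v in importance]


def pearson(a, b):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (z - mb) for x, z in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((z - mb) ** 2 for z in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def select_features(X, importance, min_ratio=0.5, max_corr=0.8):
    """Phase 2 (assumed rule): visit features by decreasing importance and
    skip any feature strongly correlated with one that is already kept."""
    cols = list(zip(*X))
    ranked = sorted(range(len(cols)), key=lambda f: -importance[f])
    cutoff = min_ratio * importance[ranked[0]]
    kept = []
    for f in ranked:
        if importance[f] < cutoff:
            break  # remaining features are all less important
        if all(abs(pearson(cols[f], cols[g])) < max_corr for g in kept):
            kept.append(f)
    return kept


# Synthetic stand-in for a microarray matrix: 40 samples x 8 "genes".
rng = random.Random(1)
y = [i % 2 for i in range(40)]
X = []
for label in y:
    g0 = label + rng.gauss(0, 0.1)   # informative gene
    g1 = g0 + rng.gauss(0, 0.01)     # near-duplicate of gene 0
    X.append([g0, g1] + [rng.gauss(0, 1) for _ in range(6)])  # 6 noise genes

importance = forest_importance(X, y)
selected = select_features(X, importance)
print(selected)
```

In this toy setting the two informative genes carry nearly identical signal, so the redundancy check keeps only one of them, while the noise genes fall below the importance cutoff. With real data, a full Random Forest implementation and the clustering criterion from the paper itself would replace both simplified phases.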

Pages: 2446-2463

Number of pages: 18

