A Clustering Approach for Feature Selection in Microarray Data Classification Using Random Forest

Cited by: 31
Authors
Aydadenta, Husna [1]
Adiwijaya [1]
Affiliations
[1] Telkom Univ, Sch Comp, Bandung, Indonesia
Source
Journal of Information Processing Systems | 2018 / Vol. 14 / No. 5
Keywords
Classification; Clustering; Dimensional Reduction; Microarray; Random Forest
DOI
10.3745/JIPS.04.0087
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Microarray data play an essential role in diagnosing and detecting cancer. Microarray analysis measures gene expression levels in specific cell samples, allowing thousands of genes to be analyzed simultaneously. However, microarray data contain very few samples relative to their high dimensionality, so a dimensional reduction step is required before classification. Dimensional reduction eliminates redundant data, so that the features used in classification are only those highly correlated with the class. There are two types of dimensional reduction: feature selection and feature extraction. In this paper, we use the k-means algorithm as a clustering approach for feature selection. The approach groups features with similar characteristics into the same cluster, which removes redundancy in the microarray data. The features in each cluster are then ranked using the Relief algorithm, and the best-scoring feature of each cluster is obtained. The best features from all clusters are selected and used in the classification process, which is performed with the Random Forest algorithm. In simulation, the proposed approach achieved accuracies of 85.87%, 98.9%, and 89% on the Colon, Lung Cancer, and Prostate Tumor datasets, respectively. These accuracies are higher than those obtained using Random Forest without clustering.
Pages: 1167-1175
Page count: 9
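
The pipeline described in the abstract (k-means clustering of features, Relief ranking within each cluster, then Random Forest on the selected features) might be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the helper names `relief_scores` and `select_features` are hypothetical, the Relief variant is a basic two-class version that visits every sample once, the data are synthetic stand-ins generated with scikit-learn rather than the Colon, Lung Cancer, or Prostate Tumor datasets, and the cluster count of 20 is an arbitrary assumption.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def relief_scores(X, y):
    """Basic two-class Relief (assumed variant): a feature gains weight when it
    differs on the nearest miss and loses weight when it differs on the nearest hit."""
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0                      # avoid division by zero
    Xs = (X - X.min(axis=0)) / span            # scale each feature to [0, 1]
    n_samples, n_features = Xs.shape
    weights = np.zeros(n_features)
    for i in range(n_samples):
        dist = np.abs(Xs - Xs[i]).sum(axis=1)  # Manhattan distance to all samples
        dist[i] = np.inf                       # exclude the sample itself
        same = y == y[i]
        hit = np.argmin(np.where(same, dist, np.inf))    # nearest same-class sample
        miss = np.argmin(np.where(~same, dist, np.inf))  # nearest other-class sample
        weights += np.abs(Xs[i] - Xs[miss]) - np.abs(Xs[i] - Xs[hit])
    return weights / n_samples


def select_features(X, y, n_clusters, seed=0):
    """Cluster the features with k-means, then keep the top Relief feature per cluster."""
    # Transpose so that each feature (column) becomes one point to cluster.
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(X.T)
    scores = relief_scores(X, y)
    selected = []
    for c in range(n_clusters):
        members = np.flatnonzero(labels == c)
        if members.size:
            selected.append(members[np.argmax(scores[members])])
    return np.array(sorted(selected))


# Synthetic stand-in for microarray data: few samples, many features.
X, y = make_classification(n_samples=80, n_features=200, n_informative=10,
                           n_redundant=40, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Select features on the training split only, then classify with Random Forest.
selected = select_features(X_train, y_train, n_clusters=20)
forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X_train[:, selected], y_train)
accuracy = forest.score(X_test[:, selected], y_test)
print(f"kept {selected.size} of {X.shape[1]} features, test accuracy {accuracy:.2f}")
```

Feature selection here is fitted on the training split only, so the held-out accuracy is not influenced by the test labels; whether the paper evaluates this way is not stated in the abstract.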