K-Means Clustering with Infinite Feature Selection for Classification Tasks in Gene Expression Data

Cited by: 8
Authors
Remli, Muhammad Akmal [1]
Daud, Kauthar Mohd [1]
Nies, Hui Wen [1]
Mohamad, Mohd Saberi [1]
Deris, Safaai [2]
Omatu, Sigeru [3]
Kasim, Shahreen [4]
Sulong, Ghazali [5]
Affiliations
[1] Univ Teknol Malaysia, Fac Comp, Artificial Intelligence & Bioinformat Res Grp, Skudai 81310, Johor, Malaysia
[2] Univ Malaysia Kelantan, Fac Creat Technol & Heritage, Locked Bag 01, Kota Baharu 16300, Kelantan, Malaysia
[3] Osaka Inst Technol, Dept Elect Informat & Commun Engn, Osaka 5358585, Japan
[4] Univ Tun Hussein Onn Malaysia, Fac Comp Sci & Informat Technol, Batu Pahat 86400, Malaysia
[5] Univ Malaysia Terengganu, Sch Informat & Appl Math, Kuala Nerus 21030, Terengganu, Malaysia
Source
11TH INTERNATIONAL CONFERENCE ON PRACTICAL APPLICATIONS OF COMPUTATIONAL BIOLOGY & BIOINFORMATICS | 2017 / Vol. 616
Keywords
Gene expression data; K-means clustering; Infinite feature selection; Cancer classification; Small round blue cell tumors; Informative genes; Artificial intelligence; MICROARRAY DATA; DIAGNOSIS; SUPPORT; HYBRID;
DOI
10.1007/978-3-319-60816-7_7
Chinese Library Classification (CLC) number
Q [Biological Sciences];
Discipline classification codes
07 ; 0710 ; 09 ;
Abstract
In bioinformatics and clinical research, microarray technology has been widely used to distinguish normal from tumour samples in cancer datasets. However, the high dimensionality of gene expression data degrades classification accuracy. Feature selection is therefore needed to retain informative genes and remove non-informative ones. Some feature selection methods, however, ignore the interactions between genes. To take these interactions into account, similar genes are clustered together and dissimilar genes are placed in other groups. To achieve higher classification accuracy, this research proposes combining k-means clustering with infinite feature selection to identify informative genes in the selected subset. The approach was applied to colorectal cancer and small round blue cell tumors datasets, where it obtained higher classification accuracy than previous work.
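The abstract does not say how the clustering step and the ranking step are wired together, so the following is only a minimal sketch under stated assumptions: genes are grouped with a plain Lloyd's k-means, and within each cluster genes are ranked with a score in the style of the commonly described infinite feature selection (Inf-FS) formulation, where a graph over features is weighted by standard deviation and by 1 minus the absolute Spearman correlation. The function names, the `alpha` mixing weight, the `per_cluster` parameter, and the "top genes per cluster" rule are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per row of `points`."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Fancy indexing copies, so updating centers never mutates `points`.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    labels = np.zeros(len(points), dtype=int)
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center.
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            members = points[labels == c]
            if len(members):  # keep the old center if a cluster empties
                centers[c] = members.mean(axis=0)
    return labels


def inf_fs_scores(X, alpha=0.5):
    """Inf-FS-style score per column of X, shape (samples, features).

    Assumption: edge weight = alpha * max(std_i, std_j)
                            + (1 - alpha) * (1 - |Spearman(i, j)|).
    """
    X = np.asarray(X, dtype=float)
    n_features = X.shape[1]
    if n_features == 1:
        return np.ones(1)
    std = X.std(axis=0)
    if std.max() > 0:
        std = std / std.max()
    sigma = np.maximum(std[:, None], std[None, :])
    # Spearman correlation as Pearson correlation of ranks (ties not averaged).
    ranks = X.argsort(axis=0).argsort(axis=0).astype(float)
    corr = np.corrcoef(ranks, rowvar=False)
    A = alpha * sigma + (1 - alpha) * (1 - np.abs(corr))
    # Scale A so the geometric series over all path lengths converges.
    rho = np.max(np.abs(np.linalg.eigvalsh(A)))
    r = 0.9 / rho if rho > 0 else 0.0
    eye = np.eye(n_features)
    S = np.linalg.inv(eye - r * A) - eye  # sum of (rA)^l for l >= 1
    return S.sum(axis=1)


def select_genes(X, k=3, per_cluster=1, seed=0):
    """Cluster genes with k-means, then keep the top-scoring genes per cluster."""
    X = np.asarray(X, dtype=float)
    labels = kmeans(X.T, k, seed=seed)  # genes are the points being clustered
    selected = []
    for c in range(k):
        idx = np.flatnonzero(labels == c)
        if idx.size == 0:
            continue
        scores = inf_fs_scores(X[:, idx])
        top = idx[np.argsort(scores)[::-1][:per_cluster]]
        selected.extend(top.tolist())
    return sorted(selected)


# Synthetic stand-in for an expression matrix: 20 samples, 12 genes.
demo = np.random.default_rng(0).normal(size=(20, 12))
demo_genes = select_genes(demo, k=3, per_cluster=2)
```

The returned column indices would then feed whatever classifier is used downstream; the abstract does not name one, so none is shown here.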
Pages: 50-57
Page count: 8