A new clustering method of gene expression data based on multivariate Gaussian mixture models

Citations: 16
Authors
Liu, Zhe [1 ,2 ]
Song, Yu-qing [1 ]
Xie, Cong-hua [3 ]
Tang, Zheng [1 ]
Affiliations
[1] Jiangsu Univ, Sch Comp Sci & Telecommun, Room 522, Zhenjiang, Jiangsu, Peoples R China
[2] Jilin Normal Univ, Sch Comp Sci, Siping, Jilin Province, Peoples R China
[3] Changshu Inst Technol, Sch Comp Sci & Engn, Suzhou, Jiangsu, Peoples R China
Funding
Specialized Research Fund for the Doctoral Program of Higher Education; National Natural Science Foundation of China
Keywords
Gene expression data; Clustering; Multivariate Gaussian mixture models; Expectation maximization; QAIC criterion; K-MEANS; ALGORITHM;
DOI
10.1007/s11760-015-0749-5
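Chinese Library Classification
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Discipline classification codes
0808; 0809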
Abstract
Clustering gene expression data is an important problem in bioinformatics because understanding which genes behave similarly can lead to the discovery of important biological information. Many clustering methods have been applied to gene clustering. This paper proposes a new method for clustering gene expression data based on an improved expectation maximization (EM) method for multivariate Gaussian mixture models. To reduce the over-reliance of classical EM on initialization, we propose a remove-and-add initialization and apply a random perturbation to the solution before continuing the EM iterations. The number of clusters is estimated with the quasi-Akaike information criterion (QAIC). The improved EM method is extensively compared with several other clustering methods on simulated and real gene expression data sets. The results indicate that the improved EM clustering method is superior to the other clustering algorithms and can be widely used for gene clustering.
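To make the pipeline in the abstract concrete, the following is a minimal sketch of classical EM for a full-covariance Gaussian mixture with the number of clusters chosen by a QAIC-style score. It is not the authors' implementation: the remove-and-add initialization and the random perturbation step are not specified in the abstract and are not reproduced here (plain random initialization is used instead). The function names, the regularization term `reg`, and the overdispersion factor `c_hat` are assumptions, and the score uses the commonly cited form QAIC = -2 ln L / c_hat + 2K, which may differ from the paper's exact definition.

```python
import numpy as np

def n_params(k, d):
    """Free parameters of a k-component, d-dimensional full-covariance GMM."""
    return (k - 1) + k * d + k * d * (d + 1) // 2

def log_gauss(X, mean, cov):
    """Log density of a multivariate Gaussian evaluated at each row of X."""
    d = X.shape[1]
    L = np.linalg.cholesky(cov)
    z = np.linalg.solve(L, (X - mean).T)  # whitened residuals, shape (d, n)
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (d * np.log(2 * np.pi) + log_det + np.sum(z * z, axis=0))

def log_joint(X, weights, means, covs):
    """log(weight_j * N(x_i | mean_j, cov_j)) for all i, j; shape (n, k)."""
    return np.stack(
        [np.log(w) + log_gauss(X, m, c) for w, m, c in zip(weights, means, covs)],
        axis=1,
    )

def fit_gmm(X, k, n_iter=100, reg=1e-6, seed=0):
    """Classical EM with random initialization (assumption: not the paper's init)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    means = X[rng.choice(n, size=k, replace=False)]
    base_cov = np.atleast_2d(np.cov(X.T)) + reg * np.eye(d)
    covs = np.array([base_cov] * k)
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point.
        lp = log_joint(X, weights, means, covs)
        resp = np.exp(lp - np.logaddexp.reduce(lp, axis=1)[:, None])
        # M-step: re-estimate weights, means, and covariances.
        nk = resp.sum(axis=0) + 1e-12  # guard against empty components
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        for j in range(k):
            diff = X - means[j]
            covs[j] = (resp[:, j, None] * diff).T @ diff / nk[j] + reg * np.eye(d)
    lp = log_joint(X, weights, means, covs)
    loglik = float(np.logaddexp.reduce(lp, axis=1).sum())
    labels = lp.argmax(axis=1)
    return weights, means, covs, loglik, labels

def qaic(loglik, k, d, c_hat=1.0):
    """QAIC-style score; with c_hat = 1 it reduces to the ordinary AIC."""
    return -2.0 * loglik / c_hat + 2.0 * n_params(k, d)

def select_k(X, candidates, c_hat=1.0):
    """Fit one GMM per candidate k and return the k with the lowest score."""
    d = X.shape[1]
    scores = {k: qaic(fit_gmm(X, k)[3], k, d, c_hat) for k in candidates}
    return min(scores, key=scores.get), scores
```

Under these assumptions, `select_k(X, range(1, 6))` on an expression matrix `X` (genes as rows, conditions as columns) returns the candidate cluster count with the lowest score together with the per-candidate scores. Because plain EM is sensitive to its starting point, results from this sketch can vary with `seed`, which is the weakness the paper's initialization scheme is meant to address.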
Pages: 359-368
Page count: 10