A new clustering method of gene expression data based on multivariate Gaussian mixture models

Cited by: 0
Authors
Zhe Liu
Yu-qing Song
Cong-hua Xie
Zheng Tang
Affiliations
[1] Jiangsu University, School of Computer Science and Telecommunication
[2] Jilin Normal University, School of Computer Science
[3] Changshu Institute of Technology, School of Computer Science and Engineering
Source
Signal, Image and Video Processing | 2016 / Vol. 10
Keywords
Gene expression data; Clustering; Multivariate Gaussian mixture models; Expectation maximization; QAIC criterion;
DOI
Not available
Abstract
Clustering gene expression data is an important problem in bioinformatics because understanding which genes behave similarly can lead to the discovery of important biological information. Many clustering methods have been applied to gene clustering. This paper proposes a new method for clustering gene expression data based on an improved expectation maximization (EM) method for multivariate Gaussian mixture models. To reduce the over-reliance of classical EM on initialization, we propose a remove-and-add initialization and apply a random perturbation to the solution before continuing the EM iterations. The number of clusters is estimated with the quasi-Akaike information criterion (QAIC). The improved EM method is compared with several other clustering methods on simulated and real gene expression data sets. Our results indicate that the improved EM clustering method is superior to the other clustering algorithms tested and can be widely used for gene clustering.
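The general pipeline the abstract describes (fit a multivariate Gaussian mixture by EM, then pick the number of clusters with an information criterion) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: it uses plain random initialization and does not reproduce the remove-and-add initialization or the perturbation step, whose details are not given here. The function names (`fit_gmm`, `qaic`, `select_k`), the regularization term `reg`, and the overdispersion factor `c_hat` are all hypothetical; the criterion is written in its common form, -2 log L / c_hat + 2K, which reduces to AIC when `c_hat` is 1.

```python
import numpy as np

def log_joint(X, weights, means, covs):
    """Return an (n, k) array of log(weight_j) + log N(x_i | mean_j, cov_j)."""
    n, d = X.shape
    cols = []
    for w, mean, cov in zip(weights, means, covs):
        L = np.linalg.cholesky(cov)                 # cov = L @ L.T
        z = np.linalg.solve(L, (X - mean).T)        # whitened residuals, shape (d, n)
        log_det = 2.0 * np.sum(np.log(np.diag(L)))
        log_pdf = -0.5 * (d * np.log(2 * np.pi) + log_det + np.sum(z * z, axis=0))
        cols.append(np.log(w) + log_pdf)
    return np.stack(cols, axis=1)

def fit_gmm(X, k, n_iter=100, reg=1e-6, seed=0):
    """Classical EM with random initialization (assumption: not the paper's init)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    means = X[rng.choice(n, size=k, replace=False)].astype(float)
    base_cov = np.atleast_2d(np.cov(X, rowvar=False)) + reg * np.eye(d)
    covs = np.array([base_cov.copy() for _ in range(k)])
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point
        lj = log_joint(X, weights, means, covs)
        resp = np.exp(lj - np.logaddexp.reduce(lj, axis=1)[:, None])
        # M-step: re-estimate weights, means, and covariances
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / nk.sum()
        means = (resp.T @ X) / nk[:, None]
        for j in range(k):
            diff = X - means[j]
            covs[j] = (resp[:, j, None] * diff).T @ diff / nk[j] + reg * np.eye(d)
    lj = log_joint(X, weights, means, covs)
    loglik = float(np.logaddexp.reduce(lj, axis=1).sum())
    return weights, means, covs, lj.argmax(axis=1), loglik

def n_free_params(k, d):
    """Mixing weights (k - 1) + means (k * d) + symmetric covariances."""
    return (k - 1) + k * d + k * d * (d + 1) // 2

def qaic(loglik, n_params, c_hat=1.0):
    """Quasi-AIC in its common form; c_hat is an assumed overdispersion factor."""
    return -2.0 * loglik / c_hat + 2.0 * n_params

def select_k(X, k_values, c_hat=1.0):
    """Fit one mixture per candidate k and return the k with the lowest QAIC."""
    d = X.shape[1]
    scores = {k: qaic(fit_gmm(X, k)[4], n_free_params(k, d), c_hat) for k in k_values}
    return min(scores, key=scores.get), scores
```

For an expression matrix `X` with one row per gene and one column per condition, `select_k(X, range(1, 8))` would return the selected number of clusters together with the score of each candidate. Because EM only finds local optima, a practical run would repeat `fit_gmm` with several seeds, which is the sensitivity to initialization that the abstract sets out to address.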
Pages: 359 / 368
Page count: 9