A new clustering method of gene expression data based on multivariate Gaussian mixture models

Times cited: 16
Authors
Liu, Zhe [1 ,2 ]
Song, Yu-qing [1 ]
Xie, Cong-hua [3 ]
Tang, Zheng [1 ]
Affiliations
[1] Jiangsu Univ, Sch Comp Sci & Telecommun, Room 522, Zhenjiang, Jiangsu, Peoples R China
[2] Jilin Normal Univ, Sch Comp Sci, Siping, Jilin Province, Peoples R China
[3] Changshu Inst Technol, Sch Comp Sci & Engn, Suzhou, Jiangsu, Peoples R China
Funding
Specialized Research Fund for the Doctoral Program of Higher Education; National Natural Science Foundation of China
Keywords
Gene expression data; Clustering; Multivariate Gaussian mixture models; Expectation maximization; QAIC criterion; K-MEANS; ALGORITHM;
DOI
10.1007/s11760-015-0749-5
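Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract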
Clustering gene expression data is an important problem in bioinformatics because understanding which genes behave similarly can lead to the discovery of important biological information. Many clustering methods have been applied to gene clustering. This paper proposes a new method for clustering gene expression data based on an improved expectation maximization (EM) method for multivariate Gaussian mixture models. To reduce the over-reliance of classical EM on initialization, we propose a remove-and-add initialization and apply a random perturbation to the solution before continuing the EM iterations. The number of clusters is estimated with the quasi-Akaike information criterion (QAIC). The improved EM method is compared with several other clustering methods on simulated and real gene expression data sets. Our results indicate that the improved EM clustering method is superior to the other clustering algorithms and can be widely used for gene clustering.
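As a rough illustration of the two generic ingredients named in the abstract, the sketch below runs classical EM on a Gaussian mixture and picks the number of clusters by a QAIC score. It is not the paper's algorithm: the remove-and-add initialization and the random perturbation step are not reproduced, the covariances are assumed diagonal to keep the code short, and QAIC is taken in its common form -2 ln L / c_hat + 2K, where the overdispersion factor `c_hat` is a user-supplied assumption (with `c_hat = 1` the score reduces to AIC). All function names are hypothetical.

```python
import math
import random

def log_gauss_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance (simplifying assumption)."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def e_step(data, weights, means, variances):
    """Return per-point responsibilities and the total log-likelihood."""
    resp, loglik = [], 0.0
    for x in data:
        logs = [math.log(w) + log_gauss_diag(x, m, v)
                for w, m, v in zip(weights, means, variances)]
        top = max(logs)  # log-sum-exp trick for numerical stability
        lse = top + math.log(sum(math.exp(l - top) for l in logs))
        loglik += lse
        resp.append([math.exp(l - lse) for l in logs])
    return resp, loglik

def fit_gmm(data, k, iters=50, seed=0, min_var=1e-6):
    """Classical EM with a plain random initialization (not the paper's scheme)."""
    rng = random.Random(seed)
    n, d = len(data), len(data[0])
    means = [list(p) for p in rng.sample(data, k)]  # k distinct data points
    centre = [sum(x[j] for x in data) / n for j in range(d)]
    spread = [max(sum((x[j] - centre[j]) ** 2 for x in data) / n, min_var)
              for j in range(d)]
    variances = [spread[:] for _ in range(k)]  # start from the global variance
    weights = [1.0 / k] * k
    for _ in range(iters):
        resp = e_step(data, weights, means, variances)[0]
        for c in range(k):  # M-step: weighted re-estimation per component
            nc = sum(r[c] for r in resp) + 1e-12
            weights[c] = nc / n
            means[c] = [sum(r[c] * x[j] for r, x in zip(resp, data)) / nc
                        for j in range(d)]
            variances[c] = [max(sum(r[c] * (x[j] - means[c][j]) ** 2
                                    for r, x in zip(resp, data)) / nc, min_var)
                            for j in range(d)]
    loglik = e_step(data, weights, means, variances)[1]
    return weights, means, variances, loglik

def qaic(loglik, k, d, c_hat=1.0):
    """QAIC = -2 ln L / c_hat + 2K; K counts free parameters of a diagonal mixture."""
    n_params = (k - 1) + 2 * k * d  # mixing weights + means + variances
    return -2.0 * loglik / c_hat + 2.0 * n_params

def choose_k(data, k_max, c_hat=1.0):
    """Fit mixtures with 1..k_max components and return the k with the lowest QAIC."""
    d = len(data[0])
    scores = {k: qaic(fit_gmm(data, k)[3], k, d, c_hat)
              for k in range(1, k_max + 1)}
    return min(scores, key=scores.get), scores
```

Here `data` is a list of equal-length numeric lists, one per gene. Because plain EM depends on its starting point, a fit from a single seed may land in a poor local optimum, which is the weakness the abstract's improved initialization is meant to address.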
Pages: 359-368
Page count: 10
Related papers
34 in total
[1]   NEW LOOK AT STATISTICAL-MODEL IDENTIFICATION [J].
AKAIKE, H .
IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 1974, AC19 (06) :716-723
[2]   High-dimensional data clustering [J].
Bouveyron, C. ;
Girard, S. ;
Schmid, C. .
COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2007, 52 (01) :502-519
[3]   GAUSSIAN PARSIMONIOUS CLUSTERING MODELS [J].
CELEUX, G ;
GOVAERT, G .
PATTERN RECOGNITION, 1995, 28 (05) :781-793
[4]   An efficient greedy K-means algorithm for global gene trajectory clustering [J].
Chan, ZSH ;
Collins, L ;
Kasabov, N .
EXPERT SYSTEMS WITH APPLICATIONS, 2006, 30 (01) :137-141
[5]   Fuzzy C-means method for clustering microarray data [J].
Dembélé, D ;
Kastner, P .
BIOINFORMATICS, 2003, 19 (08) :973-980
[6]   Cluster analysis and display of genome-wide expression patterns [J].
Eisen, MB ;
Spellman, PT ;
Brown, PO ;
Botstein, D .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 1998, 95 (25) :14863-14868
[7]   Model-based clustering, discriminant analysis, and density estimation [J].
Fraley, C ;
Raftery, AE .
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2002, 97 (458) :611-631
[8]   Application of Multi-SOM clustering approach to macrophage gene expression analysis [J].
Ghouila, Amel ;
Ben Yahia, Sadok ;
Malouche, Dhafer ;
Jmel, Haifa ;
Laouini, Dhafer ;
Guerfali, Fatma Z. ;
Abdelhak, Sonia .
INFECTION GENETICS AND EVOLUTION, 2009, 9 (03) :328-336
[9]   The transcriptional program in the response of human fibroblasts to serum [J].
Iyer, VR ;
Eisen, MB ;
Ross, DT ;
Schuler, G ;
Moore, T ;
Lee, JCF ;
Trent, JM ;
Staudt, LM ;
Hudson, J ;
Boguski, MS ;
Lashkari, D ;
Shalon, D ;
Botstein, D ;
Brown, PO .
SCIENCE, 1999, 283 (5398) :83-87
[10]   Model-based clustering for multivariate functional data [J].
Jacques, Julien ;
Preda, Cristian .
COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2014, 71 :92-106