A new clustering method of gene expression data based on multivariate Gaussian mixture models

Cited by: 14
Authors
Liu, Zhe [1, 2]
Song, Yu-qing [1 ]
Xie, Cong-hua [3 ]
Tang, Zheng [1 ]
Affiliations
[1] Jiangsu Univ, Sch Comp Sci & Telecommun, Room 522, Zhenjiang, Jiangsu, Peoples R China
[2] Jilin Normal Univ, Sch Comp Sci, Siping, Jilin Province, Peoples R China
[3] Changshu Inst Technol, Sch Comp Sci & Engn, Suzhou, Jiangsu, Peoples R China
Funding
Specialized Research Fund for the Doctoral Program of Higher Education; National Natural Science Foundation of China;
Keywords
Gene expression data; Clustering; Multivariate Gaussian mixture models; Expectation maximization; QAIC criterion; K-MEANS; ALGORITHM;
DOI
10.1007/s11760-015-0749-5
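Chinese Library Classification number
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification code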
0808 ; 0809 ;
Abstract
Clustering gene expression data is an important problem in bioinformatics because understanding which genes behave similarly can lead to the discovery of important biological information. Many clustering methods have been applied to gene clustering. This paper proposes a new method for clustering gene expression data based on an improved expectation maximization (EM) method for multivariate Gaussian mixture models. To reduce the over-reliance of classical EM on its initialization, we propose a remove-and-add initialization and apply a random perturbation to the solution before continuing the EM iterations. The number of clusters is estimated with the quasi-Akaike information criterion (QAIC). The improved EM method is compared extensively with several other clustering methods on simulated and real gene expression data sets. The results indicate that the improved EM clustering method is superior to the other clustering algorithms and can be widely used for gene clustering.
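As a rough illustration of the ingredients named in the abstract, the following Python sketch fits Gaussian mixtures with classical EM and picks the number of clusters with a QAIC-style score. It is not the paper's algorithm: the remove-and-add initialization and the random perturbation step are not reproduced (random restarts stand in for them), covariances are restricted to diagonal form for brevity, and the QAIC formula used here, -2 ln L / c_hat + 2K with an overdispersion factor c_hat, is the commonly used quasi-likelihood form and only assumed to match the paper's criterion. All function names and the demo data are hypothetical.

```python
import math
import random


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at point x."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def e_step(data, weights, means, variances):
    """Return per-point responsibilities and the total log-likelihood."""
    resp, loglik = [], 0.0
    for x in data:
        logs = [math.log(w) + log_gauss_diag(x, m, v)
                for w, m, v in zip(weights, means, variances)]
        top = max(logs)
        norm = top + math.log(sum(math.exp(l - top) for l in logs))  # log-sum-exp
        loglik += norm
        resp.append([math.exp(l - norm) for l in logs])
    return resp, loglik


def em_gmm(data, k, n_iter=30, seed=0, min_var=1e-6):
    """Classical EM for a k-component mixture (diagonal covariances).

    Assumption: means start at k random data points; this is NOT the
    remove-and-add initialization described in the abstract.
    Returns (weights, means, variances, log_likelihood).
    """
    rng = random.Random(seed)
    n, d = len(data), len(data[0])
    means = [list(p) for p in rng.sample(data, k)]
    center = [sum(p[j] for p in data) / n for j in range(d)]
    spread = [max(sum((p[j] - center[j]) ** 2 for p in data) / n, min_var)
              for j in range(d)]
    variances = [list(spread) for _ in range(k)]  # start from the global variance
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        resp, _ll = e_step(data, weights, means, variances)
        for c in range(k):  # M-step: re-estimate each component
            nc = sum(r[c] for r in resp) + 1e-12  # guard against empty components
            weights[c] = nc / n
            means[c] = [sum(r[c] * x[j] for r, x in zip(resp, data)) / nc
                        for j in range(d)]
            variances[c] = [max(sum(r[c] * (x[j] - means[c][j]) ** 2
                                    for r, x in zip(resp, data)) / nc, min_var)
                            for j in range(d)]
    _resp, loglik = e_step(data, weights, means, variances)  # likelihood of final fit
    return weights, means, variances, loglik


def qaic(loglik, n_params, c_hat=1.0):
    """QAIC-style score (assumed form); c_hat = 1 reduces it to plain AIC."""
    return -2.0 * loglik / c_hat + 2.0 * n_params


def select_k(data, k_values=(1, 2, 3), restarts=3, c_hat=1.0):
    """Fit each candidate k with a few restarts and keep the lowest score."""
    d = len(data[0])
    scores, fits = {}, {}
    for k in k_values:
        best = max((em_gmm(data, k, seed=s) for s in range(restarts)),
                   key=lambda fit: fit[3])
        n_params = (k - 1) + 2 * k * d  # mixing weights + means + variances
        scores[k] = qaic(best[3], n_params, c_hat)
        fits[k] = best
    best_k = min(scores, key=scores.get)
    return best_k, scores, fits


def make_demo_data(n_per_cluster=100, seed=1):
    """Synthetic stand-in for expression profiles: two 2-D clusters."""
    rng = random.Random(seed)
    centers = [(0.0, 0.0), (10.0, 10.0)]
    return [[rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)]
            for cx, cy in centers for _ in range(n_per_cluster)]
```

Calling `select_k(make_demo_data())` returns the chosen number of components, the score for each candidate k, and the fitted parameters. For real expression data one would likely use full covariance matrices and an estimated `c_hat`, both of which are left out of this sketch.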
Pages: 359-368
Page count: 10
Journal: Signal, Image and Video Processing, 2016, 10: 359-368