A Pseudo-EM Algorithm for Clustering Incomplete Longitudinal Data

Cited by: 6
Authors
Shaikh, Mateen [1]
McNicholas, Paul D. [1]
Desmond, Anthony F. [1]
Affiliation
[1] Univ Guelph, Guelph, ON N1G 2W1, Canada
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
clustering; gene expression time course data; longitudinal data; missing data; mixture models; pseudo-EM; maximum likelihood; missing values; models
DOI
10.2202/1557-4679.1223
Chinese Library Classification
Q [Biological Sciences]
Subject classification codes
07; 0710; 09
Abstract
A method for clustering incomplete longitudinal data, and gene expression time course data in particular, is presented. Specifically, an existing method that utilizes mixtures of multivariate Gaussian distributions with modified Cholesky-decomposed covariance structure is extended to accommodate incomplete data. Parameter estimation is carried out in a fashion that is similar to an expectation-maximization algorithm. We focus on the particular application of clustering incomplete gene expression time course data. In this application, our approach gives good clustering performance relative to the results obtained when no data are missing. Possible extensions of this work are also suggested.
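The covariance structure named in the abstract can be illustrated with a small sketch. It assumes the usual form of the modified Cholesky decomposition, in which a covariance matrix Sigma is written as T Sigma T' = D with T unit lower triangular and D diagonal. The function name `modified_cholesky` and the example matrix are hypothetical, and this is not the authors' implementation or their pseudo-EM procedure.

```python
def modified_cholesky(sigma):
    """Return (t, d) such that T * Sigma * T' = diag(d), with T unit lower triangular.

    Assumes sigma is a symmetric positive-definite matrix given as a list of lists.
    Illustrative sketch only; names are hypothetical.
    """
    p = len(sigma)

    # Step 1: factor Sigma = L * diag(d) * L' with L unit lower triangular.
    low = [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]
    d = [0.0] * p
    for j in range(p):
        d[j] = sigma[j][j] - sum(low[j][k] ** 2 * d[k] for k in range(j))
        for i in range(j + 1, p):
            partial = sum(low[i][k] * low[j][k] * d[k] for k in range(j))
            low[i][j] = (sigma[i][j] - partial) / d[j]

    # Step 2: T is the inverse of L, obtained by forward substitution.
    t = [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]
    for j in range(p):
        for i in range(j + 1, p):
            t[i][j] = -sum(low[i][k] * t[k][j] for k in range(j, i))
    return t, d


# Example with a made-up 2 x 2 covariance matrix.
t, d = modified_cholesky([[4.0, 2.0], [2.0, 3.0]])
```

For longitudinal data, the below-diagonal entries of T are commonly read as negated autoregressive coefficients and the entries of D as innovation variances, which is one reason this decomposition suits time-ordered measurements.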
Pages: 17