A family of mixture models for biclustering

Cited by: 0
Authors
Tu, Wangshu [1]
Subedi, Sanjeena [1]
Affiliations
[1] Carleton Univ, Sch Math & Stat, 4302 Herzberg Labs,1125 Colonel By Dr, Ottawa, ON K1S 5B6, Canada
Keywords
AECM; biclustering; factor analysis; mixture models; model-based clustering; GENE-EXPRESSION DATA; FACTOR ANALYZERS; CLUSTER-ANALYSIS; CLASSIFICATION; LIKELIHOOD; ALGORITHM; DIMENSION;
DOI
10.1002/sam.11555
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Biclustering simultaneously clusters the observations and the variables when no group structure is known a priori. It is increasingly used in bioinformatics, text analytics, and other fields. Biclustering has previously been introduced in a model-based clustering framework by utilizing a structure similar to a mixture of factor analyzers. In such models, the observed variables X are modeled using a latent variable U that is assumed to follow N(0, I). Clustering of the variables is introduced by constraining the entries of the factor loading matrix to be 0 or 1, which results in block-diagonal covariance matrices. However, this approach is overly restrictive because the off-diagonal elements within the blocks of the covariance matrices can only be 1, which can lead to unsatisfactory model fit on complex data. Here, the latent variable U is instead assumed to follow N(0, T), where T is a diagonal matrix. This ensures that the off-diagonal terms in the block matrices within the covariance matrices are non-zero and not restricted to be 1, which leads to a superior model fit on complex data. A family of models is developed by imposing constraints on the components of the covariance matrix. For parameter estimation, an alternating expectation conditional maximization (AECM) algorithm is used. Finally, the proposed method is illustrated using simulated and real datasets.
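The covariance restriction described in the abstract can be illustrated with a small numerical sketch. It assumes the usual factor-analysis form X = mu + B U + e with e ~ N(0, D), so that the implied covariance is B T B' + D; this form, the helper name block_covariance, and all numbers below are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def block_covariance(B, T, D):
    """Covariance implied by X = mu + B U + e, with U ~ N(0, T) and e ~ N(0, D).

    Assumed factor-analysis form, used here only for illustration.
    """
    return B @ T @ B.T + D


# Hypothetical binary loading matrix: 5 variables split into 2 variable groups.
# Each row contains a single 1, so every variable belongs to exactly one group.
B = np.array(
    [
        [1, 0],
        [1, 0],
        [1, 0],
        [0, 1],
        [0, 1],
    ],
    dtype=float,
)

# Hypothetical diagonal error covariance.
D = np.diag([0.5, 0.4, 0.3, 0.2, 0.1])

# Earlier approach: U ~ N(0, I). Within-block off-diagonal entries are all 1.
sigma_identity = block_covariance(B, np.eye(2), D)

# Approach described in the abstract: U ~ N(0, T) with T diagonal.
# Within-block off-diagonal entries now equal the matching diagonal entry of T.
T = np.diag([2.5, 0.7])
sigma_diagonal = block_covariance(B, T, D)

print(sigma_identity)
print(sigma_diagonal)
```

In both cases the entries linking variables from different groups stay at zero, so the block-diagonal structure that defines the variable clusters is preserved; only the within-block covariances gain flexibility.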
Pages: 206-224
Number of pages: 19
Related papers
50 in total
[11]   A Variational Approximations-DIC Rubric for Parameter Estimation and Mixture Model Selection Within a Family Setting [J].
Subedi, Sanjeena ;
McNicholas, Paul D. .
JOURNAL OF CLASSIFICATION, 2021, 38 (01) :89-108
[12]   Variational algorithms for biclustering models [J].
Duy Vu ;
Aitkin, Murray .
COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2015, 89 :12-24
[13]   Parsimonious skew mixture models for model-based clustering and classification [J].
Vrbik, Irene ;
McNicholas, Paul D. .
COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2014, 71 :196-210
[14]   Biclustering via Mixtures of Regression Models [J].
Velu, Raja ;
Zhou, Zhaoque ;
Tee, Chyng Wen .
COMPUTATIONAL SCIENCE - ICCS 2019, PT II, 2019, 11537 :533-549
[15]   Biclustering models for structured microarray data [J].
Turner, HL ;
Bailey, TC ;
Krzanowski, WJ ;
Hemingway, CA .
IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2005, 2 (04) :316-329
[16]   Boolean Representation for Exact Biclustering [J].
Michalak, Marcin ;
Slezak, Dominik .
FUNDAMENTA INFORMATICAE, 2018, 161 (03) :275-297
[17]   Biclustering Analysis for Pattern Discovery: Current Techniques, Comparative Studies and Applications [J].
Zhao, Hongya ;
Liew, Alan Wee-Chung ;
Wang, Doris Z. ;
Yan, Hong .
CURRENT BIOINFORMATICS, 2012, 7 (01) :43-55
[18]   A biclustering approach for classification with mislabeled data [J].
de Franca, Fabricio O. ;
Coelho, Andre L. V. .
EXPERT SYSTEMS WITH APPLICATIONS, 2015, 42 (12) :5065-5075
[19]   Finite Mixture Models [J].
McLachlan, Geoffrey J. ;
Lee, Sharon X. ;
Rathnayake, Suren I. .
ANNUAL REVIEW OF STATISTICS AND ITS APPLICATION, VOL 6, 2019, 6 :355-378
[20]   Orthogonal Stiefel manifold optimization for eigen-decomposed covariance parameter estimation in mixture models [J].
Browne, Ryan P. ;
McNicholas, Paul D. .
STATISTICS AND COMPUTING, 2014, 24 (02) :203-210