A family of mixture models for biclustering

Cited by: 0
Authors
Tu, Wangshu [1]
Subedi, Sanjeena [1]
Affiliations
[1] Carleton Univ, Sch Math & Stat, 4302 Herzberg Labs,1125 Colonel By Dr, Ottawa, ON K1S 5B6, Canada
Source
STATISTICAL ANALYSIS AND DATA MINING-AN ASA DATA SCIENCE JOURNAL | 2022 / Vol. 15 / Issue 02
Keywords
AECM; biclustering; factor analysis; mixture models; model-based clustering; GENE-EXPRESSION DATA; FACTOR ANALYZERS; CLUSTER-ANALYSIS; CLASSIFICATION; LIKELIHOOD; ALGORITHM; DIMENSION
DOI
10.1002/sam.11555
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Biclustering is used for simultaneous clustering of the observations and variables when there is no group structure known a priori. It is increasingly used in fields such as bioinformatics and text analytics. Previously, biclustering has been introduced in a model-based clustering framework by utilizing a structure similar to a mixture of factor analyzers. In such models, the observed variables X are modeled using a latent variable U that is assumed to be from N(0, I). Clustering of the variables is introduced by constraining the entries of the factor loading matrix to be 0 or 1, which results in block-diagonal covariance matrices. However, this approach is overly restrictive, as off-diagonal elements in the blocks of the covariance matrices can only be 1, which can lead to unsatisfactory model fit on complex data. Here, the latent variable U is assumed to be from N(0, T), where T is a diagonal matrix. This ensures that the off-diagonal terms in the block matrices within the covariance matrices are non-zero and not restricted to be 1, which leads to a superior model fit on complex data. A family of models is developed by imposing constraints on the components of the covariance matrix. For parameter estimation, an alternating expectation conditional maximization (AECM) algorithm is used. Finally, the proposed method is illustrated using simulated and real datasets.
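The covariance argument in the abstract can be illustrated with a small sketch. It assumes the usual factor-analysis form Sigma = B T B^T + D, where B is a binary loading matrix with a single 1 per row, T is the diagonal covariance of the latent variable U, and D is a diagonal noise covariance. The function name `block_covariance` and all numeric values are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
def block_covariance(loadings, latent_var, noise_var):
    """Return Sigma = B T B^T + D as a list of lists.

    loadings   : p x q rows of 0/1 (assumed: each variable loads on one factor)
    latent_var : length-q diagonal of T (all ones corresponds to U ~ N(0, I))
    noise_var  : length-p diagonal of D
    """
    p = len(loadings)
    q = len(latent_var)
    sigma = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            # (B T B^T)[i][j] = sum_k B[i][k] * T[k] * B[j][k]
            sigma[i][j] = sum(
                loadings[i][k] * latent_var[k] * loadings[j][k] for k in range(q)
            )
            if i == j:
                sigma[i][j] += noise_var[i]
    return sigma


# Hypothetical setup: 4 variables split into 2 column clusters.
B = [[1, 0], [1, 0], [0, 1], [0, 1]]
D = [0.5, 0.5, 0.5, 0.5]

sigma_identity = block_covariance(B, [1.0, 1.0], D)   # U ~ N(0, I)
sigma_diagonal = block_covariance(B, [2.0, 0.25], D)  # U ~ N(0, T), T diagonal

print(sigma_identity[0][1], sigma_identity[2][3])  # within-block terms, both 1.0
print(sigma_diagonal[0][1], sigma_diagonal[2][3])  # within-block terms, 2.0 and 0.25
print(sigma_diagonal[0][2])                        # across blocks, 0.0
```

With T equal to the identity, every within-block off-diagonal entry is forced to 1. With a general diagonal T, each block takes its own value, while entries across blocks remain 0, so the block-diagonal structure is preserved.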
Pages: 206-224
Page count: 19