The penalized biclustering model and related algorithms

Cited by: 11

Authors:

Chekouo, Thierry [1]

Murua, Alejandro [1]

Affiliations:

[1] Univ Montreal, Dept Math & Stat, Montreal, PQ H3C 3J7, Canada

Source:

JOURNAL OF APPLIED STATISTICS | 2015 / Vol. 42 / No. 06

Funding:

Natural Sciences and Engineering Research Council of Canada (NSERC);

Keywords:

model selection; deviance information criterion; mixture; plaid model; gene expression; clustering; Y800; MICROARRAY DATA; GENE;

DOI:

10.1080/02664763.2014.999647

Chinese Library Classification (CLC) number:

O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];

Subject classification codes:

020208 ; 070103 ; 0714 ;

Abstract:

Biclustering is the simultaneous clustering of two related dimensions, for example, of individuals and features, or genes and experimental conditions. Very few statistical models for biclustering have been proposed in the literature. Instead, most of the research has focused on algorithms to find biclusters. The models underlying them have not received much attention. Hence, very little is known about the adequacy and limitations of the models and the efficiency of the algorithms. In this work, we shed light on associated statistical models behind the algorithms. This allows us to generalize most of the known popular biclustering techniques, and to justify, and many times improve on, the algorithms used to find the biclusters. It turns out that most of the known techniques have a hidden Bayesian flavor. Therefore, we adopt a Bayesian framework to model biclustering. We propose a measure of biclustering complexity (number of biclusters and overlapping) through a penalized plaid model, and present a suitable version of the deviance information criterion to choose the number of biclusters, a problem that has not been adequately addressed yet. Our ideas are motivated by the analysis of gene expression data.
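
For orientation, the unpenalized plaid model of Lazzeroni and Owen (2002) and the standard deviance information criterion of Spiegelhalter et al. (2002) can be written as below. This is a sketch in conventional textbook notation, not the paper's own: the symbols (\rho_{ik}, \kappa_{jk}, p_D, and so on) are assumptions introduced here, and neither the penalized variant nor the adapted criterion proposed by the authors is reproduced.

```latex
% Plaid model with K biclusters (Lazzeroni and Owen, 2002); notation assumed here.
% y_{ij}: expression of gene i under condition j
% \rho_{ik}, \kappa_{jk} \in \{0,1\}: membership of row i / column j in bicluster k
y_{ij} = \mu_0 + \sum_{k=1}^{K} (\mu_k + \alpha_{ik} + \beta_{jk})\,\rho_{ik}\,\kappa_{jk} + \varepsilon_{ij}

% Standard DIC, with deviance D(\theta) = -2 \log p(y \mid \theta)
% \overline{D(\theta)}: posterior mean deviance; \bar{\theta}: posterior mean of \theta
p_D = \overline{D(\theta)} - D(\bar{\theta}), \qquad
\mathrm{DIC} = \overline{D(\theta)} + p_D = D(\bar{\theta}) + 2\,p_D
```

In this form, overlapping biclusters correspond to cells (i, j) for which \rho_{ik}\kappa_{jk} = 1 for more than one k, and p_D acts as the effective number of parameters against which fit is traded off when comparing different values of K.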


Pages: 1255-1277

Page count: 23

References: 30 in total

[11] Using GOstats to test gene lists for GO term association
Falcon, S.
Gentleman, R.
[J]. BIOINFORMATICS, 2007, 23 (02) : 257 - 258
[12] DIRECT CLUSTERING OF A DATA MATRIX
HARTIGAN, JA
[J]. JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 1972, 67 (337) : 123 - &
[13] FABIA: factor analysis for bicluster acquisition
Hochreiter, Sepp
Bodenhofer, Ulrich
Heusel, Martin
Mayr, Andreas
Mitterecker, Andreas
Kasim, Adetayo
Khamiakova, Tatsiana
Van Sanden, Suzy
Lin, Dan
Talloen, Willem
Bijnens, Luc
Gohlmann, Hinrich W. H.
Shkedy, Ziv
Clevert, Djork-Arne
[J]. BIOINFORMATICS, 2010, 26 (12) : 1520 - 1527
[14] Statistical estimation of cluster boundaries in gene expressions profile data
Horimoto, K
Toh, H
[J]. BIOINFORMATICS, 2001, 17 (12) : 1143 - 1151
[15] HIERARCHICAL MIXTURES OF EXPERTS AND THE EM ALGORITHM
JORDAN, MI
JACOBS, RA
[J]. NEURAL COMPUTATION, 1994, 6 (02) : 181 - 214
[16] Bayesian Functional ANOVA Modeling Using Gaussian Process Prior Distributions
Kaufman, Cari G.
Sain, Stephan R.
[J]. BAYESIAN ANALYSIS, 2010, 5 (01) : 123 - 149
[17] Spectral biclustering of microarray data: Coclustering genes and conditions
Kluger, Y
Basri, R
Chang, JT
Gerstein, M
[J]. GENOME RESEARCH, 2003, 13 (04) : 703 - 716
[18] Lazzeroni L, 2002, STAT SINICA, V12, P61
[19] LINDLEY DV, 1972, J ROY STAT SOC B, V34, P1
[20] Biclustering algorithms for biological data analysis: A survey
Madeira, SC
Oliveira, AL
[J]. IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2004, 1 (01) : 24 - 45
