The penalized biclustering model and related algorithms

Cited by: 11
Authors
Chekouo, Thierry [1]
Murua, Alejandro [1]
Affiliations
[1] Univ Montreal, Dept Math & Stat, Montreal, PQ H3C 3J7, Canada
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
model selection; deviance information criterion; mixture; plaid model; gene expression; clustering; Y800; MICROARRAY DATA; GENE;
DOI
10.1080/02664763.2014.999647
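Chinese Library Classification number
O21 [Probability theory and mathematical statistics]; C8 [Statistics];
Subject classification code
020208 ; 070103 ; 0714 ;
Abstract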
Biclustering is the simultaneous clustering of two related dimensions, for example, of individuals and features, or genes and experimental conditions. Very few statistical models for biclustering have been proposed in the literature. Instead, most of the research has focused on algorithms to find biclusters. The models underlying them have not received much attention. Hence, very little is known about the adequacy and limitations of the models and the efficiency of the algorithms. In this work, we shed light on associated statistical models behind the algorithms. This allows us to generalize most of the known popular biclustering techniques, and to justify, and many times improve on, the algorithms used to find the biclusters. It turns out that most of the known techniques have a hidden Bayesian flavor. Therefore, we adopt a Bayesian framework to model biclustering. We propose a measure of biclustering complexity (number of biclusters and overlapping) through a penalized plaid model, and present a suitable version of the deviance information criterion to choose the number of biclusters, a problem that has not been adequately addressed yet. Our ideas are motivated by the analysis of gene expression data.
Pages: 1255 - 1277
Number of pages: 23