The penalized biclustering model and related algorithms

被引:11
作者
Chekouo, Thierry [1 ]
Murua, Alejandro [1 ]
机构
[1] Univ Montreal, Dept Math & Stat, Montreal, PQ H3C 3J7, Canada
基金
加拿大自然科学与工程研究理事会;
关键词
model selection; deviance information criterion; mixture; plaid model; gene expression; clustering; Y800; MICROARRAY DATA; GENE;
D O I
10.1080/02664763.2014.999647
中图分类号
O21 [概率论与数理统计]; C8 [统计学];
学科分类号
020208 ; 070103 ; 0714 ;
摘要
Biclustering is the simultaneous clustering of two related dimensions, for example, of individuals and features, or genes and experimental conditions. Very few statistical models for biclustering have been proposed in the literature. Instead, most of the research has focused on algorithms to find biclusters. The models underlying them have not received much attention. Hence, very little is known about the adequacy and limitations of the models and the efficiency of the algorithms. In this work, we shed light on associated statistical models behind the algorithms. This allows us to generalize most of the known popular biclustering techniques, and to justify, and many times improve on, the algorithms used to find the biclusters. It turns out that most of the known techniques have a hidden Bayesian flavor. Therefore, we adopt a Bayesian framework to model biclustering. We propose a measure of biclustering complexity (number of biclusters and overlapping) through a penalized plaid model, and present a suitable version of the deviance information criterion to choose the number of biclusters, a problem that has not been adequately addressed yet. Our ideas are motivated by the analysis of gene expression data.
引用
收藏
页码:1255 / 1277
页数:23
相关论文
共 30 条
  • [1] NEW LOOK AT STATISTICAL-MODEL IDENTIFICATION
    AKAIKE, H
    [J]. IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 1974, AC19 (06) : 716 - 723
  • [2] Gene Ontology: tool for the unification of biology
    Ashburner, M
    Ball, CA
    Blake, JA
    Botstein, D
    Butler, H
    Cherry, JM
    Davis, AP
    Dolinski, K
    Dwight, SS
    Eppig, JT
    Harris, MA
    Hill, DP
    Issel-Tarver, L
    Kasarskis, A
    Lewis, S
    Matese, JC
    Richardson, JE
    Ringwald, M
    Rubin, GM
    Sherlock, G
    [J]. NATURE GENETICS, 2000, 25 (01) : 25 - 29
  • [3] BESAG J, 1986, J R STAT SOC B, V48, P259
  • [4] AN ANALYSIS OF TRANSFORMATIONS
    BOX, GEP
    COX, DR
    [J]. JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 1964, 26 (02) : 211 - 252
  • [5] Biclustering in data mining
    Busygin, Stanislav
    Prokopyev, Oleg
    Pardalos, Panos M.
    [J]. COMPUTERS & OPERATIONS RESEARCH, 2008, 35 (09) : 2964 - 2987
  • [6] Celeux G, 2006, BAYESIAN ANAL, V1, P651, DOI 10.1214/06-BA122
  • [7] The transcriptional program of sporulation in budding yeast
    Chu, S
    DeRisi, J
    Eisen, M
    Mulholland, J
    Botstein, D
    Brown, PO
    Herskowitz, I
    [J]. SCIENCE, 1998, 282 (5389) : 699 - 705
  • [8] MAXIMUM LIKELIHOOD FROM INCOMPLETE DATA VIA EM ALGORITHM
    DEMPSTER, AP
    LAIRD, NM
    RUBIN, DB
    [J]. JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-METHODOLOGICAL, 1977, 39 (01): : 1 - 38
  • [9] Biclustering: Overcoming Data Dimensionality Problems in Market Segmentation
    Dolnicar, Sara
    Kaiser, Sebastian
    Lazarevski, Katie
    Leisch, Friedrich
    [J]. JOURNAL OF TRAVEL RESEARCH, 2012, 51 (01) : 41 - 49
  • [10] Cluster analysis and display of genome-wide expression patterns
    Eisen, MB
    Spellman, PT
    Brown, PO
    Botstein, D
    [J]. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 1998, 95 (25) : 14863 - 14868