Spike and slab biclustering

Cited by: 7
Authors
Denitto, M. [1]
Bicego, M. [1]
Farinelli, A. [1]
Figueiredo, M. A. T. [2,3]
Affiliations
[1] Univ Verona, Str Le Grazie 15, Ca Vignal 2, Verona, Italy
[2] Univ Lisbon, Inst Telecomunicacoes, Ave Rovisco Pais 1, Lisbon, Portugal
[3] Univ Lisbon, Inst Super Tecn, Ave Rovisco Pais 1, Lisbon, Portugal
Keywords
Biclustering; Spike and slab; Probabilistic graphical models; Expectation-maximization; Gene-expression data; Nonnegative matrix factorization; EM algorithm; Variable selection; Microarray data; Sparse; Decomposition; Models
DOI
10.1016/j.patcog.2017.07.021
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Biclustering refers to the problem of simultaneously clustering the rows and columns of a given data matrix, with the goal of obtaining submatrices in which the selected rows exhibit coherent behaviour across the selected columns, and vice versa. To address this intrinsically difficult problem, we propose a novel generative model that approaches biclustering from a sparse low-rank matrix factorization perspective. The main idea is to design a probabilistic model describing the factorization of a given data matrix into two other matrices, from which information about the rows and columns belonging to the sought-for biclusters can be obtained. A crucial ingredient of the proposed model is a spike-and-slab sparsity-inducing prior; hence, we term the approach spike and slab biclustering (SSBi). To estimate the parameters of the SSBi model, we propose an expectation-maximization (EM) algorithm, termed SSBiEM, which solves a low-rank factorization problem at each iteration using a recently proposed augmented Lagrangian algorithm. Experiments with both synthetic and real data show that the SSBi approach compares favorably with the state of the art. © 2017 Elsevier Ltd. All rights reserved.
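The abstract frames biclustering as a sparse low-rank factorization in which a spike-and-slab prior sets many factor entries to exactly zero, so that the nonzero entries of one factor column mark the rows of a bicluster and those of the matching column in the other factor mark its columns. The following is a minimal Python sketch of that reading, not the SSBi model itself. Assumed for illustration: the notation X ≈ Z Wᵀ plus Gaussian noise, the same spike-and-slab prior on every entry of both factors with shared parameters `theta` and `slab_std`, and all helper names (`sample_spike_slab`, `generate_factorized_data`, `biclusters_from_supports`, `inclusion_probability`), which are hypothetical. The code does not implement SSBiEM or its augmented Lagrangian step. The last function gives the scalar posterior probability that a spike-and-slab coefficient is nonzero, one kind of quantity an EM-style estimator for such a prior might compute.

```python
import math
import random


def sample_spike_slab(theta, slab_std, rng):
    """Draw w ~ (1 - theta) * delta_0 + theta * N(0, slab_std^2)."""
    if rng.random() < theta:
        return rng.gauss(0.0, slab_std)  # slab: a Gaussian "active" value
    return 0.0  # spike: an exact zero, which is what makes the supports sparse


def generate_factorized_data(n_rows, n_cols, rank, theta, slab_std, noise_std, seed=0):
    """Hypothetical toy generator: X = Z W^T + Gaussian noise.

    Z is n_rows x rank and W is n_cols x rank; every entry of both factors
    gets an independent spike-and-slab draw (an assumption of this sketch).
    """
    rng = random.Random(seed)
    Z = [[sample_spike_slab(theta, slab_std, rng) for _ in range(rank)]
         for _ in range(n_rows)]
    W = [[sample_spike_slab(theta, slab_std, rng) for _ in range(rank)]
         for _ in range(n_cols)]
    X = [[sum(Z[i][k] * W[j][k] for k in range(rank)) + rng.gauss(0.0, noise_std)
          for j in range(n_cols)]
         for i in range(n_rows)]
    return X, Z, W


def biclusters_from_supports(Z, W):
    """Read bicluster k as (rows with Z[i][k] != 0, columns with W[j][k] != 0)."""
    rank = len(Z[0]) if Z else 0
    return [([i for i, z_row in enumerate(Z) if z_row[k] != 0.0],
             [j for j, w_row in enumerate(W) if w_row[k] != 0.0])
            for k in range(rank)]


def inclusion_probability(y, theta, slab_std, noise_std):
    """Posterior P(w != 0 | y) for y = w + N(0, noise_std^2), w spike-and-slab."""
    def normal_pdf(x, var):
        return math.exp(-0.5 * x * x / var) / math.sqrt(2.0 * math.pi * var)

    # Marginal of y: N(0, noise^2 + slab^2) under the slab, N(0, noise^2) under the spike.
    slab = theta * normal_pdf(y, noise_std ** 2 + slab_std ** 2)
    spike = (1.0 - theta) * normal_pdf(y, noise_std ** 2)
    return slab / (slab + spike)


if __name__ == "__main__":
    X, Z, W = generate_factorized_data(n_rows=20, n_cols=15, rank=2,
                                       theta=0.3, slab_std=3.0, noise_std=0.1)
    for k, (rows, cols) in enumerate(biclusters_from_supports(Z, W)):
        print(f"bicluster {k}: rows={rows} cols={cols}")
    # An observation near 0 is most likely spike; one far from 0 most likely slab.
    print(inclusion_probability(0.0, 0.3, 3.0, 0.1))
    print(inclusion_probability(3.0, 0.3, 3.0, 0.1))
```

Reading biclusters directly off the supports works here only because the spike produces exact zeros; with real data Z and W are unknown and must first be estimated, which this sketch does not attempt.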
Pages: 186-195
Number of pages: 10