Textual data summarization using the Self-Organized Co-Clustering model

Cited by: 12
Authors
Selosse, Margot [1]
Jacques, Julien [1]
Biernacki, Christophe [2, 3]
Affiliations
[1] Univ Lyon, Lyon 2 & ERIC EA3083, 5 Ave Pierre Mendes France, Bron 69500, France
[2] Univ Lille, UFR Math, Cite Sci, Villeneuve d'Ascq 59655, France
[3] INRIA, 40 Av Halley, Bat A, Pk Plaza, Villeneuve d'Ascq 59650, France
Keywords
Co-clustering; Document-term matrix; Latent block model; Factorization; Matrix
DOI
10.1016/j.patcog.2020.107315
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recently, different studies have demonstrated the use of co-clustering, a data mining technique which simultaneously produces row-clusters of observations and column-clusters of features. The present work introduces a novel co-clustering model to easily summarize textual data in a document-term format. In addition to highlighting homogeneous co-clusters, as other existing algorithms do, we also distinguish noisy co-clusters from significant co-clusters, which is particularly useful for sparse document-term matrices. Furthermore, our model proposes a structure among the significant co-clusters, thus providing improved interpretability to users. The proposed approach competes with state-of-the-art methods for document and term clustering and offers user-friendly results. The model relies on the Poisson distribution and on a constrained version of the Latent Block Model, which is a probabilistic approach for co-clustering. A Stochastic Expectation-Maximization algorithm is proposed to run the model's inference, along with a model selection criterion to choose the number of co-clusters. Both simulated and real data sets illustrate the efficiency of this model by its ability to easily identify relevant co-clusters. © 2020 Elsevier Ltd. All rights reserved.
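As a rough illustration of the kind of data the abstract describes, the sketch below simulates a document-term count matrix from a generic, unconstrained Poisson latent block model: each document and each term gets a latent cluster label, and each cell count is Poisson with a rate that depends only on the pair of labels. This is not the paper's constrained model or its SEM inference; the function names, the rate matrix `gamma`, and the choice of a low-rate last term cluster to mimic "noisy" co-clusters are all assumptions made for illustration.

```python
import numpy as np


def simulate_poisson_lbm(n_docs, n_terms, row_props, col_props, gamma, seed=0):
    """Simulate counts from a generic Poisson latent block model.

    gamma[k, l] is the (hypothetical) Poisson rate of the block formed by
    document cluster k and term cluster l.
    """
    rng = np.random.default_rng(seed)
    n_row_clusters, n_col_clusters = gamma.shape
    # Latent cluster labels for documents (rows) and terms (columns).
    z = rng.choice(n_row_clusters, size=n_docs, p=row_props)
    w = rng.choice(n_col_clusters, size=n_terms, p=col_props)
    # Per-cell rate: look up the block rate for each (document, term) pair.
    rates = gamma[z][:, w]
    x = rng.poisson(rates)
    return x, z, w


def block_means(x, z, w, n_row_clusters, n_col_clusters):
    """Average count inside each co-cluster, given known labels."""
    means = np.full((n_row_clusters, n_col_clusters), np.nan)
    for k in range(n_row_clusters):
        for l in range(n_col_clusters):
            block = x[np.ix_(z == k, w == l)]
            if block.size:
                means[k, l] = block.mean()
    return means


if __name__ == "__main__":
    # Assumed rates: two document clusters, three term clusters; the last
    # term cluster has a uniformly low rate, loosely mimicking noise terms.
    gamma = np.array([[5.0, 0.5, 0.1],
                      [0.5, 5.0, 0.1]])
    x, z, w = simulate_poisson_lbm(200, 300, [0.5, 0.5], [0.4, 0.4, 0.2], gamma)
    print(np.round(block_means(x, z, w, 2, 3), 2))
```

With the true labels known, the per-block averages recover the rate matrix closely; the inference problem addressed in the paper is the harder one of estimating labels and rates jointly from the counts alone.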
Pages: 13