The latent topic block model for the co-clustering of textual interaction data

Cited by: 7
Authors
Berge, Laurent R. [1, 5]
Bouveyron, Charles [2, 3]
Corneli, Marco [2, 6]
Latouche, Pierre [1, 4]
Affiliations
[1] Univ Paris 05, Lab MAP5, UMR CNRS 8145, Paris, France
[2] Univ Cote dAzur, Lab JA Dieudonne, UMR CNRS 7351, Nice, France
[3] INRIA Sophia Antipolis, Epione, Valbonne, France
[4] Univ Paris 1 Pantheon Sorbonne, EA 4543, Lab SAMM, Paris, France
[5] Univ Luxembourg, 162a Ave Faiencerie, L-1511 Luxembourg, Luxembourg
[6] Off 4S813, Lab JA Dieudonne, Campus Valrose, F-06108 Nice, France
Keywords
Co-clustering; Latent block model; Text matrices; Topic model; Variational inference; EM algorithm; Likelihood
DOI
10.1016/j.csda.2019.03.005
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Textual interaction data involving two disjoint sets of individuals/objects are considered. An example of such data is given by the reviews on web platforms (e.g. Amazon, TripAdvisor, etc.) where buyers comment on products/services they bought. A new generative model, the latent topic block model (LTBM), is developed along with an inference algorithm to simultaneously partition the elements of each set, accounting for the textual information. The estimation of the model parameters is performed via a variational version of the expectation maximization (EM) algorithm. A model selection criterion is formally obtained to estimate the number of partitions. Numerical experiments on simulated data are carried out to highlight the main features of the estimation procedure. Two real-world datasets are finally employed to show the usefulness of the proposed approach. (C) 2019 Elsevier B.V. All rights reserved.
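The generative idea summarized in the abstract can be illustrated with a small simulation. The following is only a minimal sketch under stated assumptions, not the paper's specification: it assumes that each row object (e.g. a buyer) and each column object (e.g. a product) carries a latent cluster label, that a review between them exists with a probability depending on the pair of clusters, and that the words of the review are drawn from a cluster-pair-specific mixture of topics. The function name, all parameter names, and the fixed document length are hypothetical, and no inference step (variational EM, model selection) is shown.

```python
import random


def simulate_ltbm_like(n_rows, n_cols, row_props, col_props, link_prob,
                       topic_props, topic_words, doc_len=20, seed=0):
    """Simulate textual interaction data from a block-structured topic model.

    Hypothetical parameters (illustrative names, not the paper's notation):
      row_props[q], col_props[l]  cluster proportions for rows and columns
      link_prob[q][l]             probability that a document exists in block (q, l)
      topic_props[q][l][k]        topic weights for documents in block (q, l)
      topic_words[k][v]           word weights over the vocabulary for topic k
    """
    rng = random.Random(seed)
    n_row_clusters, n_col_clusters = len(row_props), len(col_props)
    n_topics = len(topic_words)

    # Latent cluster label for each row object and each column object.
    row_z = rng.choices(range(n_row_clusters), weights=row_props, k=n_rows)
    col_z = rng.choices(range(n_col_clusters), weights=col_props, k=n_cols)

    docs = {}
    for i in range(n_rows):
        for j in range(n_cols):
            q, l = row_z[i], col_z[j]
            # A document between i and j exists with the block's link probability.
            if rng.random() >= link_prob[q][l]:
                continue
            words = []
            for _ in range(doc_len):
                # Pick a topic from the block's topic weights, then a word
                # index from that topic's distribution over the vocabulary.
                k = rng.choices(range(n_topics), weights=topic_props[q][l])[0]
                v = rng.choices(range(len(topic_words[k])), weights=topic_words[k])[0]
                words.append(v)
            docs[(i, j)] = words
    return row_z, col_z, docs
```

Data simulated this way gives a ground-truth partition of rows and columns against which a co-clustering procedure could be checked, in the spirit of the simulated-data experiments the abstract mentions.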
Pages: 247-270
Number of pages: 24