Latent tree models for hierarchical topic detection

Cited by: 30
Authors
Chen, Peixian [1 ]
Zhang, Nevin L. [1 ]
Liu, Tengfei [2 ]
Poon, Leonard K. M. [3 ]
Chen, Zhourong [1 ]
Khawar, Farhan [1 ]
Affiliations
[1] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Hong Kong, Hong Kong, Peoples R China
[2] Ant Financial Serv Grp, Shanghai, Peoples R China
[3] Educ Univ Hong Kong, Dept Math & Informat Technol, Hong Kong, Hong Kong, Peoples R China
Keywords
Probabilistic graphical models; Text analysis; Hierarchical latent tree analysis; Hierarchical topic detection; ALGORITHM;
DOI
10.1016/j.artint.2017.06.004
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We present a novel method for hierarchical topic detection where topics are obtained by clustering documents in multiple ways. Specifically, we model document collections using a class of graphical models called hierarchical latent tree models (HLTMs). The variables at the bottom level of an HLTM are observed binary variables that represent the presence/absence of words in a document. The variables at other levels are binary latent variables that represent word co-occurrence patterns or co-occurrences of such patterns. Each latent variable gives a soft partition of the documents, and document clusters in the partitions are interpreted as topics. Latent variables at high levels of the hierarchy capture long-range word co-occurrence patterns and hence give thematically more general topics, while those at low levels of the hierarchy capture short-range word co-occurrence patterns and give thematically more specific topics. In comparison with LDA-based methods, a key advantage of the new method is that it represents co-occurrence patterns explicitly using model structures. Extensive empirical results show that the new method significantly outperforms the LDA-based methods in terms of model quality and meaningfulness of topics and topic hierarchies. (C) 2017 Elsevier B.V. All rights reserved.
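To make the abstract's central idea concrete, the sketch below shows how a single binary latent variable in a latent tree model induces a soft partition of documents: observed word-presence variables are conditionally independent given the latent variable, and each document's posterior probability of the latent state is its soft cluster membership. This is an illustrative toy (invented word probabilities and variable names, not the authors' model or code):

```python
# Toy fragment of a latent tree model: one binary latent variable Z with two
# observed binary word variables ("apple", "banana") that are conditionally
# independent given Z. P(Z=1 | doc) is the document's soft cluster membership.

p_z1 = 0.4                      # prior P(Z = 1), an invented toy number
p_w_given_z = {                 # P(word present | Z), invented toy numbers
    "apple":  {0: 0.1, 1: 0.8},
    "banana": {0: 0.2, 1: 0.7},
}

def posterior_z1(doc):
    """Exact posterior P(Z=1 | word flags) by enumerating Z's two states."""
    def joint(z):
        p = p_z1 if z == 1 else 1.0 - p_z1
        for word, present in doc.items():
            pw = p_w_given_z[word][z]
            p *= pw if present else 1.0 - pw
        return p
    j0, j1 = joint(0), joint(1)
    return j1 / (j0 + j1)

# A document with both words present leans strongly toward the Z=1 cluster;
# one with neither word leans toward Z=0 -- a soft two-way partition.
print(posterior_z1({"apple": 1, "banana": 1}))
print(posterior_z1({"apple": 0, "banana": 0}))
```

In a full HLTM this pattern repeats up the tree: each latent variable partitions the documents softly, and higher-level latent variables, sitting above patterns of patterns, yield the thematically more general topics the abstract describes.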
Pages: 105-124
Page count: 20