Latent tree models for hierarchical topic detection

Cited by: 30
Authors
Chen, Peixian [1 ]
Zhang, Nevin L. [1 ]
Liu, Tengfei [2 ]
Poon, Leonard K. M. [3 ]
Chen, Zhourong [1 ]
Khawar, Farhan [1 ]
Affiliations
[1] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Hong Kong, Hong Kong, Peoples R China
[2] Ant Financial Serv Grp, Shanghai, Peoples R China
[3] Educ Univ Hong Kong, Dept Math & Informat Technol, Hong Kong, Hong Kong, Peoples R China
Keywords
Probabilistic graphical models; Text analysis; Hierarchical latent tree analysis; Hierarchical topic detection; ALGORITHM;
DOI
10.1016/j.artint.2017.06.004
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We present a novel method for hierarchical topic detection where topics are obtained by clustering documents in multiple ways. Specifically, we model document collections using a class of graphical models called hierarchical latent tree models (HLTMs). The variables at the bottom level of an HLTM are observed binary variables that represent the presence/absence of words in a document. The variables at other levels are binary latent variables that represent word co-occurrence patterns or co-occurrences of such patterns. Each latent variable gives a soft partition of the documents, and document clusters in the partitions are interpreted as topics. Latent variables at high levels of the hierarchy capture long-range word co-occurrence patterns and hence give thematically more general topics, while those at low levels of the hierarchy capture short-range word co-occurrence patterns and give thematically more specific topics. In comparison with LDA-based methods, a key advantage of the new method is that it represents co-occurrence patterns explicitly using model structures. Extensive empirical results show that the new method significantly outperforms the LDA-based methods in terms of model quality and meaningfulness of topics and topic hierarchies. (C) 2017 Elsevier B.V. All rights reserved.
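To make the abstract's central idea concrete, the sketch below shows how a single binary latent variable in a latent tree model induces a soft partition of documents: observed word-presence variables are conditionally independent given the latent variable, and each document's posterior probability of the latent state is its soft cluster membership. This is an illustrative toy (invented word probabilities and variable names, not the authors' model or code):

```python
# Toy fragment of a latent tree model: one binary latent variable Z with two
# observed binary word variables ("apple", "banana") that are conditionally
# independent given Z. P(Z=1 | doc) is the document's soft cluster membership.

p_z1 = 0.4                      # prior P(Z = 1), an invented toy number
p_w_given_z = {                 # P(word present | Z), invented toy numbers
    "apple":  {0: 0.1, 1: 0.8},
    "banana": {0: 0.2, 1: 0.7},
}

def posterior_z1(doc):
    """Exact posterior P(Z=1 | word flags) by enumerating Z's two states."""
    def joint(z):
        p = p_z1 if z == 1 else 1.0 - p_z1
        for word, present in doc.items():
            pw = p_w_given_z[word][z]
            p *= pw if present else 1.0 - pw
        return p
    j0, j1 = joint(0), joint(1)
    return j1 / (j0 + j1)

# A document with both words present leans strongly toward the Z=1 cluster;
# one with neither word leans toward Z=0 -- a soft two-way partition.
print(posterior_z1({"apple": 1, "banana": 1}))
print(posterior_z1({"apple": 0, "banana": 0}))
```

In a full HLTM this pattern repeats up the tree: each latent variable partitions the documents softly, and higher-level latent variables, sitting above patterns of patterns, yield the thematically more general topics the abstract describes.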
Pages: 105-124
Page count: 20