The Dual-Sparse Topic Model: Mining Focused Topics and Focused Terms in Short Text

Cited by: 92
Authors:
Lin, Tianyi [1]
Tian, Wentao [1]
Mei, Qiaozhu [2]
Cheng, Hong [1]
Affiliations:
[1] Chinese Univ Hong Kong, Hong Kong, Peoples R China
[2] Univ Michigan, Sch Informat, Ann Arbor, MI 48109 USA
Source:
WWW'14: PROCEEDINGS OF THE 23RD INTERNATIONAL CONFERENCE ON WORLD WIDE WEB | 2014
Funding:
U.S. National Science Foundation
Keywords:
Topic modeling; spike and slab; sparse representation; user-generated content
DOI:
10.1145/2566486.2567980
CLC Number:
TP [Automation and computer technology]
Subject Classification Code:
0812
Abstract:
Topic modeling has proven to be an effective method for exploratory text mining. Most topic models share the assumption that a document is generated from a mixture of topics. In real-world scenarios, however, an individual document usually concentrates on a few salient topics rather than covering a wide variety of topics. Likewise, a real topic adopts a narrow range of terms rather than a wide coverage of the vocabulary. Understanding this sparsity of information is especially important for analyzing user-generated Web content and social media, which feature extremely short posts and condensed discussions. In this paper, we propose a dual-sparse topic model that addresses sparsity in both the topic mixtures and the word usage. By applying a "spike and slab" prior to decouple the sparsity and smoothness of the document-topic and topic-word distributions, we allow each document to select a few focused topics and each topic to select a few focused terms. Experiments on large corpora of different genres demonstrate that the dual-sparse topic model outperforms both classical topic models and existing sparsity-enhanced topic models. The improvement is especially notable on collections of short documents.
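The spike-and-slab idea described in the abstract can be illustrated with a minimal generative sketch: a Bernoulli "spike" selects which topics a document may use at all, and a Dirichlet "slab" smooths the mixture over only the selected topics. This is a hedged toy illustration, not the authors' inference procedure; the parameter names (`pi` for the selection probability, `alpha` for the slab concentration) and the values below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

K = 10        # number of topics (illustrative)
pi = 0.3      # spike: probability that a topic is selected for a document
alpha = 1.0   # slab: Dirichlet concentration over the selected topics

# Spike: Bernoulli selectors decide which topics the document may use.
selected = rng.random(K) < pi
if not selected.any():                 # guarantee at least one selected topic
    selected[rng.integers(K)] = True

# Slab: a smooth Dirichlet mixture over only the selected topics;
# unselected topics get exactly zero probability.
theta = np.zeros(K)
theta[selected] = rng.dirichlet(alpha * np.ones(selected.sum()))

print(theta)   # a sparse document-topic mixture
```

The resulting `theta` is sparse by construction (exact zeros on unselected topics) yet smooth on its support, which is the decoupling of sparsity and smoothness that the abstract attributes to the spike-and-slab prior. The same construction applies symmetrically to topic-word distributions over the vocabulary.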
Pages: 539-549
Page count: 11