Topical Coherence in LDA-based Models through Induced Segmentation

Times cited: 15
Authors
Amoualian, Hesam [1]
Lu, Wei [2]
Gaussier, Eric [1]
Balikas, Georgios [1]
Amini, Massih-Reza [1]
Clausel, Marianne [3]
Affiliations
[1] Univ Grenoble Alps, CNRS, Grenoble INP, LIG, Grenoble, France
[2] Singapore Univ Technol & Design, Singapore, Singapore
[3] Univ Grenoble Alps, CNRS, Grenoble INP, LJK, Grenoble, France
Source
PROCEEDINGS OF THE 55TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2017), VOL 1 | 2017
DOI
10.18653/v1/P17-1165
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
This paper presents an LDA-based model that generates topically coherent segments within documents by jointly segmenting documents and assigning topics to their words. Coherence between topics is ensured through a copula, which binds the topics associated with the words of a segment. In addition, the model relies on both document-specific and segment-specific topic distributions so as to capture fine-grained differences in topic assignments. We show that the proposed model naturally encompasses other state-of-the-art LDA-based models designed for similar tasks. Furthermore, our experiments, conducted on six publicly available datasets, show the effectiveness of our model in terms of perplexity, Normalized Pointwise Mutual Information (NPMI), which captures the coherence between the generated topics, and the Micro F1 measure for text classification.
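The abstract's central mechanism, a copula that binds the topics of the words in one segment, can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: the record does not name the copula family, so the sketch assumes a Frank copula sampled with the Marshall-Olkin algorithm for Archimedean copulas, and it assumes the document-specific and segment-specific topic distributions are combined as a convex mixture with a hypothetical weight `lam`. The helper names (`frank_copula_uniforms`, `sample_segment_topics`, `agreement_rate`) are likewise hypothetical.

```python
import numpy as np


def frank_copula_uniforms(n, theta, rng):
    """Draw n dependent uniforms from an n-dimensional Frank copula.

    Marshall-Olkin construction (an assumed sampling scheme): draw a latent
    V from the logarithmic-series distribution with p = 1 - exp(-theta),
    draw E_i ~ Exp(1) i.i.d., and return U_i = psi(E_i / V), where
    psi(t) = -log(1 - p * exp(-t)) / theta is the Frank generator.
    Requires theta > 0; larger theta means stronger dependence.
    """
    p = -np.expm1(-theta)             # 1 - exp(-theta), computed stably
    v = rng.logseries(p)              # latent variable, integer >= 1
    e = rng.exponential(size=n)       # i.i.d. Exp(1) variables
    return -np.log1p(-p * np.exp(-e / v)) / theta


def sample_segment_topics(theta_doc, theta_seg, n_words, copula_theta, rng, lam=0.5):
    """Assign topics to the n_words of one segment.

    Hypothetical combination: the per-word topic distribution is a convex
    mixture (weight lam) of the document- and segment-specific distributions.
    The copula couples the draws so words in a segment tend to share topics.
    """
    probs = lam * theta_doc + (1.0 - lam) * theta_seg
    cdf = np.cumsum(probs)
    cdf /= cdf[-1]                    # guard against floating-point drift
    u = frank_copula_uniforms(n_words, copula_theta, rng)
    # Inverse-CDF transform: each dependent uniform maps to a topic index.
    topics = np.searchsorted(cdf, u, side="right")
    return np.minimum(topics, len(probs) - 1)


def agreement_rate(copula_theta, n_segments=2000, n_words=5, n_topics=10, seed=0):
    """Fraction of simulated segments whose words all receive the same topic."""
    rng = np.random.default_rng(seed)
    theta_doc = rng.dirichlet(np.full(n_topics, 0.5))
    same = 0
    for _ in range(n_segments):
        theta_seg = rng.dirichlet(np.full(n_topics, 0.5))
        z = sample_segment_topics(theta_doc, theta_seg, n_words, copula_theta, rng)
        same += int(np.all(z == z[0]))
    return same / n_segments
```

As a usage check, `agreement_rate(20.0)` should come out higher than `agreement_rate(0.01)`: a Frank parameter near zero approaches independent draws, while a large one pushes the words of a segment toward a shared topic, which is the kind of within-segment coherence the abstract attributes to the copula.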
Pages: 1799-1809
Number of pages: 11