Topical Coherence in LDA-based Models through Induced Segmentation

Cited by: 15
Authors
Amoualian, Hesam [1]
Lu, Wei [2]
Gaussier, Eric [1]
Balikas, Georgios [1]
Amini, Massih-Reza [1]
Clausel, Marianne [3]
Affiliations
[1] Univ Grenoble Alps, CNRS, Grenoble INP, LIG, Grenoble, France
[2] Singapore Univ Technol & Design, Singapore, Singapore
[3] Univ Grenoble Alps, CNRS, Grenoble INP, LJK, Grenoble, France
Source
PROCEEDINGS OF THE 55TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2017), VOL 1 | 2017
DOI
10.18653/v1/P17-1165
Chinese Library Classification
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
This paper presents an LDA-based model that generates topically coherent segments within documents by jointly segmenting documents and assigning topics to their words. The coherence between topics is ensured through a copula that binds the topics associated with the words of a segment. In addition, the model relies on both document-specific and segment-specific topic distributions so as to capture fine-grained differences in topic assignments. We show that the proposed model naturally encompasses other state-of-the-art LDA-based models designed for similar tasks. Furthermore, our experiments, conducted on six publicly available datasets, show the effectiveness of our model in terms of perplexity, Normalized Pointwise Mutual Information (NPMI), which captures the coherence between the generated topics, and the Micro F1 measure for text classification.
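The abstract reports topic coherence with Normalized Pointwise Mutual Information. As a rough illustration of what that metric measures, the sketch below computes the standard NPMI score, NPMI(wi, wj) = log(P(wi, wj) / (P(wi) P(wj))) / -log P(wi, wj), averaged over all pairs of a topic's top words, with probabilities estimated from document-level co-occurrence in a reference corpus. The helper name `topic_npmi`, the toy corpus, and the conventions for pairs that never co-occur (-1) or co-occur in every document (+1) are assumptions for illustration; the record does not state the paper's evaluation setup (reference corpus, number of top words, co-occurrence window).

```python
import math
from itertools import combinations


def topic_npmi(top_words, documents):
    """Average NPMI over all pairs of a topic's top words (hypothetical helper).

    P(w) is the fraction of documents containing w; P(wi, wj) is the
    fraction of documents containing both words.
    """
    if len(top_words) < 2:
        raise ValueError("need at least two top words to score a topic")
    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)

    def prob(*words):
        # Fraction of documents that contain every word in `words`.
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    scores = []
    for wi, wj in combinations(top_words, 2):
        p_i, p_j, p_ij = prob(wi), prob(wj), prob(wi, wj)
        if p_ij == 0.0:
            # Assumed convention: a pair that never co-occurs gets the minimum score.
            scores.append(-1.0)
        elif p_ij == 1.0:
            # Both words occur in every document; the limit of NPMI here is +1.
            scores.append(1.0)
        else:
            pmi = math.log(p_ij / (p_i * p_j))
            scores.append(pmi / -math.log(p_ij))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy reference corpus: each document is a list of tokens (illustrative only).
    docs = [["topic", "model", "lda"], ["topic", "model"],
            ["copula", "segment"], ["lda", "segment"]]
    print(topic_npmi(["topic", "model"], docs))
    print(topic_npmi(["topic", "copula"], docs))
    print(topic_npmi(["topic", "model", "lda"], docs))
```

In this toy corpus, "topic" and "model" always appear together, so their pair scores +1; "topic" and "copula" never do, so they score -1. Higher average NPMI across a model's topics is read as more coherent topics.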
Pages: 1799–1809
Page count: 11