A novel contextual topic model for multi-document summarization

Times cited: 54
Authors
Yang, Guangbing [1]
Wen, Dunwei [2]
Kinshuk [2]
Chen, Nian-Shing [3]
Sutinen, Erkki [1]
Affiliations
[1] Univ Eastern Finland, Sch Comp, Joensuu 80101, Finland
[2] Athabasca Univ, Sch Comp & Informat Syst, Athabasca, AB T9S 3A3, Canada
[3] Natl Sun Yat Sen Univ, Dept Informat Management, Kaohsiung 80424, Taiwan
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC);
Keywords
Multi-document summarization; Hierarchical topic model; Contextual topic; INFORMATION; GRAPH;
DOI
10.1016/j.eswa.2014.09.015
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Information overload has become a serious problem in the digital age, making it harder to identify and understand useful information. Alleviating this problem is a central concern of natural language processing research, and of multi-document summarization in particular. Seeking a new method to help judge the importance of similar sentences in multi-document summarization, this study proposes a novel approach based on recent hierarchical Bayesian topic models. The proposed model incorporates n-grams into hierarchical latent topics to capture the word dependencies that appear in the local context of a word. Quantitative and qualitative evaluations show that this model outperforms both hLDA and LDA in document modeling. In addition, experiments show that our summarization system implementing this model significantly improves summarization performance, making it comparable to state-of-the-art summarization systems. (C) 2014 Elsevier Ltd. All rights reserved.
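The abstract names hLDA as a baseline and describes the proposed model as building on hierarchical Bayesian topic models. For background, here is a minimal sketch of the nested Chinese restaurant process (nCRP), the tree prior that hLDA uses to assign each document a root-to-leaf path of topics. It does not reproduce the authors' contextual n-gram extension. The `Node` class, the `sample_path` function, and the `depth` and `gamma` parameters are illustrative names chosen here, not taken from the paper.

```python
from __future__ import annotations

import random
from dataclasses import dataclass, field


@dataclass
class Node:
    """A topic node in the tree; `count` = number of documents whose path passes through it."""
    count: int = 0
    children: list[Node] = field(default_factory=list)


def sample_path(root: Node, depth: int, gamma: float, rng: random.Random) -> list[Node]:
    """Draw one document's root-to-leaf path of `depth` nodes under the nested CRP."""
    root.count += 1
    path = [root]
    node = root
    for _ in range(depth - 1):
        # n = number of earlier documents that passed through `node` and chose a child.
        n = sum(child.count for child in node.children)
        # Existing child c is chosen with probability count(c) / (n + gamma);
        # a brand-new child with probability gamma / (n + gamma).
        r = rng.uniform(0.0, n + gamma)
        chosen = None
        for child in node.children:
            r -= child.count
            if r < 0:
                chosen = child
                break
        if chosen is None:
            chosen = Node()
            node.children.append(chosen)
        chosen.count += 1
        path.append(chosen)
        node = chosen
    return path


# Usage: seat 20 documents in a three-level topic tree.
rng = random.Random(0)
root = Node()
paths = [sample_path(root, depth=3, gamma=1.0, rng=rng) for _ in range(20)]
print(root.count, len(root.children))
```

Each new document follows existing branches in proportion to how many earlier documents took them and opens a new branch with probability proportional to `gamma`, so a larger `gamma` yields a bushier tree. In full hLDA, each word in a document is then assigned to one level along the document's path. That step is omitted here.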
Pages: 1340-1352
Number of pages: 13
References
46 entries in total