A Novel Contextual Topic Model for Query-focused Multi-document Summarization

Cited by: 6
Author
Yang, Guangbing [1]
Affiliation
[1] Univ Eastern Finland, Sch Comp, Joensuu 80101, Finland
Source
2014 IEEE 26TH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI) | 2014
Keywords
Machine learning; hierarchical topic model; text summarization;
DOI
10.1109/ICTAI.2014.92
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The oft-decried problem of information overload impairs comprehension of useful information. Efforts to address it have driven growing interest in research on multi-document summarization. Seeking a new method to help determine the importance and similarity of sentences in multi-document summarization, this study proposes a novel approach based on well-known hierarchical Bayesian topic models. By investigating hierarchical topics and their correlations with respect to the lexical co-occurrences of words, the proposed contextual topic model can determine the relevance of sentences more effectively, and can also recognize latent topics and arrange them hierarchically. Quantitative evaluation shows that the model outperforms hLDA and LDA in document modeling. In addition, a practical application demonstrates that a summarization system implementing this model significantly improves summarization performance, making it comparable to state-of-the-art summarization systems.
Pages: 576-583
Page count: 8