Text summarization using Wikipedia

Cited by: 64
Authors
Sankarasubramaniam, Yogesh [1]
Ramanathan, Krishnan [1]
Ghosh, Subhankar [1, 2]
Affiliations
[1] HP Labs India, Bangalore, Karnataka, India
[2] SAS Inst, San Diego, CA USA
Keywords
Summarization; Wikipedia; Sentence ranking; Personalization; Sentences
DOI
10.1016/j.ipm.2014.02.001
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Automatic text summarization has been an active field of research for many years. Several approaches have been proposed, ranging from simple position and word-frequency methods to learning and graph-based algorithms. The advent of human-generated knowledge bases like Wikipedia offers a further possibility in text summarization: they can be used to understand the input text in terms of salient concepts from the knowledge base. In this paper, we study a novel approach that leverages Wikipedia in conjunction with graph-based ranking. Our approach is to first construct a bipartite sentence-concept graph, and then rank the input sentences using iterative updates on this graph. We consider several models for the bipartite graph, and derive convergence properties under each model. Then, we take up personalized and query-focused summarization, where the sentence ranks additionally depend on user interests and queries, respectively. Finally, we present a Wikipedia-based multi-document summarization algorithm. An important feature of the proposed algorithms is that they enable real-time incremental summarization: users can first view an initial summary, and then request additional content if interested. We evaluate the performance of our proposed summarizer using the ROUGE metric, and the results show that leveraging Wikipedia can significantly improve summary quality. We also present results from a user study, which suggests that using incremental summarization can help in better understanding news articles. © 2014 Elsevier Ltd. All rights reserved.
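The idea of ranking sentences by iterative updates on a bipartite sentence-concept graph can be illustrated with a minimal sketch. The abstract does not give the paper's graph models or update rules, so the code below assumes a generic HITS-style mutual reinforcement scheme: a sentence scores highly if it links to high-scoring concepts, and a concept scores highly if it is linked from high-scoring sentences. The `weights` matrix, the function names `rank_sentences` and `top_sentences`, and the L2 normalization are all illustrative assumptions, and the mapping of sentences to Wikipedia concepts is assumed to have been done beforehand.

```python
from math import sqrt


def _normalize(values):
    """Scale a score vector to unit L2 norm; leave an all-zero vector unchanged."""
    norm = sqrt(sum(v * v for v in values))
    return [v / norm for v in values] if norm > 0 else list(values)


def rank_sentences(weights, iterations=50, tol=1e-9):
    """Rank sentences on a bipartite sentence-concept graph.

    weights[i][j] >= 0 is the (assumed) link strength between
    sentence i and concept j. Returns (sentence_scores, concept_scores).
    This is a generic HITS-style update, not the paper's exact model.
    """
    n_sent = len(weights)
    n_conc = len(weights[0]) if n_sent else 0
    sent = [1.0] * n_sent
    conc = [1.0] * n_conc
    for _ in range(iterations):
        # Concept score: weighted sum of the scores of linked sentences.
        new_conc = _normalize([
            sum(weights[i][j] * sent[i] for i in range(n_sent))
            for j in range(n_conc)
        ])
        # Sentence score: weighted sum of the scores of linked concepts.
        new_sent = _normalize([
            sum(weights[i][j] * new_conc[j] for j in range(n_conc))
            for i in range(n_sent)
        ])
        delta = max((abs(a - b) for a, b in zip(new_sent, sent)), default=0.0)
        sent, conc = new_sent, new_conc
        if delta < tol:
            break
    return sent, conc


def top_sentences(sentences, scores, k):
    """Return the k highest-scoring sentences, kept in document order."""
    best = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(best)]


if __name__ == "__main__":
    # Hypothetical toy graph: 3 sentences, 2 concepts.
    toy_weights = [[1, 1], [1, 0], [0, 0]]
    toy_sentences = ["first sentence", "second sentence", "third sentence"]
    scores, _ = rank_sentences(toy_weights)
    print(top_sentences(toy_sentences, scores, k=1))
    # Requesting more content only means raising k on the same scores.
    print(top_sentences(toy_sentences, scores, k=2))
```

Because the sentence scores are computed once, a larger summary can be produced by calling `top_sentences` again with a larger `k`, which is one simple way to picture the incremental summarization the abstract describes.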
Pages: 443-461
Page count: 19