Multi-document summarization via submodularity

被引:32
作者
Li, Jingxuan [1 ]
Li, Lei [1 ]
Li, Tao [1 ]
机构
[1] Florida Int Univ, Sch Comp & Informat Sci, Miami, FL 33199 USA
基金
美国国家科学基金会;
关键词
Multi-document summarization; Submodularity; Greedy algorithm;
D O I
10.1007/s10489-012-0336-1
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Multi-document summarization is becoming an important issue in the Information Retrieval community. It aims to distill the most important information from a set of documents to generate a compressed summary. Given a set of documents as input, most of existing multi-document summarization approaches utilize different sentence selection techniques to extract a set of sentences from the document set as the summary. The submodularity hidden in the term coverage and the textual-unit similarity motivates us to incorporate this property into our solution to multi-document summarization tasks. In this paper, we propose a new principled and versatile framework for different multi-document summarization tasks using submodular functions (Nemhauser et al. in Math. Prog. 14(1):265-294, 1978) based on the term coverage and the textual-unit similarity which can be efficiently optimized through the improved greedy algorithm. We show that four known summarization tasks, including generic, query-focused, update, and comparative summarization, can be modeled as different variations derived from the proposed framework. Experiments on benchmark summarization data sets (e.g., DUC04-06, TAC08, TDT2 corpora) are conducted to demonstrate the efficacy and effectiveness of our proposed framework for the general multi-document summarization tasks.
引用
收藏
页码:420 / 430
页数:11
相关论文
共 29 条
[1]  
[Anonymous], 2008, Proceedings of SIGIR, DOI 10.1145/1390334.1390384
[2]  
[Anonymous], 2000, Speech and language processing: An introduction to natural language processing, computational linguistics, and speech recognition
[3]   Personalized e-news monitoring agent system for tracking user-interested Chinese news events [J].
Chen, Chih-Ming ;
Liu, Chao-Yu .
APPLIED INTELLIGENCE, 2009, 30 (02) :121-141
[4]   SUBMODULAR SET-FUNCTIONS, MATROIDS AND THE GREEDY ALGORITHM - TIGHT WORST-CASE BOUNDS AND SOME GENERALIZATIONS OF THE RADO-EDMONDS THEOREM [J].
CONFORTI, M ;
CORNUEJOLS, G .
DISCRETE APPLIED MATHEMATICS, 1984, 7 (03) :251-274
[5]  
Dang Hoa Trang, 2008, P TEXT AN C
[6]  
Dang HoaTrang., 2007, P DUC 07 WORKSHOP, P1
[7]  
Daumé H, 2006, COLING/ACL 2006, VOLS 1 AND 2, PROCEEDINGS OF THE CONFERENCE, P305
[8]   Classifier subset selection for biomedical named entity recognition [J].
Dimililer, Nazife ;
Varoglu, Ekrem ;
Altincay, Hakan .
APPLIED INTELLIGENCE, 2009, 31 (03) :267-282
[9]  
Erkan G, 2004, P EMNLP, V4
[10]  
Goldstein J., 2000, NAACL ANLP 2000 WORK