Multi-document summarization via submodularity

Cited by: 29
Authors
Li, Jingxuan [1]
Li, Lei [1]
Li, Tao [1]
Affiliations
[1] Florida Int Univ, Sch Comp & Informat Sci, Miami, FL 33199 USA
Funding
National Science Foundation (USA);
Keywords
Multi-document summarization; Submodularity; Greedy algorithm;
DOI
10.1007/s10489-012-0336-1
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Multi-document summarization is becoming an important issue in the Information Retrieval community. It aims to distill the most important information from a set of documents into a compressed summary. Given a set of documents as input, most existing multi-document summarization approaches use sentence selection techniques to extract a set of sentences from the document set as the summary. The submodularity hidden in term coverage and textual-unit similarity motivates us to incorporate this property into our solution to multi-document summarization tasks. In this paper, we propose a new principled and versatile framework for different multi-document summarization tasks using submodular functions (Nemhauser et al. in Math. Prog. 14(1):265-294, 1978) based on term coverage and textual-unit similarity, which can be efficiently optimized with an improved greedy algorithm. We show that four known summarization tasks (generic, query-focused, update, and comparative summarization) can be modeled as variations derived from the proposed framework. Experiments on benchmark summarization data sets (e.g., the DUC04-06, TAC08, and TDT2 corpora) demonstrate the effectiveness of the proposed framework for general multi-document summarization tasks.
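To make the idea in the abstract concrete, the following is a minimal sketch of greedy sentence selection under a term-coverage objective. It is not the authors' framework: it uses the plain greedy rule rather than the paper's improved greedy algorithm, it models term coverage only (no textual-unit similarity), and the names term_coverage, greedy_summary, docs, and budget are hypothetical. Counting distinct covered terms is a monotone submodular function, so for a fixed number of sentences the plain greedy rule is known to reach at least a (1 - 1/e) fraction of the optimum (Nemhauser et al., 1978).

```python
def term_coverage(selected, sentences):
    """Count distinct terms covered by the selected sentence indices.

    Set coverage is monotone and submodular: adding a sentence never
    lowers the count, and its gain shrinks as more terms are covered.
    """
    covered = set()
    for i in selected:
        covered |= sentences[i]
    return len(covered)


def greedy_summary(sentences, budget):
    """Select up to `budget` sentences by largest marginal coverage gain.

    Assumption: every sentence costs one unit; a length-based budget
    would need a cost-scaled variant of this rule.
    """
    selected = []
    covered = set()
    while len(selected) < budget:
        best, best_gain = None, 0
        for i, terms in enumerate(sentences):
            if i in selected:
                continue
            gain = len(terms - covered)  # newly covered terms only
            if gain > best_gain:
                best, best_gain = i, gain
        if best is None:  # no remaining sentence adds a new term
            break
        selected.append(best)
        covered |= sentences[best]
    return selected


# Toy input (invented for illustration): each sentence becomes a set of terms.
docs = [
    "the storm hit the coast on monday",
    "the storm damaged homes near the coast",
    "officials opened shelters for residents",
    "shelters opened on monday",
]
sentences = [set(s.split()) for s in docs]
summary = greedy_summary(sentences, budget=2)
print([docs[i] for i in summary])
```

On this toy input the second pick skips the sentence that overlaps heavily with the first one and takes the sentence about shelters instead, which illustrates the diminishing-returns behavior that submodularity formalizes.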
Pages: 420-430
Number of pages: 11
Related papers
50 in total
  • [21] SUBTOPIC-BASED MULTI-DOCUMENT SUMMARIZATION
    Dai, Lin
    Tang, Ji-Liang
    Xia, Yun-Qing
    PROCEEDINGS OF 2009 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-6, 2009, : 3505 - +
  • [22] Multi-document Summarization Based on Sentence Clustering
    Zheng, Hai-Tao
    Gong, Shu-Qin
    Chen, Hao
    Jiang, Yong
    Xia, Shu-Tao
    NEURAL INFORMATION PROCESSING (ICONIP 2014), PT II, 2014, 8835 : 429 - 436
  • [23] Multi-Document Summarization Using Sentence Clustering
    Gupta, Virendra Kumar
    Siddiqui, Tanveer J.
    4TH INTERNATIONAL CONFERENCE ON INTELLIGENT HUMAN COMPUTER INTERACTION (IHCI 2012), 2012,
  • [24] Rhetorics-based multi-document summarization
    Atkinson, John
    Munoz, Ricardo
    EXPERT SYSTEMS WITH APPLICATIONS, 2013, 40 (11) : 4346 - 4352
  • [25] Multi-document summarization using discourse models
    Cardoso, Paula C. F.
    Pardo, Thiago A. S.
    PROCESAMIENTO DEL LENGUAJE NATURAL, 2016, (56): 57 - 64
  • [26] Multi-document summarization based on unsupervised clustering
    Ji, Paul
    INFORMATION RETRIEVAL TECHNOLOGY, PROCEEDINGS, 2006, 4182 : 560 - 566
  • [27] Geodesic Distance based Multi-document Summarization
    Ma, Huifang
    He, Qing
    Shi, Zhongzhi
    IEEE NLP-KE 2008: PROCEEDINGS OF INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING AND KNOWLEDGE ENGINEERING, 2008, : 54 - 59
  • [28] Solving Multi-Document Summarization as an Orienteering Problem
    Al-Saleh, Asma
    Menai, Mohamed El Bachir
    ALGORITHMS, 2018, 11 (07)
  • [29] TOMDS (Topic-Oriented Multi-Document Summarization): Enabling Personalized Customization of Multi-Document Summaries
    Zhang, Xin
    Wei, Qiyi
    Song, Qing
    Zhang, Pengzhou
    APPLIED SCIENCES-BASEL, 2024, 14 (05):
  • [30] GameWikiSum: a Novel Large Multi-Document Summarization Dataset
    Antognini, Diego
    Faltings, Boi
    PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020), 2020, : 6645 - 6650