MWI-Sum: A Multilingual Summarizer Based on Frequent Weighted Itemsets

Cited by: 36
Authors
Baralis, Elena [1]
Cagliero, Luca [1]
Fiori, Alessandro [2]
Garza, Paolo [1]
Affiliations
[1] Politecn Torino, Dipartimento Automat & Informat, I-10129 Turin, Italy
[2] IRCC Inst Canc Res Candiolo, I-10060 Candiolo, Italy
Keywords
Algorithms; Multilingual summarization; Text mining; Frequent weighted itemset mining
DOI
10.1145/2809786
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Multidocument summarization addresses the selection of a compact subset of highly informative sentences, i.e., the summary, from a collection of textual documents. To perform sentence selection, two parallel strategies have been proposed: (a) apply general-purpose techniques relying on data mining or information retrieval techniques, and/or (b) perform advanced linguistic analysis relying on semantics-based models (e.g., ontologies) to capture the actual sentence meaning. Since there is an increasing need for processing documents written in different languages, the attention of the research community has recently focused on summarizers based on strategy (a). This article presents a novel multilingual summarizer, namely MWI-Sum (Multilingual Weighted Itemset-based Summarizer), that exploits an itemset-based model to summarize collections of documents ranging over the same topic. Unlike previous approaches, it extracts frequent weighted itemsets tailored to the analyzed collection and uses them to drive the sentence selection process. Weighted itemsets represent correlations among multiple highly relevant terms that are neglected by previous approaches. The proposed approach makes minimal use of language-dependent analyses. Thus, it is easily applicable to document collections written in different languages. Experiments performed on benchmark and real-life collections, English-written and not, demonstrate that the proposed approach performs better than state-of-the-art multilingual document summarizers.
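The itemset-driven selection idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the tf-idf-style term weights, the use of the minimum term weight as an itemset's weight within a sentence, the brute-force enumeration, the greedy coverage step, and all function and parameter names (`term_weights`, `mine_weighted_itemsets`, `select_sentences`, `min_wsup`, `max_len`) are assumptions made here for illustration only.

```python
import math
from collections import defaultdict
from itertools import combinations


def term_weights(sentences):
    """Assumed weighting: smoothed tf-idf per sentence (each sentence is a token list)."""
    n = len(sentences)
    df = defaultdict(int)
    for s in sentences:
        for t in set(s):
            df[t] += 1
    weighted = []
    for s in sentences:
        w = {t: (s.count(t) / len(s)) * math.log(1 + n / df[t]) for t in set(s)}
        weighted.append(w)
    return weighted


def mine_weighted_itemsets(weighted, min_wsup, max_len=2):
    """Brute-force mining: an itemset's weight in a sentence is assumed to be
    the minimum weight of its terms; weighted support sums that over sentences."""
    wsup = defaultdict(float)
    for w in weighted:
        terms = sorted(w)
        for k in range(1, max_len + 1):
            for itemset in combinations(terms, k):
                wsup[itemset] += min(w[t] for t in itemset)
    return {i: s for i, s in wsup.items() if s >= min_wsup}


def select_sentences(weighted, itemsets, max_sentences):
    """Greedy coverage: repeatedly pick the sentence that covers the largest
    total weighted support among itemsets not yet covered."""
    covers = [{i for i in itemsets if all(t in w for t in i)} for w in weighted]
    uncovered = set(itemsets)
    chosen = []
    while uncovered and len(chosen) < max_sentences:
        best = max(
            range(len(weighted)),
            key=lambda j: sum(itemsets[i] for i in covers[j] & uncovered),
        )
        gain = covers[best] & uncovered
        if not gain:
            break
        chosen.append(best)
        uncovered -= gain
    return sorted(chosen)


# Toy input: sentences already tokenized and stripped of stopwords.
sentences = [
    ["weighted", "itemset", "mining"],
    ["weighted", "itemset", "summary"],
    ["cat", "sat"],
]
weighted = term_weights(sentences)
itemsets = mine_weighted_itemsets(weighted, min_wsup=0.6)
summary = select_sentences(weighted, itemsets, max_sentences=2)
```

Only tokenization and stopword removal would depend on the language in a sketch like this, which is in the spirit of the abstract's claim that the approach makes minimal use of language-dependent analyses. A practical miner would replace the exhaustive enumeration with a pruning-based algorithm.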
Pages: 35