Multi-document summarization using closed patterns

Cited by: 39
Authors
Qiang, Ji-Peng [1, 2]
Chen, Ping [2]
Ding, Wei [2]
Xie, Fei [1, 3]
Wu, Xindong [1, 4]
Affiliations
[1] Hefei Univ Technol, Dept Comp Sci, Hefei 230009, Peoples R China
[2] Univ Massachusetts, Dept Comp Sci, Boston, MA 02125 USA
[3] Hefei Normal Univ, Dept Comp Sci & Technol, Hefei 230601, Peoples R China
[4] Univ Vermont, Dept Comp Sci, Burlington, VT 05405 USA
Funding
National Natural Science Foundation of China
Keywords
Multi-document summarization; Closed patterns; Text mining; Diversity; Content coverage; SEQUENTIAL PATTERNS; FREQUENT;
DOI
10.1016/j.knosys.2016.01.030
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
There are two main categories of multi-document summarization methods: term-based and ontology-based. A term-based method cannot handle polysemy and synonymy. An ontology-based approach addresses these problems by taking the semantic information of document content into account, but constructing an ontology requires substantial manual effort. To overcome these open problems, this paper presents a pattern-based model for generic multi-document summarization, which exploits closed patterns to extract the most salient sentences from a document collection and reduce redundancy in the summary. Our method calculates the weight of each sentence in a document collection by accumulating the weights of the closed patterns that cover that sentence, and iteratively selects the sentence that has the highest weight and low similarity to the previously selected sentences, until the length limit is reached. Calculating sentence weights from patterns reduces the dimensionality and captures more relevant information. Our method combines the advantages of the term-based and ontology-based models while avoiding their weaknesses. Empirical studies on the benchmark DUC2004 datasets demonstrate that our pattern-based method significantly outperforms the state-of-the-art methods. Multi-document summarization can also be used to extract a particular individual's opinions, in the form of closed patterns, from the documents that individual shares on social networks, and hence provides a useful tool for further analyzing the individual's behavior and influence in group activities. (C) 2016 Elsevier B.V. All rights reserved.
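The selection procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: patterns are treated as closed frequent term sets (the paper may use sequential patterns), a pattern's weight is taken to be its support count, similarity is Jaccard overlap between term sets, and the `penalty` trade-off and all function names are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=3):
    """Brute-force miner: maps each frequent term set to its support count."""
    vocab = sorted(set().union(*transactions))
    result = {}
    for size in range(1, max_size + 1):
        for combo in combinations(vocab, size):
            items = frozenset(combo)
            support = sum(1 for t in transactions if items <= t)
            if support >= min_support:
                result[items] = support
    return result


def closed_patterns(frequent):
    """Keep term sets that have no proper superset with the same support.

    Closedness is checked only against the enumerated sets (size <= max_size).
    """
    return {p: s for p, s in frequent.items()
            if not any(p < q and s == frequent[q] for q in frequent)}


def jaccard(a, b):
    """Jaccard similarity of two term sets (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def summarize(sentences, min_support=2, max_words=20, penalty=0.5):
    """Greedily pick high-weight, low-redundancy sentences within a word budget."""
    term_sets = [frozenset(s.lower().split()) for s in sentences]
    closed = closed_patterns(frequent_itemsets(term_sets, min_support))
    # Sentence weight: accumulated weight (here: support) of covering patterns.
    weights = [sum(s for p, s in closed.items() if p <= t) for t in term_sets]

    chosen, length = [], 0
    remaining = set(range(len(sentences)))
    while remaining:
        def score(i):
            # Discount the weight by similarity to already selected sentences.
            redundancy = max((jaccard(term_sets[i], term_sets[j]) for j in chosen),
                             default=0.0)
            return weights[i] * (1 - penalty * redundancy)

        best = max(sorted(remaining), key=score)
        remaining.discard(best)
        n_words = len(sentences[best].split())
        if length + n_words > max_words:
            continue  # sentence would exceed the length limit; skip it
        chosen.append(best)
        length += n_words
    return [sentences[i] for i in chosen]
```

For example, `summarize(["a b", "a b", "a", "c d"], max_words=3)` selects the first sentence, skips its exact duplicate because it would exceed the word budget, and then adds `"a"`. The brute-force miner is exponential in `max_size` and is only suitable for toy inputs.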
Pages: 28-38
Number of pages: 11