Multi-document Summarization by Creating Synthetic Document Vector Based on Language Model

Cited by: 0
Authors
Kim, Dahae [1]
Lee, Jee-Hyoung [1]
Affiliations
[1] Sungkyunkwan Univ, Dept Elect & Comp Engn, Suwon, South Korea
Source
2016 JOINT 8TH INTERNATIONAL CONFERENCE ON SOFT COMPUTING AND INTELLIGENT SYSTEMS (SCIS) AND 17TH INTERNATIONAL SYMPOSIUM ON ADVANCED INTELLIGENT SYSTEMS (ISIS) | 2016
Keywords
Multi-document summarization; Core content; Major information; Synthetic document vector; Language model
DOI
10.1109/SCIS&ISIS.2016.159
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Multi-document summarization aims to create a summary that covers the major information shared across multiple documents. Existing methods rely on hand-crafted word-level and sentence-level features. However, such features capture only a limited part of the information present in the given documents, which makes it difficult to identify the core content of each document. Identifying the major information is further limited because documents with the same meaning are often paraphrased differently depending on the writer. It is therefore necessary to represent the semantic meaning of documents, as well as sentences, through natural language understanding. In this paper, we propose a new multi-document summarization system that creates a synthetic document vector covering the whole document set, based on a language model, which is well known for learning semantic features of text. We evaluated the system on the DUC 2004 dataset provided by the Document Understanding Conference (DUC), and the results show that our method summarizes multiple documents effectively based on their core content.
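The abstract does not say how the synthetic document vector is built or how sentences are selected, so the following is only a minimal illustrative sketch under stated assumptions, not the authors' method. It assumes the synthetic vector is the mean of the per-document vectors and that sentences are ranked by cosine similarity to it. The helper names (`embed`, `synthetic_vector`, `summarize`) are hypothetical, and a toy bag-of-words count vector stands in for a language-model embedding.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase alphanumeric tokens; punctuation is dropped.
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text, vocab):
    # ASSUMPTION: toy bag-of-words counts, standing in for a
    # language-model embedding of a document or sentence.
    counts = Counter(tokenize(text))
    return [float(counts[w]) for w in vocab]


def synthetic_vector(doc_vectors):
    # ASSUMPTION: the synthetic document vector is the element-wise
    # mean of the document vectors; the paper may build it differently.
    n = len(doc_vectors)
    return [sum(col) / n for col in zip(*doc_vectors)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def summarize(documents, k=1):
    # Split every document into sentences on terminal punctuation.
    sentences = [
        s.strip()
        for d in documents
        for s in re.split(r"(?<=[.!?])\s+", d.strip())
        if s.strip()
    ]
    vocab = sorted({w for d in documents for w in tokenize(d)})
    target = synthetic_vector([embed(d, vocab) for d in documents])
    # Rank sentences by similarity to the synthetic vector, keep the
    # top k, then restore their original order.
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: cosine(embed(sentences[i], vocab), target),
        reverse=True,
    )
    return [sentences[i] for i in sorted(ranked[:k])]


if __name__ == "__main__":
    docs = [
        "The storm hit the coast. Officials closed schools.",
        "A storm hit the coast on Monday. Power was lost.",
        "The coast was hit by the storm. Roads flooded.",
    ]
    print(summarize(docs, k=1))
```

In this toy setup, the sentences that restate what all three documents share score highest, while details mentioned in only one document score low. Swapping `embed` for a learned representation would be the natural next step under the same assumptions.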
Pages: 605-609
Number of pages: 5