TSGVi: a graph-based summarization system for Vietnamese documents

Cited by: 26
Authors
Tu-Anh Nguyen-Hoang [1]
Khai Nguyen [1]
Quang-Vinh Tran [1]
Affiliation
[1] Faculty of Information Technology, University of Science, VNU-HCM, Ho Chi Minh City, Vietnam
Keywords
Graph model; Weighted PageRank; Sentence extraction; Multi-document summarization; Vietnamese
DOI
10.1007/s12652-012-0143-x
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper proposes an automatic method for generating an extractive summary of multiple Vietnamese documents related to a common topic by modeling the documents as weighted undirected graphs. It first builds undirected graphs whose vertices represent the sentences of the documents and whose edges indicate the similarity between sentences. It then applies the PageRank algorithm to compute salience scores for the sentences. Sentences are ranked by their salience scores and selected using maximal marginal relevance to form the summaries. These per-document summaries are then combined, and the same process is applied once more to produce the final extractive summary of the document set. Experiments were performed on Vietnamese news articles and on the English DUC 2002, 2003, and 2007 data. The results demonstrate the effectiveness of the proposed technique compared with the reference systems.
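The pipeline in the abstract (sentence similarity graph, PageRank salience, maximal marginal relevance selection, then a second pass over the pooled summaries) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: edge weights are assumed to be cosine similarity over raw term frequencies, tokenization is a plain regex split standing in for Vietnamese word segmentation, and the damping factor (0.85) and MMR trade-off (`lam = 0.7`) are illustrative defaults. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    # Assumption: regex word/syllable split; a crude stand-in for Vietnamese word segmentation.
    return re.findall(r"\w+", sentence.lower())


def cosine(a, b):
    # Cosine similarity between two term-frequency Counters (assumed edge weight).
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def weighted_pagerank(weights, damping=0.85, iters=100, tol=1e-8):
    # weights[i][j] is the symmetric edge weight between sentences i and j (no self-loops).
    n = len(weights)
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iters):
        # Each sentence receives score from neighbours in proportion to edge weight;
        # isolated sentences keep only the (1 - damping) / n teleport term.
        new = [
            (1 - damping) / n
            + damping * sum(weights[j][i] / out_sum[j] * scores[j] for j in range(n) if out_sum[j] > 0)
            for i in range(n)
        ]
        converged = max(abs(x - y) for x, y in zip(new, scores)) < tol
        scores = new
        if converged:
            break
    return scores


def mmr_select(sentences, vectors, scores, k, lam=0.7):
    # Greedy maximal marginal relevance: salience minus redundancy with already-chosen sentences.
    selected = []
    candidates = list(range(len(sentences)))

    def mmr(i):
        redundancy = max((cosine(vectors[i], vectors[j]) for j in selected), default=0.0)
        return lam * scores[i] - (1 - lam) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=mmr)
        selected.append(best)
        candidates.remove(best)
    # Return the chosen sentences in their original order for readability.
    return [sentences[i] for i in sorted(selected)]


def summarize(sentences, k, lam=0.7):
    if not sentences:
        return []
    vectors = [Counter(tokenize(s)) for s in sentences]
    n = len(sentences)
    weights = [[cosine(vectors[i], vectors[j]) if i != j else 0.0 for j in range(n)] for i in range(n)]
    scores = weighted_pagerank(weights)
    top = max(scores)
    scores = [s / top for s in scores]  # rescale to [0, 1] so it is comparable with cosine redundancy
    return mmr_select(sentences, vectors, scores, k, lam)


def summarize_collection(documents, k_per_doc, k_final):
    # Stage 1: summarize each document; stage 2: rerun the same process on the pooled summaries.
    pooled = [s for doc in documents for s in summarize(doc, k_per_doc)]
    return summarize(pooled, k_final)
```

For example, `summarize_collection(docs, k_per_doc=2, k_final=3)` takes a list of documents, each a list of sentence strings, and returns at most three sentences. Rescaling the PageRank scores before MMR is a design choice of this sketch: raw PageRank values shrink roughly as 1/n, so without rescaling the redundancy penalty would dominate on larger sentence sets.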
Pages: 305-313
Number of pages: 9