An EDU-Based Approach for Thai Multi-Document Summarization and Its Application

Cited by: 5
Authors
Ketui, Nongnuch [1]
Theeramunkong, Thanaruk [1]
Onsuwan, Chutamanee [2]
Affiliations
[1] Thammasat Univ, Sirindhorn Int Inst Technol, Sch Informat Comp & Commun Technol, Thammasat, Thailand
[2] Thammasat Univ, Sirindhorn Int Inst Technol, Fac Liberal Arts, Thammasat, Thailand
Keywords
Algorithms; Experimentation; Languages; Performance; Multi-document summarization; EDU-based approach; Thai text summarization; unit selection
DOI
10.1145/2641567
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Because Thai text lacks explicit word, phrase, and sentence boundaries, multi-document summarization of Thai faces several challenges: unit segmentation, unit selection, duplication elimination, and evaluation dataset construction. In this article, we introduce Thai Elementary Discourse Units (TEDUs) and their derivatives, called Combined TEDUs (CTEDUs), and then present a three-stage method for Thai multi-document summarization: unit segmentation, unit-graph formulation, and unit selection and summary generation. To evaluate the proposed method, we conduct a number of experiments on 50 sets of Thai news articles with manually constructed reference summaries. Measured by ROUGE-1, ROUGE-2, and ROUGE-SU4, the experimental results show that (1) TEDU-based summarization outperforms paragraph-based summarization; (2) the proposed graph-based TEDU weighting with importance-based selection achieves the best performance; and (3) considering unit duplication and recalculating weights help improve summary quality.
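As a rough illustration of the ROUGE-N measures named in the abstract, the sketch below computes a recall-oriented n-gram overlap between a candidate summary and a single reference. It is not the evaluation code used in the article: the function names `ngrams` and `rouge_n_recall` are hypothetical, the inputs are assumed to be already segmented into tokens (a nontrivial step for Thai, as the abstract notes), and ROUGE-SU4 and multi-reference handling are left out.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a list of tokens (hypothetical helper)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """Recall-oriented ROUGE-N against one reference, with clipped counts.

    Both arguments are assumed to be pre-segmented token lists.
    """
    cand = ngrams(candidate, n)
    ref = ngrams(reference, n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Each reference n-gram is matched at most as often as it occurs in the candidate.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


# Toy English tokens stand in for segmented Thai units.
reference = ["the", "cat", "sat", "on", "the", "mat"]
candidate = ["the", "cat", "lay", "on", "the", "mat"]

rouge_1 = rouge_n_recall(candidate, reference, n=1)  # 5 of 6 unigrams matched
rouge_2 = rouge_n_recall(candidate, reference, n=2)  # 3 of 5 bigrams matched
```

On this toy pair, five of the six reference unigrams and three of the five reference bigrams are recovered, so the ROUGE-2 score is lower than the ROUGE-1 score, as is typical for longer n-grams.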
Pages: 26