Enhancing low-resource cross-lingual summarization from noisy data with fine-grained reinforcement learning

Cited by: 0
Authors
Huang, Yuxin [1, 2]
Gu, Huailing [1, 2]
Yu, Zhengtao [1, 2]
Gao, Yumeng [1, 2]
Pan, Tong [1, 2]
Xu, Jialong [1, 2]
Affiliations
[1] Kunming Univ Sci & Technol, Fac Informat Engn & Automat, Kunming 650504, Peoples R China
[2] Kunming Univ Sci & Technol, Yunnan Key Lab Artificial Intelligence, Kunming 650504, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Cross-lingual summarization; Low-resource language; Noisy data; Fine-grained reinforcement learning; Word correlation; Word missing degree; TP391;
DOI
10.1631/FITEE.2300296
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Cross-lingual summarization (CLS) is the task of generating a summary in a target language from a document in a source language. Recently, end-to-end CLS models have achieved impressive results using large-scale, high-quality datasets typically constructed by translating monolingual summary corpora into CLS corpora. However, due to the limited performance of low-resource language translation models, translation noise can seriously degrade the performance of these models. In this paper, we propose a fine-grained reinforcement learning approach to address low-resource CLS based on noisy data. We introduce the source language summary as a gold signal to alleviate the impact of the translated noisy target summary. Specifically, we design a reinforcement reward by calculating the word correlation and word missing degree between the source language summary and the generated target language summary, and combine it with cross-entropy loss to optimize the CLS model. To validate the performance of our proposed model, we construct Chinese-Vietnamese and Vietnamese-Chinese CLS datasets. Experimental results show that our proposed model outperforms the baselines in terms of both the ROUGE score and BERTScore.
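The reward described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, since the abstract does not give the formulas: the bilingual `lexicon` of alignment probabilities, the `threshold` that decides when a source word counts as covered, the reward defined as correlation minus missing degree, the `baseline`, and the mixing weight `gamma` are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical bilingual lexicon: (source word, target word) -> alignment probability.
Lexicon = Dict[Tuple[str, str], float]


def word_correlation(src: List[str], gen: List[str], lexicon: Lexicon) -> float:
    """Mean, over generated words, of the best alignment score to any source-summary word."""
    if not gen:
        return 0.0
    total = sum(
        max((lexicon.get((s, t), 0.0) for s in src), default=0.0) for t in gen
    )
    return total / len(gen)


def word_missing_degree(
    src: List[str], gen: List[str], lexicon: Lexicon, threshold: float = 0.5
) -> float:
    """Fraction of source-summary words that no generated word aligns to above `threshold`."""
    if not src:
        return 0.0
    missing = sum(
        1 for s in src if all(lexicon.get((s, t), 0.0) < threshold for t in gen)
    )
    return missing / len(src)


def reward(src: List[str], gen: List[str], lexicon: Lexicon) -> float:
    """Assumed reward: high correlation is rewarded, missing source words are penalized."""
    return word_correlation(src, gen, lexicon) - word_missing_degree(src, gen, lexicon)


def mixed_loss(
    ce_loss: float,
    log_prob_sampled: float,
    reward_value: float,
    baseline: float = 0.0,
    gamma: float = 0.5,
) -> float:
    """Assumed combination: a REINFORCE-style term weighted against cross-entropy."""
    rl_loss = -(reward_value - baseline) * log_prob_sampled
    return gamma * rl_loss + (1.0 - gamma) * ce_loss
```

In this sketch the source language summary plays the role of the gold signal: a generated target summary is scored against it directly, so noise in the translated reference summary affects only the cross-entropy term.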
Pages: 121-134
Number of pages: 14