Augmenting Low-Resource Cross-Lingual Summarization with Progression-Grounded Training and Prompting

Cited by: 0
Authors
Ma, Jiu Shun [1 ]
Huang, Yuxin [1 ]
Wang, Linqin [2 ]
Huang, Xiang [3 ]
Peng, Hao [3 ]
Yu, Zhengtao [1 ]
Yu, Philip [4 ]
Affiliations
[1] Kunming Univ Sci & Technol, Fac Informat Engn & Automat, Kunming, Peoples R China
[2] Kunming Univ Sci & Technol, Kunming, Peoples R China
[3] Beihang Univ, Beijing, Peoples R China
[4] Univ Illinois, Chicago, IL USA
Funding
National Natural Science Foundation of China;
Keywords
CLS; pretrain plus finetune paradigm; low-resource languages; progressive training; reinforcement learning; discrete-prompts;
DOI
10.1145/3675167
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Cross-lingual summarization (CLS), which generates a summary in one language from a source document in another, helps make information accessible across language barriers. State-of-the-art neural summarization models typically train or fine-tune language models on large-scale corpora, which is difficult in realistic low-resource scenarios that lack domain-specific annotated data. In this article, we present a novel cross-lingual summarization model that addresses the low-resource setting through a two-pronged approach: progressive training of mBART and discrete prompts optimized with reinforcement learning. During training, a progressive scheme based on mBART lets the pre-trained model gradually acquire the ability to compress information, develop cross-lingual capabilities, and finally adapt to the specific summarization task. During downstream summarization, we combine the pre-trained model with reinforcement-learning-optimized discrete prompts to perform low-resource cross-lingual summarization. Experimental results on four cross-lingual summarization datasets demonstrate state-of-the-art performance, surpassing six baselines in low-resource scenarios.
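For readers unfamiliar with staged fine-tuning, the following is a minimal, hypothetical sketch of the "progressive" idea described in the abstract, using the public Hugging Face mBART-50 checkpoint. The stage order, placeholder data, and hyperparameters are illustrative assumptions, not the authors' released code, and the reinforcement-learning-optimized discrete prompts are omitted.

# Hypothetical sketch: progressive fine-tuning of mBART-50 across three stages.
# Placeholder data and hyperparameters are assumptions for illustration only.
import torch
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-50")
tokenizer = MBart50TokenizerFast.from_pretrained("facebook/mbart-large-50")
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)

def finetune_stage(pairs, src_lang, tgt_lang, epochs=1):
    """One progressive stage: fine-tune on (source_text, target_text) pairs."""
    tokenizer.src_lang = src_lang
    tokenizer.tgt_lang = tgt_lang
    model.train()
    for _ in range(epochs):
        for src_text, tgt_text in pairs:
            batch = tokenizer(src_text, text_target=tgt_text,
                              return_tensors="pt", truncation=True, max_length=512)
            loss = model(**batch).loss  # cross-entropy over target tokens
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()

# Toy placeholder pairs; a real setup would stream the corresponding corpora.
mono_summ_pairs = [("A long English document ...", "A short English summary.")]
translation_pairs = [("An English sentence.", "一个中文句子。")]
cls_pairs = [("A long English document ...", "一个简短的中文摘要。")]

# Stage 1: monolingual summarization, so the model learns to compress information.
finetune_stage(mono_summ_pairs, src_lang="en_XX", tgt_lang="en_XX")
# Stage 2: translation pairs, strengthening cross-lingual alignment.
finetune_stage(translation_pairs, src_lang="en_XX", tgt_lang="zh_CN")
# Stage 3: the target low-resource cross-lingual summarization task.
finetune_stage(cls_pairs, src_lang="en_XX", tgt_lang="zh_CN")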
Pages: 22