Augmenting Low-Resource Cross-Lingual Summarization with Progression-Grounded Training and Prompting

Cited by: 0
Authors
Ma, Jiu Shun [1 ]
Huang, Yuxin [1 ]
Wang, Linqin [2 ]
Huang, Xiang [3 ]
Peng, Hao [3 ]
Yu, Zhengtao [1 ]
Yu, Philip [4 ]
Affiliations
[1] Kunming Univ Sci & Technol, Fac Informat Engn & Automat, Kunming, Peoples R China
[2] Kunming Univ Sci & Technol, Kunming, Peoples R China
[3] Beihang Univ, Beijing, Peoples R China
[4] Univ Illinois, Chicago, IL USA
Funding
National Natural Science Foundation of China;
Keywords
CLS; pretrain plus finetune paradigm; low-resource languages; progressive training; reinforcement learning; discrete-prompts;
DOI
10.1145/3675167
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Cross-lingual summarization (CLS), which generates a summary in one language from a source document in another, helps make information accessible across language barriers. State-of-the-art neural summarization models typically train or fine-tune language models on large-scale corpora, which is difficult in realistic low-resource scenarios that lack domain-specific annotated data. In this article, we present a novel cross-lingual summarization model that addresses the low-resource setting through a two-pronged approach: progressive training of mBART and discrete prompts optimized with reinforcement learning. During training, a progressive scheme based on mBART lets the pre-trained model gradually acquire the ability to compress information, develop cross-lingual capabilities, and finally adapt to the specific summarization task. During downstream summarization, we combine the pre-trained model with reinforcement-learning-optimized discrete prompts to perform low-resource cross-lingual summarization. Experimental results on four cross-lingual summarization datasets demonstrate state-of-the-art performance, surpassing six baselines in low-resource scenarios.
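For readers unfamiliar with staged fine-tuning, the following is a minimal, hypothetical sketch of the "progressive" idea described in the abstract, using the public Hugging Face mBART-50 checkpoint. The stage order, placeholder data, and hyperparameters are illustrative assumptions, not the authors' released code, and the reinforcement-learning-optimized discrete prompts are omitted.

# Hypothetical sketch: progressive fine-tuning of mBART-50 across three stages.
# Placeholder data and hyperparameters are assumptions for illustration only.
import torch
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-50")
tokenizer = MBart50TokenizerFast.from_pretrained("facebook/mbart-large-50")
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)

def finetune_stage(pairs, src_lang, tgt_lang, epochs=1):
    """One progressive stage: fine-tune on (source_text, target_text) pairs."""
    tokenizer.src_lang = src_lang
    tokenizer.tgt_lang = tgt_lang
    model.train()
    for _ in range(epochs):
        for src_text, tgt_text in pairs:
            batch = tokenizer(src_text, text_target=tgt_text,
                              return_tensors="pt", truncation=True, max_length=512)
            loss = model(**batch).loss  # cross-entropy over target tokens
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()

# Toy placeholder pairs; a real setup would stream the corresponding corpora.
mono_summ_pairs = [("A long English document ...", "A short English summary.")]
translation_pairs = [("An English sentence.", "一个中文句子。")]
cls_pairs = [("A long English document ...", "一个简短的中文摘要。")]

# Stage 1: monolingual summarization, so the model learns to compress information.
finetune_stage(mono_summ_pairs, src_lang="en_XX", tgt_lang="en_XX")
# Stage 2: translation pairs, strengthening cross-lingual alignment.
finetune_stage(translation_pairs, src_lang="en_XX", tgt_lang="zh_CN")
# Stage 3: the target low-resource cross-lingual summarization task.
finetune_stage(cls_pairs, src_lang="en_XX", tgt_lang="zh_CN")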
Pages: 22