Augmenting Low-Resource Cross-Lingual Summarization with Progression-Grounded Training and Prompting

Cited by: 0
Authors
Ma, Jiu Shun [1 ]
Huang, Yuxin [1 ]
Wang, Linqin [2 ]
Huang, Xiang [3 ]
Peng, Hao [3 ]
Yu, Zhengtao [1 ]
Yu, Philip [4 ]
Affiliations
[1] Kunming Univ Sci & Technol, Fac Informat Engn & Automat, Kunming, Peoples R China
[2] Kunming Univ Sci & Technol, Kunming, Peoples R China
[3] Beihang Univ, Beijing, Peoples R China
[4] Univ Illinois, Chicago, IL USA
Funding
National Natural Science Foundation of China;
Keywords
CLS; pretrain plus finetune paradigm; low-resource languages; progressive training; reinforcement learning; discrete-prompts;
DOI
10.1145/3675167
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Cross-lingual summarization (CLS), generating summaries in one language from source documents in another, helps make information accessible to readers worldwide. State-of-the-art neural summarization models typically train or fine-tune language models on large-scale corpora, which is difficult to achieve in realistic low-resource scenarios due to the lack of domain-specific annotated data. In this article, we present a novel cross-lingual summarization model that addresses low-resource CLS through a two-pronged approach: progressive training with mBART and reinforcement learning to optimize discrete prompts. During training, we introduce a progressive approach based on mBART that allows the pre-trained model to gradually acquire the ability to compress information, develop cross-lingual capabilities, and ultimately adapt to the specific summarization task. During downstream summarization, we employ discrete prompts, combined with the pre-trained model and optimized via reinforcement learning, to achieve low-resource cross-lingual summarization. Experimental results on four cross-lingual summarization datasets demonstrate state-of-the-art performance, surpassing six baselines in low-resource scenarios.
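The abstract's second prong, optimizing discrete prompts with reinforcement learning, can be illustrated with a toy sketch. The snippet below is NOT the paper's method: the vocabulary, prompt length, and reward function are all hypothetical stand-ins (the paper would score candidate prompts by summarization quality from the pre-trained model, e.g. ROUGE), and a simple epsilon-greedy local search stands in for the RL optimizer. It only shows the shape of the problem: searching a discrete token space for a prompt that maximizes a reward.

```python
import random

random.seed(0)

# Hypothetical prompt-token vocabulary and prompt length (illustrative only).
VOCAB = ["summarize", "translate", "briefly", "document", "cross-lingual", "in", "English"]
PROMPT_LEN = 3

def reward(prompt):
    # Hypothetical reward: fraction of task-relevant tokens in the prompt.
    # A real system would instead score model outputs produced with this prompt.
    target = {"summarize", "cross-lingual", "English"}
    return sum(tok in target for tok in prompt) / PROMPT_LEN

def search_prompt(steps=300, epsilon=0.1):
    """Epsilon-greedy local search over discrete prompt tokens."""
    cur = [random.choice(VOCAB) for _ in range(PROMPT_LEN)]
    best, best_r = list(cur), reward(cur)
    for _ in range(steps):
        cand = list(cur)
        cand[random.randrange(PROMPT_LEN)] = random.choice(VOCAB)  # single-token edit
        # Accept improvements; occasionally accept anyway to keep exploring.
        if reward(cand) >= reward(cur) or random.random() < epsilon:
            cur = cand
        if reward(cur) > best_r:
            best, best_r = list(cur), reward(cur)
    return best, best_r

best_prompt, best_reward = search_prompt()
print(best_prompt, best_reward)
```

The key property the sketch shares with RL-based discrete-prompt optimization is that the search space is non-differentiable: prompts are edited token-by-token and evaluated only through a scalar reward, so gradient-free or policy-gradient methods are required.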
Pages: 22
Related Papers
50 records in total
  • [21] Improving Candidate Generation for Low-resource Cross-lingual Entity Linking
    Zhou, Shuyan
    Rijhwani, Shruti
    Wieting, John
    Carbonell, Jaime
    Neubig, Graham
    TRANSACTIONS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, 2020, 8 : 109 - 124
  • [22] Exploiting Cross-Lingual Subword Similarities in Low-Resource Document Classification
    Zhang, Mozhi
    Fujinuma, Yoshinari
    Boyd-Graber, Jordan
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 9547 - 9554
  • [23] Improving Low-Resource Cross-lingual Parsing with Expected Statistic Regularization
    Effland, Thomas
    Collins, Michael
    TRANSACTIONS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, 2023, 11 : 122 - 138
  • [24] Unsupervised Ranked Cross-Lingual Lexical Substitution for Low-Resource Languages
    Ecker, Stefan
    Horbach, Andrea
    Thater, Stefan
    LREC 2016 - TENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2016, : 1709 - 1717
  • [25] MetaXL: Meta Representation Transformation for Low-resource Cross-lingual Learning
    Xia, Mengzhou
    Zheng, Guoqing
    Mukherjee, Subhabrata
    Shokouhi, Milad
    Neubig, Graham
    Awadallah, Ahmed Hassan
    2021 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL-HLT 2021), 2021, : 499 - 511
  • [26] Cross-Lingual and Ensemble MLPs Strategies for Low-Resource Speech Recognition
    Qian, Yanmin
    Liu, Jia
    13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 2581 - 2584
  • [27] Mixed-Lingual Pre-training for Cross-lingual Summarization
    Xu, Ruochen
    Zhu, Chenguang
    Shi, Yu
    Zeng, Michael
    Huang, Xuedong
    1ST CONFERENCE OF THE ASIA-PACIFIC CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 10TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (AACL-IJCNLP 2020), 2020, : 536 - 541
  • [28] UniBridge: A Unified Approach to Cross-Lingual Transfer Learning for Low-Resource Languages
    Trinh Pham
    Le, Khoi M.
    Luu Anh Tuan
    PROCEEDINGS OF THE 62ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1: LONG PAPERS, 2024, : 3168 - 3184
  • [29] Cross-Lingual Subspace Gaussian Mixture Models for Low-Resource Speech Recognition
    Lu, Liang
    Ghoshal, Arnab
    Renals, Steve
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2014, 22 (01) : 17 - 27
  • [30] Cross-Lingual Knowledge Distillation for Answer Sentence Selection in Low-Resource Languages
    Gupta, Shivanshu
    Matsubara, Yoshitomo
    Chadha, Ankit
    Moschitti, Alessandro
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023), 2023, : 14078 - 14092