Fine-tuning pretrained transformer encoders for sequence-to-sequence learning

Cited by: 0
Authors
Hangbo Bao
Li Dong
Wenhui Wang
Nan Yang
Songhao Piao
Furu Wei
Affiliations
[1] Harbin Institute of Technology, Department of Computer Science and Technology
[2] Microsoft Research Asia
Source
International Journal of Machine Learning and Cybernetics | 2024, Vol. 15
Keywords
Pretrained models; Transformer; Natural language generation; Document summarization; Question generation; Multilingual
DOI
Not available
Abstract
In this paper, we introduce s2s-ft, a method for adapting pretrained bidirectional Transformer encoders, such as BERT and RoBERTa, to sequence-to-sequence tasks like abstractive summarization and question generation. By employing a unified modeling approach and well-designed self-attention masks, s2s-ft leverages the generative capabilities of pretrained Transformer encoders without the need for an additional decoder. We conduct extensive experiments comparing three fine-tuning algorithms (causal fine-tuning, masked fine-tuning, and pseudo-masked fine-tuning) and various pretrained models for initialization. Results demonstrate that s2s-ft achieves strong performance across different tasks and languages. Additionally, the method is successfully extended to multilingual pretrained models, such as XLM-RoBERTa, and evaluated on multilingual generation tasks. Our work highlights the importance of reducing the discrepancy between masked language model pretraining and sequence-to-sequence fine-tuning and showcases the effectiveness and extensibility of the s2s-ft method.
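To make the masking idea in the abstract concrete, the sketch below builds the kind of sequence-to-sequence self-attention mask the paper describes: source tokens attend bidirectionally within the source segment, while target tokens attend to the full source and only to earlier (and current) target positions, so a single bidirectional encoder can serve as both encoder and decoder. This is a minimal illustrative sketch, not the authors' released s2s-ft code; the function name s2s_attention_mask, the boolean-mask convention, and the example segment lengths are assumptions.

```python
import torch


def s2s_attention_mask(src_len: int, tgt_len: int) -> torch.Tensor:
    """Build a (src_len + tgt_len) x (src_len + tgt_len) boolean attention mask.

    Entry [i, j] is True when position i may attend to position j:
    source positions attend bidirectionally within the source segment,
    and target positions attend to the whole source plus the target
    positions up to and including themselves (causal on the target side).
    """
    total = src_len + tgt_len
    mask = torch.zeros(total, total, dtype=torch.bool)

    # Source-to-source: full bidirectional attention.
    mask[:src_len, :src_len] = True

    # Target-to-source: every target position sees the whole source.
    mask[src_len:, :src_len] = True

    # Target-to-target: lower-triangular (causal) attention.
    mask[src_len:, src_len:] = torch.tril(
        torch.ones(tgt_len, tgt_len, dtype=torch.bool)
    )
    return mask


if __name__ == "__main__":
    # Tiny example: a 4-token source segment and a 3-token target segment.
    print(s2s_attention_mask(4, 3).int())
```

In practice, a mask like this is typically converted to additive form (0 for allowed positions, a large negative value for blocked ones) and added to the attention scores inside a BERT- or RoBERTa-style encoder.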
Pages: 1711-1728
Number of pages: 17
Related articles
50 items in total
  • [1] Fine-tuning pretrained transformer encoders for sequence-to-sequence learning
    Bao, Hangbo
    Dong, Li
    Wang, Wenhui
    Yang, Nan
    Piao, Songhao
    Wei, Furu
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2024, 15 (05) : 1711 - 1728
  • [2] Structure-aware Fine-tuning of Sequence-to-sequence Transformers for Transition-based AMR Parsing
    Zhou, Jiawei
    Naseem, Tahira
    Astudillo, Ramon Fernandez
    Lee, Young-Suk
    Florian, Radu
    Roukos, Salim
    2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 6279 - 6290
  • [3] Document Ranking with a Pretrained Sequence-to-Sequence Model
    Nogueira, Rodrigo
    Jiang, Zhiying
    Pradeep, Ronak
    Lin, Jimmy
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2020, 2020, : 708 - 718
  • [4] Fine-Tuning Self-Supervised Multilingual Sequence-To-Sequence Models for Extremely Low-Resource NMT
    Thillainathan, Sarubi
    Ranathunga, Surangika
    Jayasena, Sanath
    MORATUWA ENGINEERING RESEARCH CONFERENCE (MERCON 2021) / 7TH INTERNATIONAL MULTIDISCIPLINARY ENGINEERING RESEARCH CONFERENCE, 2021, : 432 - 437
  • [5] BARThez: a Skilled Pretrained French Sequence-to-Sequence Model
    Eddine, Moussa Kamal
    Tixier, Antoine J-P
    Vazirgiannis, Michalis
    2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 9369 - 9390
  • [6] Rebetiko Singer Identification: Fine-tuning and explaining deep pretrained transformer models
    Papakostas, Maximos Kaliakatsos
    Zacharakis, Asterios
    Velenis, Konstantinos
    Cambouropoulos, Emilios
    PROCEEDINGS OF THE 19TH INTERNATIONAL AUDIO MOSTLY CONFERENCE, AM 2024, 2024, : 285 - 291
  • [7] Pretrained Speech Encoders and Efficient Fine-tuning Methods for Speech Translation: UPC at IWSLT 2022
    Tsiamas, Ioannis
    Gallego, Gerard, I
    Escolano, Carlos
    Fonollosa, Jose A. R.
    Costa-jussa, Marta R.
    PROCEEDINGS OF THE 19TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE TRANSLATION (IWSLT 2022), 2022, : 265 - 276
  • [8] On Surgical Fine-tuning for Language Encoders
    Lodha, Abhilasha
    Belapurkar, Gayatri
    Chalkapurkar, Saloni
    Tao, Yuanming
    Ghosh, Reshmi
    Basu, Samyadeep
    Petrov, Dmitrii
    Srinivasan, Soundararajan
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS - EMNLP 2023, 2023, : 3105 - 3113
  • [9] Turkish abstractive text summarization using pretrained sequence-to-sequence models
    Baykara, Batuhan
    Gungor, Tunga
    NATURAL LANGUAGE ENGINEERING, 2023, 29 (05) : 1275 - 1304
  • [10] Segmented Recurrent Transformer: An Efficient Sequence-to-Sequence Model
    Long, Yinghan
    Chowdhury, Sayeed Shafayet
    Roy, Kaushik
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EMNLP 2023), 2023, : 8325 - 8337