Fine-tuning pretrained transformer encoders for sequence-to-sequence learning

Cited by: 0
Authors
Hangbo Bao
Li Dong
Wenhui Wang
Nan Yang
Songhao Piao
Furu Wei
Affiliations
[1] Harbin Institute of Technology, Department of Computer Science and Technology
[2] Microsoft Research Asia
Source
International Journal of Machine Learning and Cybernetics | 2024, Vol. 15
Keywords
Pretrained models; Transformer; Natural language generation; Document summarization; Question generation; Multilingual
DOI
Not available
Abstract
In this paper, we introduce s2s-ft, a method for adapting pretrained bidirectional Transformer encoders, such as BERT and RoBERTa, to sequence-to-sequence tasks like abstractive summarization and question generation. By employing a unified modeling approach and well-designed self-attention masks, s2s-ft leverages the generative capabilities of pretrained Transformer encoders without the need for an additional decoder. We conduct extensive experiments comparing three fine-tuning algorithms (causal fine-tuning, masked fine-tuning, and pseudo-masked fine-tuning) and various pretrained models for initialization. Results demonstrate that s2s-ft achieves strong performance across different tasks and languages. Additionally, the method is successfully extended to multilingual pretrained models, such as XLM-RoBERTa, and evaluated on multilingual generation tasks. Our work highlights the importance of reducing the discrepancy between masked language model pretraining and sequence-to-sequence fine-tuning and showcases the effectiveness and extensibility of the s2s-ft method.
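To make the masking idea in the abstract concrete, the sketch below builds the kind of sequence-to-sequence self-attention mask the paper describes: source tokens attend bidirectionally within the source segment, while target tokens attend to the full source and only to earlier (and current) target positions, so a single bidirectional encoder can serve as both encoder and decoder. This is a minimal illustrative sketch, not the authors' released s2s-ft code; the function name s2s_attention_mask, the boolean-mask convention, and the example segment lengths are assumptions.

```python
import torch


def s2s_attention_mask(src_len: int, tgt_len: int) -> torch.Tensor:
    """Build a (src_len + tgt_len) x (src_len + tgt_len) boolean attention mask.

    Entry [i, j] is True when position i may attend to position j:
    source positions attend bidirectionally within the source segment,
    and target positions attend to the whole source plus the target
    positions up to and including themselves (causal on the target side).
    """
    total = src_len + tgt_len
    mask = torch.zeros(total, total, dtype=torch.bool)

    # Source-to-source: full bidirectional attention.
    mask[:src_len, :src_len] = True

    # Target-to-source: every target position sees the whole source.
    mask[src_len:, :src_len] = True

    # Target-to-target: lower-triangular (causal) attention.
    mask[src_len:, src_len:] = torch.tril(
        torch.ones(tgt_len, tgt_len, dtype=torch.bool)
    )
    return mask


if __name__ == "__main__":
    # Tiny example: a 4-token source segment and a 3-token target segment.
    print(s2s_attention_mask(4, 3).int())
```

In practice, a mask like this is typically converted to additive form (0 for allowed positions, a large negative value for blocked ones) and added to the attention scores inside a BERT- or RoBERTa-style encoder.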
Pages: 1711-1728
Number of pages: 17
Related articles
50 items in total
  • [1] Fine-tuning pretrained transformer encoders for sequence-to-sequence learning
    Bao, Hangbo
    Dong, Li
    Wang, Wenhui
    Yang, Nan
    Piao, Songhao
    Wei, Furu
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2024, 15 (05) : 1711 - 1728
  • [2] Structure-aware Fine-tuning of Sequence-to-sequence Transformers for Transition-based AMR Parsing
    Zhou, Jiawei
    Naseem, Tahira
    Astudillo, Ramon Fernandez
    Lee, Young-Suk
    Florian, Radu
    Roukos, Salim
    2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 6279 - 6290
  • [3] Document Ranking with a Pretrained Sequence-to-Sequence Model
    Nogueira, Rodrigo
    Jiang, Zhiying
    Pradeep, Ronak
    Lin, Jimmy
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2020, 2020, : 708 - 718
  • [4] Fine-Tuning Self-Supervised Multilingual Sequence-To-Sequence Models for Extremely Low-Resource NMT
    Thillainathan, Sarubi
    Ranathunga, Surangika
    Jayasena, Sanath
    MORATUWA ENGINEERING RESEARCH CONFERENCE (MERCON 2021) / 7TH INTERNATIONAL MULTIDISCIPLINARY ENGINEERING RESEARCH CONFERENCE, 2021, : 432 - 437
  • [5] BARThez: a Skilled Pretrained French Sequence-to-Sequence Model
    Eddine, Moussa Kamal
    Tixier, Antoine J-P
    Vazirgiannis, Michalis
    2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 9369 - 9390
  • [6] Rebetiko Singer Identification: Fine-tuning and explaining deep pretrained transformer models
    Papakostas, Maximos Kaliakatsos
    Zacharakis, Asterios
    Velenis, Konstantinos
    Cambouropoulos, Emilios
    PROCEEDINGS OF THE 19TH INTERNATIONAL AUDIO MOSTLY CONFERENCE, AM 2024, 2024, : 285 - 291
  • [7] Pretrained Speech Encoders and Efficient Fine-tuning Methods for Speech Translation: UPC at IWSLT 2022
    Tsiamas, Ioannis
    Gallego, Gerard, I
    Escolano, Carlos
    Fonollosa, Jose A. R.
    Costa-jussa, Marta R.
    PROCEEDINGS OF THE 19TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE TRANSLATION (IWSLT 2022), 2022, : 265 - 276
  • [8] On Surgical Fine-tuning for Language Encoders
    Lodha, Abhilasha
    Belapurkar, Gayatri
    Chalkapurkar, Saloni
    Tao, Yuanming
    Ghosh, Reshmi
    Basu, Samyadeep
    Petrov, Dmitrii
    Srinivasan, Soundararajan
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS - EMNLP 2023, 2023, : 3105 - 3113
  • [9] Turkish abstractive text summarization using pretrained sequence-to-sequence models
    Baykara, Batuhan
    Gungor, Tunga
    NATURAL LANGUAGE ENGINEERING, 2023, 29 (05) : 1275 - 1304
  • [10] Segmented Recurrent Transformer: An Efficient Sequence-to-Sequence Model
    Long, Yinghan
    Chowdhury, Sayeed Shafayet
    Roy, Kaushik
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EMNLP 2023), 2023, : 8325 - 8337