Gradual Syntactic Label Replacement for Language Model Pre-Training

Cited by: 0
Authors
Wang, Yile [1 ]
Zhang, Yue [2 ]
Li, Peng [1 ]
Liu, Yang [3 ]
Affiliations
[1] Tsinghua Univ, Inst AI Ind Res, Beijing 100084, Peoples R China
[2] Westlake Univ, Sch Engn, Hangzhou 310024, Peoples R China
[3] Tsinghua Univ, Dept Comp Sci & Technol, Beijing 100084, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Language model pre-training; syntactic label replacement; curriculum learning; data-centric;
DOI
10.1109/TASLP.2023.3331096
CLC Number
O42 [Acoustics];
Discipline Codes
070206; 082403;
Abstract
Pre-training serves as a foundation of recent NLP models, where language modeling tasks are performed over large text corpora. Typical models such as BERT and GPT take the corpus as a whole and treat every word equally during language modeling. However, recent work shows that the frequency bias naturally present in raw corpora may limit the power of the language model. In this article, we propose a multi-stage training strategy that gradually increases the training vocabulary by modifying the training data. Specifically, we leverage syntactic structure as a bridge for infrequent words, replacing them with their corresponding syntactic labels and later recovering their original lexical surface forms for further training. Such a strategy results in an easy-to-hard curriculum learning process: the model first learns the most common words and basic syntactic concepts, and then recognizes a large number of uncommon words through their specific usages and the previously learned category knowledge. Experimental results show that this method improves the performance of both discriminative and generative pre-trained language models on benchmarks and various downstream tasks.
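The data-modification step described above can be pictured as building one corpus per curriculum stage, where words below a stage-specific frequency threshold are replaced by their syntactic labels. The following is a minimal, hypothetical sketch of that idea only; the thresholds, label format (bracketed POS tags), and the assumption of a pre-tagged corpus are illustrative choices, not the authors' exact procedure.

```python
# Hypothetical sketch: per-stage corpora for gradual syntactic label replacement.
# Stage thresholds shrink over stages, so the training vocabulary grows from
# frequent words plus label tokens toward the full lexical surface forms.
from collections import Counter
from typing import List, Tuple

TaggedSentence = List[Tuple[str, str]]  # (word, syntactic_label) pairs

def build_stage_corpora(
    tagged_corpus: List[TaggedSentence],
    stage_thresholds: List[int],          # e.g. [100, 10, 0]; 0 keeps the original text
) -> List[List[List[str]]]:
    """Return one token corpus per curriculum stage."""
    freq = Counter(w for sent in tagged_corpus for w, _ in sent)
    stage_corpora = []
    for threshold in stage_thresholds:
        # Keep frequent words; replace infrequent ones with a label token.
        stage = [
            [w if freq[w] >= threshold else f"[{tag}]" for w, tag in sent]
            for sent in tagged_corpus
        ]
        stage_corpora.append(stage)
    return stage_corpora

if __name__ == "__main__":
    # Toy pre-tagged corpus; in practice the labels would come from a syntactic parser.
    corpus = [
        [("the", "DT"), ("cat", "NN"), ("sat", "VBD")],
        [("the", "DT"), ("ocelot", "NN"), ("sat", "VBD")],
    ]
    for i, stage in enumerate(build_stage_corpora(corpus, [2, 0])):
        print(f"stage {i}:", stage)
```

In this toy run, the first stage replaces the rare nouns "cat" and "ocelot" with the shared token "[NN]", and the final stage restores the original text, mirroring the easy-to-hard progression the abstract describes.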
Pages: 486 - 496
Number of pages: 11