Gradual Syntactic Label Replacement for Language Model Pre-Training

Cited by: 0
Authors
Wang, Yile [1 ]
Zhang, Yue [2 ]
Li, Peng [1 ]
Liu, Yang [3 ]
Affiliations
[1] Tsinghua Univ, Inst AI Ind Res, Beijing 100084, Peoples R China
[2] Westlake Univ, Sch Engn, Hangzhou 310024, Peoples R China
[3] Tsinghua Univ, Dept Comp Sci & Technol, Beijing 100084, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Language model pre-training; syntactic label replacement; curriculum learning; data-centric;
DOI
10.1109/TASLP.2023.3331096
CLC Number
O42 [Acoustics];
Discipline Codes
070206; 082403;
Abstract
Pre-training serves as a foundation of recent NLP models, where language modeling tasks are performed over large text corpora. Typical models such as BERT and GPT take the corpus as a whole and treat every word equally during language modeling. However, recent work shows that the frequency bias naturally present in raw corpora may limit the power of the language model. In this article, we propose a multi-stage training strategy that gradually increases the training vocabulary by modifying the training data. Specifically, we leverage syntactic structure as a bridge for infrequent words, replacing them with their corresponding syntactic labels and later recovering their original lexical surface forms for further training. Such a strategy results in an easy-to-hard curriculum learning process: the model first learns the most common words and basic syntactic concepts, and then recognizes a large number of uncommon words through their specific usages and the previously learned category knowledge. Experimental results show that this method improves the performance of both discriminative and generative pre-trained language models on benchmarks and various downstream tasks.
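The data-modification step described above can be pictured as building one corpus per curriculum stage, where words below a stage-specific frequency threshold are replaced by their syntactic labels. The following is a minimal, hypothetical sketch of that idea only; the thresholds, label format (bracketed POS tags), and the assumption of a pre-tagged corpus are illustrative choices, not the authors' exact procedure.

```python
# Hypothetical sketch: per-stage corpora for gradual syntactic label replacement.
# Stage thresholds shrink over stages, so the training vocabulary grows from
# frequent words plus label tokens toward the full lexical surface forms.
from collections import Counter
from typing import List, Tuple

TaggedSentence = List[Tuple[str, str]]  # (word, syntactic_label) pairs

def build_stage_corpora(
    tagged_corpus: List[TaggedSentence],
    stage_thresholds: List[int],          # e.g. [100, 10, 0]; 0 keeps the original text
) -> List[List[List[str]]]:
    """Return one token corpus per curriculum stage."""
    freq = Counter(w for sent in tagged_corpus for w, _ in sent)
    stage_corpora = []
    for threshold in stage_thresholds:
        # Keep frequent words; replace infrequent ones with a label token.
        stage = [
            [w if freq[w] >= threshold else f"[{tag}]" for w, tag in sent]
            for sent in tagged_corpus
        ]
        stage_corpora.append(stage)
    return stage_corpora

if __name__ == "__main__":
    # Toy pre-tagged corpus; in practice the labels would come from a syntactic parser.
    corpus = [
        [("the", "DT"), ("cat", "NN"), ("sat", "VBD")],
        [("the", "DT"), ("ocelot", "NN"), ("sat", "VBD")],
    ]
    for i, stage in enumerate(build_stage_corpora(corpus, [2, 0])):
        print(f"stage {i}:", stage)
```

In this toy run, the first stage replaces the rare nouns "cat" and "ocelot" with the shared token "[NN]", and the final stage restores the original text, mirroring the easy-to-hard progression the abstract describes.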
Pages: 486 - 496
Number of pages: 11