TOCOL: improving contextual representation of pre-trained language models via token-level contrastive learning

Cited by: 0
Authors
Wang, Keheng [1]
Yin, Chuantao [1,3]
Li, Rumei [2]
Wang, Sirui [2]
Xian, Yunsen [2]
Rong, Wenge [4]
Xiong, Zhang [4]
Affiliations
[1] Beihang Univ, Sino French Engineer Sch, Beijing 100191, Peoples R China
[2] Meituan Inc, Beijing 100102, Peoples R China
[3] Beihang Hangzhou Innovat Inst Yuhang, Hangzhou 311100, Peoples R China
[4] Beihang Univ, Sch Comp Sci & Engn, Beijing 100191, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Natural language processing; Natural language understanding; Contrastive learning; GLUE;
DOI
10.1007/s10994-023-06512-9
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Self-attention, which allows transformers to capture deep bidirectional contexts, plays a vital role in BERT-like pre-trained language models. However, the maximum likelihood pre-training objective of BERT may produce an anisotropic word embedding space, which leads to biased attention scores for high-frequency tokens: because such tokens lie very close to each other in the representation space, they receive higher similarity scores. This bias may ultimately affect the encoding of global contextual information. To address this issue, we propose TOCOL, a TOken-Level COntrastive Learning framework for improving the contextual representation of pre-trained language models. TOCOL integrates a novel self-supervised objective into the attention mechanism to reshape the word representation space and encourages the PLM to capture the global semantics of sentences. Results on the GLUE benchmark show that TOCOL brings considerable improvement over the original BERT. Furthermore, we conduct a detailed analysis and demonstrate the robustness of our approach in low-resource scenarios.
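To make the general idea of a token-level contrastive objective concrete, below is a minimal, hypothetical PyTorch sketch of an InfoNCE-style loss applied to token representations. It is not the paper's exact TOCOL objective (which the abstract says is integrated into the attention mechanism); the function name, the temperature value, and the use of two encoder views of the same tokens as positives are illustrative assumptions.

# Hedged sketch: a generic token-level InfoNCE contrastive loss in PyTorch.
# This is NOT the authors' TOCOL objective; it only illustrates the family
# of losses the abstract refers to: pull two views of the same token
# together and push apart the representations of other tokens.
import torch
import torch.nn.functional as F


def token_level_info_nce(view_a: torch.Tensor,
                         view_b: torch.Tensor,
                         temperature: float = 0.05) -> torch.Tensor:
    """view_a, view_b: (num_tokens, hidden_dim) representations of the same
    tokens under two different encodings (e.g., two dropout passes of a
    BERT-like encoder). The matching token in the other view is the
    positive; all other tokens serve as negatives."""
    a = F.normalize(view_a, dim=-1)
    b = F.normalize(view_b, dim=-1)
    logits = a @ b.t() / temperature                      # pairwise cosine similarities
    targets = torch.arange(a.size(0), device=a.device)    # positives lie on the diagonal
    return F.cross_entropy(logits, targets)


if __name__ == "__main__":
    # Toy usage with random "token representations"; in practice these would
    # be hidden states produced by a pre-trained language model.
    torch.manual_seed(0)
    h1 = torch.randn(32, 768)
    h2 = h1 + 0.1 * torch.randn(32, 768)                  # slightly perturbed second view
    print(token_level_info_nce(h1, h2).item())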
Pages: 3999 - 4012
Page count: 14
Related Papers
50 records in total
  • [1] ERICA: Improving Entity and Relation Understanding for Pre-trained Language Models via Contrastive Learning. Qin, Yujia; Lin, Yankai; Takanobu, Ryuichi; Liu, Zhiyuan; Li, Peng; Ji, Heng; Huang, Minlie; Sun, Maosong; Zhou, Jie. 59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (ACL-IJCNLP 2021), VOL 1, 2021: 3350 - 3363
  • [2] Explanation Graph Generation via Pre-trained Language Models: An Empirical Study with Contrastive Learning. Saha, Swarnadeep; Yadav, Prateek; Bansal, Mohit. PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS), 2022: 1190 - 1208
  • [3] Injecting Wiktionary to improve token-level contextual representations using contrastive learning. Mosolova, Anna; Candito, Marie; Ramisch, Carlos. PROCEEDINGS OF THE 18TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 2: SHORT PAPERS, 2024: 34 - 41
  • [4] ContraBERT: Enhancing Code Pre-trained Models via Contrastive Learning. Liu, Shangqing; Wu, Bozhi; Xie, Xiaofei; Meng, Guozhu; Liu, Yang. 2023 IEEE/ACM 45TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING, ICSE, 2023: 2476 - 2487
  • [5] Slot Induction via Pre-trained Language Model Probing and Multi-level Contrastive Learning. Nguyen, Hoang H.; Zhang, Chenwei; Liu, Ye; Yu, Philip S. 24TH MEETING OF THE SPECIAL INTEREST GROUP ON DISCOURSE AND DIALOGUE, SIGDIAL 2023, 2023: 470 - 481
  • [6] A Radical-Based Token Representation Method for Enhancing Chinese Pre-Trained Language Models. Qin, Honglun; Li, Meiwen; Wang, Lin; Ge, Youming; Zhu, Junlong; Zheng, Ruijuan. ELECTRONICS, 2025, 14 (05)
  • [7] Focused Contrastive Loss for Classification With Pre-Trained Language Models. He, Jiayuan; Li, Yuan; Zhai, Zenan; Fang, Biaoyan; Thorne, Camilo; Druckenbrodt, Christian; Akhondi, Saber; Verspoor, Karin. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2024, 36 (07): 3047 - 3061
  • [8] Federated Learning from Pre-Trained Models: A Contrastive Learning Approach. Tan, Yue; Long, Guodong; Ma, Jie; Liu, Lu; Zhou, Tianyi; Jiang, Jing. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
  • [9] MIL-Decoding: Detoxifying Language Models at Token-Level via Multiple Instance Learning. Zhang, Xu; Wan, Xiaojun. PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 1, 2023: 190 - 202
  • [10] Representation Transfer Learning via Multiple Pre-Trained Models for Linear Regression. Singh, Navjot; Diggavi, Suhas. IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2025, 19 (01): 208 - 220