TOCOL: improving contextual representation of pre-trained language models via token-level contrastive learning

Cited by: 0
Authors
Wang, Keheng [1 ]
Yin, Chuantao [1 ,3 ]
Li, Rumei [2 ]
Wang, Sirui [2 ]
Xian, Yunsen [2 ]
Rong, Wenge [4 ]
Xiong, Zhang [4 ]
Affiliations
[1] Beihang Univ, Sino French Engineer Sch, Beijing 100191, Peoples R China
[2] Meituan Inc, Beijing 100102, Peoples R China
[3] Beihang Hangzhou Innovat Inst Yuhang, Hangzhou 311100, Peoples R China
[4] Beihang Univ, Sch Comp Sci & Engn, Beijing 100191, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Natural language processing; Natural language understanding; Contrastive learning; GLUE;
DOI
10.1007/s10994-023-06512-9
Chinese Library Classification (CLC)
TP18 [Theory of artificial intelligence];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Self-attention, which allows transformers to capture deep bidirectional contexts, plays a vital role in BERT-like pre-trained language models. However, the maximum likelihood pre-training objective of BERT may produce an anisotropic word embedding space, which leads to biased attention scores for high-frequency tokens: because such tokens lie very close to each other in the representation space, they receive higher similarities. This bias may ultimately affect the encoding of global contextual information. To address this issue, we propose TOCOL, a TOken-Level COntrastive Learning framework for improving the contextual representation of pre-trained language models, which integrates a novel self-supervised objective into the attention mechanism to reshape the word representation space and encourages the PLM to capture the global semantics of sentences. Results on the GLUE Benchmark show that TOCOL brings considerable improvement over the original BERT. Furthermore, we conduct a detailed analysis and demonstrate the robustness of our approach in low-resource scenarios.
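The abstract does not spell out the exact form of TOCOL's objective, but the general idea of a token-level contrastive loss can be illustrated with a minimal sketch. The snippet below assumes an InfoNCE-style loss in which token representations from two stochastic encodings of the same sentence (e.g., two dropout-perturbed forward passes) form positive pairs at matching positions, with all other tokens in the batch acting as negatives; the function name, temperature value, and two-view setup are illustrative assumptions, not the paper's specification.

```python
import torch
import torch.nn.functional as F

def token_level_contrastive_loss(anchor, positive, temperature=0.05):
    """InfoNCE-style contrastive loss over token representations (illustrative sketch).

    anchor, positive: [batch, seq_len, hidden] token states from two views of the
    same sentences. Tokens at the same position are positive pairs; every other
    token in the batch is treated as a negative.
    """
    b, t, h = anchor.shape
    a = F.normalize(anchor.reshape(b * t, h), dim=-1)   # flatten to one row per token
    p = F.normalize(positive.reshape(b * t, h), dim=-1)
    logits = a @ p.T / temperature                      # [b*t, b*t] cosine similarities
    targets = torch.arange(b * t, device=a.device)      # positives sit on the diagonal
    return F.cross_entropy(logits, targets)

# Usage: two encodings of the same batch (hypothetical BERT last-layer states).
hidden_a = torch.randn(4, 16, 768)   # forward pass 1
hidden_b = torch.randn(4, 16, 768)   # forward pass 2 (different dropout mask)
print(token_level_contrastive_loss(hidden_a, hidden_b).item())
```

Pulling apart token representations in this way counteracts the anisotropy described above, since high-frequency tokens can no longer collapse into a narrow region of the embedding space and dominate attention scores; how TOCOL couples such a term to the attention mechanism is detailed in the paper itself.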
Pages: 3999-4012
Number of pages: 14