TOCOL: improving contextual representation of pre-trained language models via token-level contrastive learning

Cited by: 0
Authors
Wang, Keheng [1 ]
Yin, Chuantao [1 ,3 ]
Li, Rumei [2 ]
Wang, Sirui [2 ]
Xian, Yunsen [2 ]
Rong, Wenge [4 ]
Xiong, Zhang [4 ]
Affiliations
[1] Beihang Univ, Sino French Engineer Sch, Beijing 100191, Peoples R China
[2] Meituan Inc, Beijing 100102, Peoples R China
[3] Beihang Hangzhou Innovat Inst Yuhang, Hangzhou 311100, Peoples R China
[4] Beihang Univ, Sch Comp Sci & Engn, Beijing 100191, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Natural language processing; Natural language understanding; Contrastive learning; GLUE;
DOI
10.1007/s10994-023-06512-9
Chinese Library Classification (CLC)
TP18 [Theory of artificial intelligence];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Self-attention, which allows transformers to capture deep bidirectional contexts, plays a vital role in BERT-like pre-trained language models. However, the maximum likelihood pre-training objective of BERT may produce an anisotropic word embedding space, which leads to biased attention scores for high-frequency tokens: because such tokens lie very close to each other in the representation space, they receive higher similarities. This bias may ultimately affect the encoding of global contextual information. To address this issue, we propose TOCOL, a TOken-Level COntrastive Learning framework for improving the contextual representation of pre-trained language models, which integrates a novel self-supervised objective into the attention mechanism to reshape the word representation space and encourages the PLM to capture the global semantics of sentences. Results on the GLUE Benchmark show that TOCOL brings considerable improvement over the original BERT. Furthermore, we conduct a detailed analysis and demonstrate the robustness of our approach in low-resource scenarios.
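The abstract does not spell out the exact form of TOCOL's objective, but the general idea of a token-level contrastive loss can be illustrated with a minimal sketch. The snippet below assumes an InfoNCE-style loss in which token representations from two stochastic encodings of the same sentence (e.g., two dropout-perturbed forward passes) form positive pairs at matching positions, with all other tokens in the batch acting as negatives; the function name, temperature value, and two-view setup are illustrative assumptions, not the paper's specification.

```python
import torch
import torch.nn.functional as F

def token_level_contrastive_loss(anchor, positive, temperature=0.05):
    """InfoNCE-style contrastive loss over token representations (illustrative sketch).

    anchor, positive: [batch, seq_len, hidden] token states from two views of the
    same sentences. Tokens at the same position are positive pairs; every other
    token in the batch is treated as a negative.
    """
    b, t, h = anchor.shape
    a = F.normalize(anchor.reshape(b * t, h), dim=-1)   # flatten to one row per token
    p = F.normalize(positive.reshape(b * t, h), dim=-1)
    logits = a @ p.T / temperature                      # [b*t, b*t] cosine similarities
    targets = torch.arange(b * t, device=a.device)      # positives sit on the diagonal
    return F.cross_entropy(logits, targets)

# Usage: two encodings of the same batch (hypothetical BERT last-layer states).
hidden_a = torch.randn(4, 16, 768)   # forward pass 1
hidden_b = torch.randn(4, 16, 768)   # forward pass 2 (different dropout mask)
print(token_level_contrastive_loss(hidden_a, hidden_b).item())
```

Pulling apart token representations in this way counteracts the anisotropy described above, since high-frequency tokens can no longer collapse into a narrow region of the embedding space and dominate attention scores; how TOCOL couples such a term to the attention mechanism is detailed in the paper itself.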
Pages: 3999-4012
Number of pages: 14