TOCOL: improving contextual representation of pre-trained language models via token-level contrastive learning

Cited by: 0
Authors
Wang, Keheng [1]
Yin, Chuantao [1,3]
Li, Rumei [2]
Wang, Sirui [2]
Xian, Yunsen [2]
Rong, Wenge [4]
Xiong, Zhang [4]
Affiliations
[1] Beihang Univ, Sino French Engineer Sch, Beijing 100191, Peoples R China
[2] Meituan Inc, Beijing 100102, Peoples R China
[3] Beihang Hangzhou Innovat Inst Yuhang, Hangzhou 311100, Peoples R China
[4] Beihang Univ, Sch Comp Sci & Engn, Beijing 100191, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Natural language processing; Natural language understanding; Contrastive learning; GLUE;
DOI
10.1007/s10994-023-06512-9
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Self-attention, which allows transformers to capture deep bidirectional contexts, plays a vital role in BERT-like pre-trained language models. However, the maximum likelihood pre-training objective of BERT may produce an anisotropic word embedding space, which leads to biased attention scores for high-frequency tokens: because such tokens lie very close to each other in the representation space, they receive higher similarity scores. This bias may ultimately affect the encoding of global contextual information. To address this issue, we propose TOCOL, a TOken-Level COntrastive Learning framework for improving the contextual representation of pre-trained language models. TOCOL integrates a novel self-supervised objective into the attention mechanism to reshape the word representation space and encourages the PLM to capture the global semantics of sentences. Results on the GLUE benchmark show that TOCOL brings considerable improvement over the original BERT. Furthermore, we conduct a detailed analysis and demonstrate the robustness of our approach in low-resource scenarios.
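To make the general idea of a token-level contrastive objective concrete, below is a minimal, hypothetical PyTorch sketch of an InfoNCE-style loss applied to token representations. It is not the paper's exact TOCOL objective (which the abstract says is integrated into the attention mechanism); the function name, the temperature value, and the use of two encoder views of the same tokens as positives are illustrative assumptions.

# Hedged sketch: a generic token-level InfoNCE contrastive loss in PyTorch.
# This is NOT the authors' TOCOL objective; it only illustrates the family
# of losses the abstract refers to: pull two views of the same token
# together and push apart the representations of other tokens.
import torch
import torch.nn.functional as F


def token_level_info_nce(view_a: torch.Tensor,
                         view_b: torch.Tensor,
                         temperature: float = 0.05) -> torch.Tensor:
    """view_a, view_b: (num_tokens, hidden_dim) representations of the same
    tokens under two different encodings (e.g., two dropout passes of a
    BERT-like encoder). The matching token in the other view is the
    positive; all other tokens serve as negatives."""
    a = F.normalize(view_a, dim=-1)
    b = F.normalize(view_b, dim=-1)
    logits = a @ b.t() / temperature                      # pairwise cosine similarities
    targets = torch.arange(a.size(0), device=a.device)    # positives lie on the diagonal
    return F.cross_entropy(logits, targets)


if __name__ == "__main__":
    # Toy usage with random "token representations"; in practice these would
    # be hidden states produced by a pre-trained language model.
    torch.manual_seed(0)
    h1 = torch.randn(32, 768)
    h2 = h1 + 0.1 * torch.randn(32, 768)                  # slightly perturbed second view
    print(token_level_info_nce(h1, h2).item())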
Pages: 3999 - 4012
Page count: 14
Related Papers
50 records in total
  • [1] ERICA: Improving Entity and Relation Understanding for Pre-trained Language Models via Contrastive Learning. Qin, Yujia; Lin, Yankai; Takanobu, Ryuichi; Liu, Zhiyuan; Li, Peng; Ji, Heng; Huang, Minlie; Sun, Maosong; Zhou, Jie. 59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (ACL-IJCNLP 2021), VOL 1, 2021: 3350 - 3363
  • [2] Explanation Graph Generation via Pre-trained Language Models: An Empirical Study with Contrastive Learning. Saha, Swarnadeep; Yadav, Prateek; Bansal, Mohit. PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS), 2022: 1190 - 1208
  • [3] Injecting Wiktionary to improve token-level contextual representations using contrastive learning. Mosolova, Anna; Candito, Marie; Ramisch, Carlos. PROCEEDINGS OF THE 18TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 2: SHORT PAPERS, 2024: 34 - 41
  • [4] ContraBERT: Enhancing Code Pre-trained Models via Contrastive Learning. Liu, Shangqing; Wu, Bozhi; Xie, Xiaofei; Meng, Guozhu; Liu, Yang. 2023 IEEE/ACM 45TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING, ICSE, 2023: 2476 - 2487
  • [5] Slot Induction via Pre-trained Language Model Probing and Multi-level Contrastive Learning. Nguyen, Hoang H.; Zhang, Chenwei; Liu, Ye; Yu, Philip S. 24TH MEETING OF THE SPECIAL INTEREST GROUP ON DISCOURSE AND DIALOGUE, SIGDIAL 2023, 2023: 470 - 481
  • [6] A Radical-Based Token Representation Method for Enhancing Chinese Pre-Trained Language Models. Qin, Honglun; Li, Meiwen; Wang, Lin; Ge, Youming; Zhu, Junlong; Zheng, Ruijuan. ELECTRONICS, 2025, 14 (05)
  • [7] Focused Contrastive Loss for Classification With Pre-Trained Language Models. He, Jiayuan; Li, Yuan; Zhai, Zenan; Fang, Biaoyan; Thorne, Camilo; Druckenbrodt, Christian; Akhondi, Saber; Verspoor, Karin. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2024, 36 (07): 3047 - 3061
  • [8] Federated Learning from Pre-Trained Models: A Contrastive Learning Approach. Tan, Yue; Long, Guodong; Ma, Jie; Liu, Lu; Zhou, Tianyi; Jiang, Jing. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
  • [9] MIL-Decoding: Detoxifying Language Models at Token-Level via Multiple Instance Learning. Zhang, Xu; Wan, Xiaojun. PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 1, 2023: 190 - 202
  • [10] Representation Transfer Learning via Multiple Pre-Trained Models for Linear Regression. Singh, Navjot; Diggavi, Suhas. IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2025, 19 (01): 208 - 220