Connecting Pre-trained Language Models and Downstream Tasks via Properties of Representations

Times Cited: 0
Authors
Wu, Chenwei [1 ]
Lee, Holden [2 ]
Ge, Rong [1 ]
Affiliations
[1] Duke Univ, Durham, NC 27706 USA
[2] Johns Hopkins Univ, Baltimore, MD USA
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023) | 2023
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Recently, researchers have found that representations learned by large-scale pre-trained language models are useful in various downstream tasks. However, there is little theoretical understanding of how pre-training performance is related to downstream task performance. In this paper, we analyze how this performance transfer depends on the properties of the downstream task and the structure of the representations. We consider a log-linear model where a word can be predicted from its context through a network having softmax as its last layer. We show that even if the downstream task is highly structured and depends on a simple function of the hidden representation, there are still cases when a low pre-training loss cannot guarantee good performance on the downstream task. On the other hand, we propose and empirically validate the existence of an "anchor vector" in the representation space, and show that this assumption, together with properties of the downstream task, guarantees performance transfer.
Pages: 23
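
Illustrative sketch. To make the setup in the abstract concrete, below is a minimal Python/NumPy sketch of a log-linear next-word model: a context is mapped to a hidden representation f(context), and the next word is predicted through a softmax over the logits U @ f(context). All names and choices here (U, encode_context, the averaging encoder, the vocabulary and hidden sizes) are illustrative assumptions and are not taken from the paper or its code.

import numpy as np

rng = np.random.default_rng(0)
vocab_size, hidden_dim = 1000, 64

# Output matrix of the softmax layer; rows double as word embeddings here.
U = rng.normal(scale=0.1, size=(vocab_size, hidden_dim))

def encode_context(context_ids):
    # Stand-in encoder: average the embedding rows of the context words.
    # In the paper's setting, f(context) may be an arbitrary network.
    return U[context_ids].mean(axis=0)

def next_word_probs(context_ids):
    # Log-linear prediction: softmax over logits U @ f(context).
    h = encode_context(context_ids)       # hidden representation f(context)
    logits = U @ h
    logits -= logits.max()                # numerical stability
    p = np.exp(logits)
    return p / p.sum()

p = next_word_probs([3, 17, 42])
print(p.shape, round(p.sum(), 6))         # (1000,) 1.0

The sketch only fixes the prediction model; the paper's "anchor vector" assumption concerns an additional property of the hidden representation space and is not modeled here.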