Connecting Pre-trained Language Models and Downstream Tasks via Properties of Representations

Times Cited: 0
Authors
Wu, Chenwei [1 ]
Lee, Holden [2 ]
Ge, Rong [1 ]
Affiliations
[1] Duke Univ, Durham, NC 27706 USA
[2] Johns Hopkins Univ, Baltimore, MD USA
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023) | 2023
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recently, researchers have found that representations learned by large-scale pre-trained language models are useful in various downstream tasks. However, there is little theoretical understanding of how pre-training performance is related to downstream task performance. In this paper, we analyze how this performance transfer depends on the properties of the downstream task and the structure of the representations. We consider a log-linear model where a word can be predicted from its context through a network having softmax as its last layer. We show that even if the downstream task is highly structured and depends on a simple function of the hidden representation, there are still cases when a low pre-training loss cannot guarantee good performance on the downstream task. On the other hand, we propose and empirically validate the existence of an "anchor vector" in the representation space, and show that this assumption, together with properties of the downstream task, guarantees performance transfer.
Pages: 23
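The abstract's setup, in which the next word is predicted from a context representation through a network whose last layer is a softmax, can be made concrete with a small numerical sketch. The code below is a minimal illustration under assumed dimensions and randomly initialized parameters; the names `context_to_hidden`, `word_vectors`, `vocab_size`, and `embed_dim` are hypothetical and do not come from the paper or its code.

```python
import numpy as np

# Minimal sketch (not the authors' implementation) of the log-linear model
# described in the abstract: a context is mapped to a hidden representation h,
# and the next word is predicted by a softmax over inner products between h
# and per-word output vectors. All parameters here are random placeholders.

rng = np.random.default_rng(0)
vocab_size, embed_dim = 1000, 64  # assumed toy dimensions

# Hypothetical network mapping a context embedding to a hidden representation.
W_hidden = rng.normal(size=(embed_dim, embed_dim))

def context_to_hidden(context_embedding):
    return np.tanh(W_hidden @ context_embedding)

# Output word vectors used by the softmax last layer.
word_vectors = rng.normal(size=(vocab_size, embed_dim))

def next_word_distribution(context_embedding):
    """p(w | context) proportional to exp(<word_vectors[w], h(context)>)."""
    h = context_to_hidden(context_embedding)
    logits = word_vectors @ h
    logits -= logits.max()           # subtract max for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

# Example: distribution over the vocabulary for one random context.
context = rng.normal(size=embed_dim)
p = next_word_distribution(context)
print(p.shape, round(p.sum(), 6))    # (1000,) 1.0
```

A downstream task in this framework would read off a simple function of the hidden representation h; the paper's "anchor vector" assumption concerns the structure of such representations, which this toy sketch does not attempt to model.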