Universal embedding for pre-trained models and data bench

Cited: 0
Authors
Cho, Namkyeong [1 ]
Cho, Taewon [2 ]
Shin, Jaesun [2 ]
Jeon, Eunjoo [2 ]
Lee, Taehee [2 ]
Affiliations
[1] Pohang University of Science and Technology (POSTECH), Center for Mathematical Machine Learning and its Applications (CM2LA), Department of Mathematics, Pohang 37673, Gyeongbuk, South Korea
[2] Samsung SDS, 125 Olympic-ro 35-gil, Seoul 05510, South Korea
Funding
National Research Foundation, Singapore
Keywords
Transfer learning; Pretrained models; Graph neural networks;
DOI
10.1016/j.neucom.2024.129107
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
The transformer architecture has substantially improved performance on a wide range of natural language processing (NLP) tasks. A great advantage of transformer-based models is that an extra layer can be added to a pre-trained model (PTM) and fine-tuned, rather than developing a separate architecture for each task. This approach has delivered strong performance on NLP tasks, so selecting an appropriate PTM from a model zoo such as Hugging Face becomes a crucial step. Despite its importance, PTM selection still requires further investigation. The main challenge for PTM selection in NLP is the lack of a publicly available benchmark for evaluating model performance on each task and dataset. To address this challenge, we introduce the first public data benchmark that evaluates the performance of popular transformer-based models on a diverse range of NLP tasks. Furthermore, we propose graph representations of transformer-based models whose node features encode the weight matrix of each layer. Empirical results demonstrate that our proposed graph neural network (GNN) model outperforms existing PTM selection methods.
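The abstract describes the method only at a high level: each transformer layer becomes a graph node whose features summarize that layer's weight matrix, and a GNN embeds the resulting graph to guide PTM selection. Below is a minimal, hypothetical PyTorch sketch of that idea; the node features (top singular values), the chain topology, the GNN design, and the cosine-similarity scoring against a task embedding are all illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch (not the paper's code): embed a pre-trained model as a
# graph and score it against a task embedding for PTM selection.
import torch
import torch.nn as nn


def layer_weight_features(weight: torch.Tensor, k: int = 8) -> torch.Tensor:
    """Summarize a layer's weight matrix as a fixed-size vector.

    Here we use the top-k singular values -- one plausible per-layer
    feature; the paper's actual node features may differ.
    """
    s = torch.linalg.svdvals(weight.float())  # descending singular values
    out = torch.zeros(k)
    out[: min(k, s.numel())] = s[:k]
    return out


def model_to_graph(weights):
    """Chain graph over layers: node i = layer i, edges follow the forward
    pass. Returns node features X (n, k) and adjacency A (n, n)."""
    x = torch.stack([layer_weight_features(w) for w in weights])
    n = x.size(0)
    a = torch.eye(n)  # self-loops
    idx = torch.arange(n - 1)
    a[idx, idx + 1] = 1.0
    a[idx + 1, idx] = 1.0
    return x, a


class SimpleGNN(nn.Module):
    """Two rounds of mean-aggregation message passing, then mean pooling,
    yielding one fixed-size embedding per pre-trained model."""

    def __init__(self, in_dim: int = 8, hidden: int = 32, out_dim: int = 16):
        super().__init__()
        self.lin1 = nn.Linear(in_dim, hidden)
        self.lin2 = nn.Linear(hidden, out_dim)

    def forward(self, x: torch.Tensor, a: torch.Tensor) -> torch.Tensor:
        deg = a.sum(dim=1, keepdim=True)          # node degrees (incl. self-loops)
        h = torch.relu(self.lin1((a @ x) / deg))  # message-passing round 1
        h = self.lin2((a @ h) / deg)              # message-passing round 2
        return h.mean(dim=0)                      # graph-level embedding


if __name__ == "__main__":
    torch.manual_seed(0)
    gnn = SimpleGNN()
    # Random weight matrices stand in for two real PTMs' layer weights.
    models = {
        "ptm_a": [torch.randn(64, 64) for _ in range(4)],
        "ptm_b": [torch.randn(64, 64) for _ in range(6)],
    }
    # Random stand-in for a learned task/dataset embedding.
    task_emb = torch.randn(16)
    scores = {
        name: torch.cosine_similarity(gnn(*model_to_graph(ws)), task_emb, dim=0).item()
        for name, ws in models.items()
    }
    print(sorted(scores.items(), key=lambda kv: -kv[1]))  # ranked PTMs
```

The chain topology simply mirrors the forward pass; a richer graph (for instance, separate nodes for attention and feed-forward blocks) would plug into the same pipeline unchanged.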
Pages: 21