Universal embedding for pre-trained models and data bench

Cited by: 0
Authors
Cho, Namkyeong [1 ]
Cho, Taewon [2 ]
Shin, Jaesun [2 ]
Jeon, Eunjoo [2 ]
Lee, Taehee [2 ]
Affiliations
[1] Pohang University of Science and Technology (POSTECH), Center for Mathematical Machine Learning and its Applications (CM2LA), Department of Mathematics, Pohang 37673, Gyeongbuk, South Korea
[2] Samsung SDS, 125 Olympic-ro 35-gil, Seoul 05510, South Korea
Funding
National Research Foundation of Singapore;
Keywords
Transfer learning; Pretrained models; Graph neural networks;
DOI
10.1016/j.neucom.2024.129107
CLC number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The transformer architecture has led to significant improvements on a wide range of natural language processing (NLP) tasks. A major advantage of transformer-based models is that a pre-trained model (PTM) can be adapted to a new task by adding an extra layer and fine-tuning, rather than designing a separate architecture for each task, and this approach has delivered strong performance across NLP tasks. Selecting an appropriate PTM from a model zoo such as Hugging Face therefore becomes a crucial step, yet PTM selection remains under-investigated. The main challenge for PTM selection in NLP is the lack of a publicly available benchmark for evaluating model performance on each task and dataset. To address this challenge, we introduce the first public data benchmark that evaluates the performance of popular transformer-based models on a diverse range of NLP tasks. Furthermore, we propose graph representations of transformer-based models, with node features derived from the weight matrix of each layer. Empirical results demonstrate that our proposed graph neural network (GNN) model outperforms existing PTM selection methods.
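The abstract describes embedding each PTM as a graph whose node features are derived from the layers' weight matrices, then encoding that graph with a GNN. Below is a minimal illustrative sketch of that idea, not the authors' implementation: it assumes PyTorch, PyTorch Geometric, and Hugging Face transformers are available, and the names layer_features, model_to_graph, and ModelEncoder are hypothetical. The graph construction (a simple path over the layers) and the feature statistics are assumptions for illustration; the paper's actual design may differ.

# Sketch: represent a pre-trained transformer as a graph whose nodes are
# layers, with node features summarizing each layer's weight matrix, and
# embed the graph with a small GNN.
import torch
from torch import nn
from torch_geometric.data import Data
from torch_geometric.nn import GCNConv, global_mean_pool
from transformers import AutoModel

def layer_features(weight: torch.Tensor) -> torch.Tensor:
    """Summarize one weight matrix as a fixed-size feature vector."""
    w = weight.detach().float().flatten()
    return torch.stack([w.mean(), w.std(), w.abs().max(),
                        w.norm() / w.numel() ** 0.5])

def model_to_graph(model_name: str) -> Data:
    """Chain the model's 2-D weight matrices into an undirected path graph."""
    model = AutoModel.from_pretrained(model_name)
    feats = [layer_features(p) for p in model.parameters() if p.dim() == 2]
    x = torch.stack(feats)                                # [num_nodes, 4]
    src = torch.arange(len(feats) - 1)
    edge_index = torch.stack([torch.cat([src, src + 1]),  # edges in both directions
                              torch.cat([src + 1, src])])
    return Data(x=x, edge_index=edge_index)

class ModelEncoder(nn.Module):
    """Two GCN layers followed by mean pooling -> one embedding per PTM."""
    def __init__(self, in_dim: int = 4, hidden: int = 64, out_dim: int = 32):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden)
        self.conv2 = GCNConv(hidden, out_dim)

    def forward(self, data: Data) -> torch.Tensor:
        h = self.conv1(data.x, data.edge_index).relu()
        h = self.conv2(h, data.edge_index)
        batch = torch.zeros(h.size(0), dtype=torch.long)  # single graph
        return global_mean_pool(h, batch)

graph = model_to_graph("bert-base-uncased")
embedding = ModelEncoder()(graph)                         # [1, 32] PTM embedding

In a PTM-selection setting, such embeddings for every model in the zoo could be compared against a representation of the target dataset to rank candidate models before fine-tuning.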
Pages: 21