Universal embedding for pre-trained models and data bench

Cited by: 0
Authors
Cho, Namkyeong [1 ]
Cho, Taewon [2 ]
Shin, Jaesun [2 ]
Jeon, Eunjoo [2 ]
Lee, Taehee [2 ]
Affiliations
[1] Pohang Univ Sci & Technol POSTECH, Ctr Math Machine Learning & its Applicat CM2LA, Dept Math, Pohang 37673, Gyeongbuk, South Korea
[2] Samsung SDS, 125 Olymp Ro 35 Gil, Seoul 05510, South Korea
Funding
National Research Foundation of Singapore;
Keywords
Transfer learning; Pretrained models; Graph neural networks;
DOI
10.1016/j.neucom.2024.129107
CLC Number (Chinese Library Classification)
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The transformer architecture has led to significant improvements in the performance of various natural language processing (NLP) tasks. One of the great advantages of transformer-based models is that they allow an extra layer to be added to a pre-trained model (PTM) and fine-tuned, rather than requiring a separate architecture to be developed for each task. This approach has delivered promising performance on NLP tasks. Selecting an appropriate PTM from a model zoo such as Hugging Face therefore becomes a crucial task. Despite its importance, PTM selection still requires further investigation. The main challenge in PTM selection for NLP tasks is the lack of a publicly available benchmark for evaluating model performance on each task and dataset. To address this challenge, we introduce the first public data benchmark for evaluating the performance of popular transformer-based models on a diverse range of NLP tasks. Furthermore, we propose graph representations of transformer-based models with node features that represent the weight matrices of each layer. Empirical results demonstrate that our proposed graph neural network (GNN) model outperforms existing PTM selection methods.
Pages: 21
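The abstract's core idea, representing a PTM as a graph whose nodes carry per-layer weight features and scoring it with a GNN, can be illustrated with a minimal sketch. This is not the paper's implementation: the feature choice (simple summary statistics per weight matrix), the chain-shaped edge layout, and the names `model_to_graph` and `PTMScorer` are assumptions made for illustration only.

```python
# Minimal sketch (not the authors' code): turn a pre-trained model into a graph
# whose nodes are its weight matrices, then score it with a small GNN.
import torch
import torch.nn as nn
from torch_geometric.data import Data
from torch_geometric.nn import GCNConv, global_mean_pool


def layer_features(weight: torch.Tensor) -> torch.Tensor:
    """Summary statistics of one weight matrix, used as a node feature vector (assumed features)."""
    w = weight.detach().float().flatten()
    return torch.stack([w.mean(), w.std(), w.abs().max(), torch.tensor(float(w.numel()))])


def model_to_graph(model: nn.Module) -> Data:
    """One node per 2-D weight matrix; nodes chained in forward order (assumed edge layout)."""
    feats = [layer_features(p) for _, p in model.named_parameters() if p.dim() == 2]
    x = torch.stack(feats)                         # [num_layers, 4]
    idx = torch.arange(len(feats) - 1)
    edge_index = torch.stack([idx, idx + 1])       # simple chain over consecutive layers
    return Data(x=x, edge_index=edge_index)


class PTMScorer(nn.Module):
    """Tiny GNN that maps a model graph to a single transferability score."""
    def __init__(self, in_dim: int = 4, hidden: int = 32):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden)
        self.conv2 = GCNConv(hidden, hidden)
        self.head = nn.Linear(hidden, 1)

    def forward(self, data: Data) -> torch.Tensor:
        h = torch.relu(self.conv1(data.x, data.edge_index))
        h = torch.relu(self.conv2(h, data.edge_index))
        batch = torch.zeros(h.size(0), dtype=torch.long)   # single graph in the batch
        return self.head(global_mean_pool(h, batch)).squeeze(-1)


if __name__ == "__main__":
    toy_ptm = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 8))
    graph = model_to_graph(toy_ptm)
    print(PTMScorer()(graph))  # untrained score; in practice trained against benchmark results
```

In such a setup the scorer would be trained so that its output correlates with the benchmark performance of each PTM on the target task, allowing candidate models to be ranked without fine-tuning each one.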