Universal embedding for pre-trained models and data bench

Cited by: 0
Authors
Cho, Namkyeong [1 ]
Cho, Taewon [2 ]
Shin, Jaesun [2 ]
Jeon, Eunjoo [2 ]
Lee, Taehee [2 ]
Affiliations
[1] Pohang Univ Sci & Technol POSTECH, Ctr Math Machine Learning & its Applicat CM2LA, Dept Math, Pohang 37673, Gyeongbuk, South Korea
[2] Samsung SDS, 125 Olymp Ro 35 Gil, Seoul 05510, South Korea
Funding
National Research Foundation of Singapore;
Keywords
Transfer learning; Pretrained models; Graph neural networks;
DOI
10.1016/j.neucom.2024.129107
CLC Number (Chinese Library Classification)
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The transformer architecture has led to significant improvements in the performance of various natural language processing (NLP) tasks. One of the great advantages of transformer-based models is that they allow an extra layer to be added to a pre-trained model (PTM) and fine-tuned, rather than requiring a separate architecture to be developed for each task. This approach has delivered promising performance on NLP tasks. Selecting an appropriate PTM from a model zoo such as Hugging Face therefore becomes a crucial task. Despite its importance, PTM selection still requires further investigation. The main challenge in PTM selection for NLP tasks is the lack of a publicly available benchmark for evaluating model performance on each task and dataset. To address this challenge, we introduce the first public data benchmark for evaluating the performance of popular transformer-based models on a diverse range of NLP tasks. Furthermore, we propose graph representations of transformer-based models with node features that represent the weight matrices of each layer. Empirical results demonstrate that our proposed graph neural network (GNN) model outperforms existing PTM selection methods.
Pages: 21
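The abstract's core idea, representing a PTM as a graph whose nodes carry per-layer weight features and scoring it with a GNN, can be illustrated with a minimal sketch. This is not the paper's implementation: the feature choice (simple summary statistics per weight matrix), the chain-shaped edge layout, and the names `model_to_graph` and `PTMScorer` are assumptions made for illustration only.

```python
# Minimal sketch (not the authors' code): turn a pre-trained model into a graph
# whose nodes are its weight matrices, then score it with a small GNN.
import torch
import torch.nn as nn
from torch_geometric.data import Data
from torch_geometric.nn import GCNConv, global_mean_pool


def layer_features(weight: torch.Tensor) -> torch.Tensor:
    """Summary statistics of one weight matrix, used as a node feature vector (assumed features)."""
    w = weight.detach().float().flatten()
    return torch.stack([w.mean(), w.std(), w.abs().max(), torch.tensor(float(w.numel()))])


def model_to_graph(model: nn.Module) -> Data:
    """One node per 2-D weight matrix; nodes chained in forward order (assumed edge layout)."""
    feats = [layer_features(p) for _, p in model.named_parameters() if p.dim() == 2]
    x = torch.stack(feats)                         # [num_layers, 4]
    idx = torch.arange(len(feats) - 1)
    edge_index = torch.stack([idx, idx + 1])       # simple chain over consecutive layers
    return Data(x=x, edge_index=edge_index)


class PTMScorer(nn.Module):
    """Tiny GNN that maps a model graph to a single transferability score."""
    def __init__(self, in_dim: int = 4, hidden: int = 32):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden)
        self.conv2 = GCNConv(hidden, hidden)
        self.head = nn.Linear(hidden, 1)

    def forward(self, data: Data) -> torch.Tensor:
        h = torch.relu(self.conv1(data.x, data.edge_index))
        h = torch.relu(self.conv2(h, data.edge_index))
        batch = torch.zeros(h.size(0), dtype=torch.long)   # single graph in the batch
        return self.head(global_mean_pool(h, batch)).squeeze(-1)


if __name__ == "__main__":
    toy_ptm = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 8))
    graph = model_to_graph(toy_ptm)
    print(PTMScorer()(graph))  # untrained score; in practice trained against benchmark results
```

In such a setup the scorer would be trained so that its output correlates with the benchmark performance of each PTM on the target task, allowing candidate models to be ranked without fine-tuning each one.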