Ranking and Tuning Pre-trained Models: A New Paradigm for Exploiting Model Hubs

Cited by: 0
Authors
You, Kaichao [1]
Liu, Yong [1]
Zhang, Ziyang [2]
Wang, Jianmin [1]
Jordan, Michael I. [3]
Long, Mingsheng [1]
Affiliations
[1] School of Software, BNRist, Tsinghua University, Beijing, 100084, China
[2] Advanced Computing and Storage Lab, Huawei Technologies Co. Ltd
[3] Division of Computer Science, Department of Statistics, UC Berkeley, CA, 94720-1776, United States
Funding
National Natural Science Foundation of China
Keywords
Deep learning; Learning systems
DOI
Not available
Chinese Library Classification number
TB18 [Ergonomics]; Q98 [Anthropology]
Discipline classification code
030303; 1201
Abstract
Model hubs with many pre-trained models (PTMs) have become a cornerstone of deep learning. Although built at a high cost, they remain under-exploited—practitioners usually pick one PTM from the provided model hub by popularity and then fine-tune the PTM to solve the target task. This naïve but common practice poses two obstacles to full exploitation of pre-trained model hubs: first, the PTM selection by popularity has no optimality guarantee, and second, only one PTM is used while the remaining PTMs are ignored. An alternative might be to consider all possible combinations of PTMs and extensively fine-tune each combination, but this would not only be prohibitive computationally but may also lead to statistical over-fitting. In this paper, we propose a new paradigm for exploiting model hubs that is intermediate between these extremes. The paradigm is characterized by two aspects: (1) We use an evidence maximization procedure to estimate the maximum value of label evidence given features extracted by pre-trained models. This procedure can rank all the PTMs in a model hub for various types of PTMs and tasks before fine-tuning. (2) The best ranked PTM can either be fine-tuned and deployed if we have no preference for the model’s architecture or the target PTM can be tuned by the top K ranked PTMs via a Bayesian procedure that we propose. This procedure, which we refer to as B-Tuning, not only improves upon specialized methods designed for tuning homogeneous PTMs, but also applies to the challenging problem of tuning heterogeneous PTMs where it yields a new level of benchmark performance. ©2022 Kaichao You, Yong Liu, Ziyang Zhang, Jianmin Wang, Michael I. Jordan, Mingsheng Long.
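The ranking step described in the abstract can be illustrated with a minimal sketch. It assumes a Bayesian linear model on frozen features (weights w ~ N(0, I/alpha), targets y ~ N(Fw, I/beta)) with a scalar regression target, and it maximizes the log marginal evidence over alpha and beta with standard fixed-point updates. This is an illustration under those assumptions, not the authors' released implementation; the function name `log_evidence_score`, the model names `model_a` and `model_b`, and the synthetic data are all hypothetical.

```python
import numpy as np


def log_evidence_score(features, targets, max_iter=100, tol=1e-8):
    """Per-sample log marginal evidence of a Bayesian linear model,
    after fixed-point updates of alpha (prior precision) and beta
    (noise precision). Assumes a single scalar target per sample."""
    F = np.asarray(features, dtype=float)
    y = np.asarray(targets, dtype=float)
    n, d = F.shape
    # Thin SVD: F = U diag(s) V^T, so F^T F has eigenvalues s**2.
    U, s, _ = np.linalg.svd(F, full_matrices=False)
    sigma = s ** 2
    z = U.T @ y  # targets projected onto the column space of F
    # Squared norm of the part of y outside the column space of F.
    off_span = max(float(y @ y - z @ z), 0.0)
    eps = 1e-12  # guards against division by zero and log(0)

    def evaluate(alpha, beta):
        denom = alpha + beta * sigma
        m_sq = float(np.sum((beta * s * z / denom) ** 2))    # ||m||^2
        res_sq = float(np.sum((alpha * z / denom) ** 2)) + off_span  # ||Fm - y||^2
        gamma = float(np.sum(beta * sigma / denom))          # effective parameters
        # log det(alpha I + beta F^T F), including zero eigenvalues if d > n.
        log_det = float(np.sum(np.log(denom))) + (d - s.size) * np.log(alpha)
        log_ev = 0.5 * (d * np.log(alpha) + n * np.log(beta)
                        - n * np.log(2 * np.pi)
                        - beta * res_sq - alpha * m_sq - log_det)
        return log_ev / n, m_sq, res_sq, gamma

    alpha, beta = 1.0, 1.0
    score, m_sq, res_sq, gamma = evaluate(alpha, beta)
    for _ in range(max_iter):
        alpha = max(gamma / (m_sq + eps), eps)
        beta = max((n - gamma) / (res_sq + eps), eps)
        new_score, m_sq, res_sq, gamma = evaluate(alpha, beta)
        converged = abs(new_score - score) < tol
        score = new_score
        if converged:
            break
    return score


# Hypothetical hub: each "model" is represented only by the features it
# extracts on the target data. model_a's features generate the targets;
# model_b's features are unrelated noise.
rng = np.random.default_rng(0)
n = 200
informative = rng.normal(size=(n, 8))
targets = informative @ rng.normal(size=8) + 0.1 * rng.normal(size=n)
hub = {"model_a": informative, "model_b": rng.normal(size=(n, 8))}

scores = {name: log_evidence_score(feats, targets) for name, feats in hub.items()}
ranking = sorted(scores, key=scores.get, reverse=True)  # best first
```

Because each score needs only one forward pass to extract features plus an SVD, a whole hub can be ranked without fine-tuning any model; the first K entries of `ranking` would then be the candidates for the tuning stage.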
Related papers
50 in total
  • [1] Ranking and Tuning Pre-trained Models: A New Paradigm for Exploiting Model Hubs
    You, Kaichao
    Liu, Yong
    Zhang, Ziyang
    Wang, Jianmin
    Jordan, Michael I.
    Long, Mingsheng
    JOURNAL OF MACHINE LEARNING RESEARCH, 2022, 23
  • [2] Exploiting Syntactic Information to Boost the Fine-tuning of Pre-trained Models
    Liu, Chaoming
    Zhu, Wenhao
    Zhang, Xiaoyu
    Zhai, Qiuhong
    2022 IEEE 46TH ANNUAL COMPUTERS, SOFTWARE, AND APPLICATIONS CONFERENCE (COMPSAC 2022), 2022, : 575 - 582
  • [3] Prompt Tuning for Discriminative Pre-trained Language Models
    Yao, Yuan
    Dong, Bowen
    Zhang, Ao
    Zhang, Zhengyan
    Xie, Ruobing
    Liu, Zhiyuan
    Lin, Leyu
    Sun, Maosong
    Wang, Jianyong
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), 2022, : 3468 - 3473
  • [4] MaskDiffusion: Exploiting Pre-Trained Diffusion Models for Semantic Segmentation
    Kawano, Yasufumi
    Aoki, Yoshimitsu
    IEEE ACCESS, 2024, 12 : 127283 - 127293
  • [5] Tuning Pre-trained Model via Moment Probing
    Gao, Mingze
    Wang, Qilong
    Lin, Zhenyi
    Zhu, Pengfei
    Hu, Qinghua
    Zhou, Jingbo
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 11769 - 11779
  • [6] Pre-trained Language Model based Ranking in Baidu Search
    Zou, Lixin
    Zhang, Shengqiang
    Cai, Hengyi
    Ma, Dehong
    Cheng, Suqi
    Wang, Shuaiqiang
    Shi, Daiting
    Cheng, Zhicong
    Yin, Dawei
    KDD '21: PROCEEDINGS OF THE 27TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING, 2021, : 4014 - 4022
  • [7] Span Fine-tuning for Pre-trained Language Models
    Bao, Rongzhou
    Zhang, Zhuosheng
    Zhao, Hai
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2021, 2021, : 1970 - 1979
  • [8] A new method for tuning the CNN pre-trained models as a feature extractor for malware detection
    Bakir, Halit
    PATTERN ANALYSIS AND APPLICATIONS, 2025, 28 (01)
  • [9] XDAI: A Tuning-free Framework for Exploiting Pre-trained Language Models in Knowledge Grounded Dialogue Generation
    Yu, Jifan
    Zhang, Xiaohan
    Xu, Yifan
    Lei, Xuanyu
    Guan, Xinyu
    Zhang, Jing
    Hou, Lei
    Li, Juanzi
    Tang, Jie
    PROCEEDINGS OF THE 28TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2022, 2022, : 4422 - 4432
  • [10] y-Tuning: an efficient tuning paradigm for large-scale pre-trained models via label representation learning
    Liu, Yitao
    An, Chenxin
    Qiu, Xipeng
    FRONTIERS OF COMPUTER SCIENCE, 2024, 18 (04)