Revisiting Class-Incremental Learning with Pre-Trained Models: Generalizability and Adaptivity are All You Need

Cited by: 5
Authors
Zhou, Da-Wei [1 ,2 ]
Cai, Zi-Wen [1 ,2 ]
Ye, Han-Jia [1 ,2 ]
Zhan, De-Chuan [1 ,2 ]
Liu, Ziwei [3 ]
Affiliations
[1] Nanjing Univ, Sch Artificial Intelligence, Nanjing 210023, Peoples R China
[2] Nanjing Univ, Natl Key Lab Novel Software Technol, Nanjing 210023, Peoples R China
[3] Nanyang Technol Univ, S Lab, Singapore City 639798, Singapore
Keywords
Class-incremental learning; Pre-trained models; Continual learning; Catastrophic forgetting; Representation
DOI
10.1007/s11263-024-02218-0
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Class-incremental learning (CIL) aims to adapt to emerging new classes without forgetting old ones. Traditional CIL models are trained from scratch to continually acquire knowledge as data evolves. Recently, pre-training has achieved substantial progress, making vast pre-trained models (PTMs) accessible for CIL. Unlike traditionally trained models, PTMs possess generalizable embeddings that can be easily transferred for CIL. In this work, we revisit CIL with PTMs and argue that the core factors in CIL are adaptivity for model updating and generalizability for knowledge transfer. (1) We first reveal that a frozen PTM can already provide generalizable embeddings for CIL. Surprisingly, a simple baseline (SimpleCIL), which continually sets the classifiers of the PTM to prototype features, can beat the state of the art even without training on the downstream task. (2) Due to the distribution gap between pre-training and downstream datasets, the PTM can be further cultivated with adaptivity via model adaptation. We propose AdaPt and mERge (Aper), which aggregates the embeddings of the PTM and the adapted model for classifier construction. Aper is a general framework that can be orthogonally combined with any parameter-efficient tuning method, retaining the advantages of the PTM's generalizability and the adapted model's adaptivity. (3) Additionally, considering that previous ImageNet-based benchmarks are unsuitable in the era of PTMs due to data overlap, we propose four new benchmarks for assessment, namely ImageNet-A, ObjectNet, OmniBenchmark, and VTAB. Extensive experiments validate the effectiveness of Aper with a unified and concise framework. Code is available at https://github.com/zhoudw-zdw/RevisitingCIL.
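As a rough illustration of the two ideas summarized in the abstract, the sketch below is not the authors' implementation: the function names (class_prototypes, aper_embedding, predict) are invented for this example, and random vectors stand in for the embeddings a frozen pre-trained model would produce. It shows a prototype-based classifier in the SimpleCIL spirit and a simple concatenation of PTM and adapted-model embeddings in the Aper spirit.

```python
# Minimal sketch of SimpleCIL-style prototype classifiers and Aper-style
# embedding aggregation. Assumes features are already extracted by a frozen
# pre-trained model; all names here are illustrative, not the official API.
import numpy as np

def class_prototypes(features, labels):
    """SimpleCIL idea: the classifier weight of each class is the mean
    (prototype) of its embeddings; no gradient training is involved."""
    return {c: features[labels == c].mean(axis=0) for c in np.unique(labels)}

def predict(prototypes, query):
    """Assign the query to the class whose prototype has the highest
    cosine similarity with the query embedding."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
    return max(prototypes, key=lambda c: cos(prototypes[c], query))

def aper_embedding(ptm_feat, adapted_feat):
    """Aper idea (simplified): aggregate the generalizable embedding of the
    frozen PTM with the embedding of a model adapted on the downstream data,
    then build prototypes in this concatenated space."""
    return np.concatenate([ptm_feat, adapted_feat], axis=-1)

# Toy usage with random vectors standing in for PTM / adapted-model features.
rng = np.random.default_rng(0)
ptm_feats = rng.normal(size=(20, 8))
adapted_feats = rng.normal(size=(20, 8))
labels = np.repeat([0, 1], 10)

agg = aper_embedding(ptm_feats, adapted_feats)   # shape (20, 16)
protos = class_prototypes(agg, labels)
print(predict(protos, agg[0]))                   # predicted class for sample 0
```

In the actual paper the aggregation is done between the frozen PTM and a parameter-efficiently tuned copy of it, and classification uses these prototype-based classifiers without further training on old data; the snippet only conveys the overall flow.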
Pages: 1012-1032
Number of pages: 21