An Empirical Investigation of the Role of Pre-training in Lifelong Learning

Authors
Mehta, Sanket Vaibhav [1 ]
Patil, Darshan [2 ]
Chandar, Sarath [3 ]
Strubell, Emma [1 ]
Affiliations
[1] Carnegie Mellon Univ, Sch Comp Sci, Pittsburgh, PA 15213 USA
[2] Univ Montreal, Mila Quebec AI Inst, Montreal, PQ H3T 1J4, Canada
[3] Ecole Polytech Montreal, Mila Quebec AI Inst, Canada CIFAR AI Chair, Montreal, PQ H3T 1J4, Canada
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
Lifelong Learning; Continual Learning; Pre-training; Flat Minima; Sharpness;
DOI
Not available
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
The lifelong learning paradigm in machine learning is an attractive alternative to the more prominent isolated learning scheme not only due to its resemblance to biological learning but also its potential to reduce energy waste by obviating excessive model re-training. A key challenge to this paradigm is the phenomenon of catastrophic forgetting. With the increasing popularity and success of pre-trained models in machine learning, we pose the question: What role does pre-training play in lifelong learning, specifically with respect to catastrophic forgetting? We investigate existing methods in the context of large, pre-trained models and evaluate their performance on a variety of text and image classification tasks, including a large-scale study using a novel data set of 15 diverse NLP tasks. Across all settings, we observe that generic pre-training implicitly alleviates the effects of catastrophic forgetting when learning multiple tasks sequentially compared to randomly initialized models. We then further investigate why pre-training alleviates forgetting in this setting. We study this phenomenon by analyzing the loss landscape, finding that pre-trained weights appear to ease forgetting by leading to wider minima. Based on this insight, we propose jointly optimizing for current task loss and loss basin sharpness to explicitly encourage wider basins during sequential fine-tuning. We show that this optimization approach outperforms several state-of-the-art task-sequential continual learning algorithms across multiple settings, occasionally even without retaining a memory that scales in size with the number of tasks.
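The joint objective described in the abstract (current task loss plus loss basin sharpness) can be illustrated with a sharpness-aware update of the kind used in sharpness-aware minimization. The following is a minimal sketch under stated assumptions, not the authors' implementation: the linear model, the squared-error loss, and the names `sharpness_aware_step`, `lr`, and `rho` are all hypothetical choices made for illustration, and the abstract does not state which sharpness measure or optimizer the paper uses.

```python
import numpy as np

def loss(w, X, y):
    """Mean squared error of a linear model (toy stand-in for a task loss)."""
    r = X @ w - y
    return 0.5 * np.mean(r ** 2)

def grad(w, X, y):
    """Gradient of the toy loss with respect to w."""
    return X.T @ (X @ w - y) / len(y)

def sharpness_aware_step(w, X, y, lr=0.1, rho=0.05):
    """One update that descends the loss at a worst-case nearby point.

    Assumption: sharpness is approximated by the loss at w + eps, where eps
    is a first-order estimate of the norm-rho perturbation that raises the
    loss the most.
    """
    g = grad(w, X, y)
    # Perturbation of norm rho along the gradient direction.
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    # Gradient taken at the perturbed weights, applied to the original weights.
    g_sharp = grad(w + eps, X, y)
    return w - lr * g_sharp

# Toy data: a noiseless linear regression problem (hypothetical).
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true

w = np.zeros(3)
initial_loss = loss(w, X, y)
for _ in range(200):
    w = sharpness_aware_step(w, X, y)
final_loss = loss(w, X, y)
```

In a sequential setting, the same step would be applied while fine-tuning on each task in turn. The radius `rho` controls how large a neighborhood of the current weights is expected to have low loss.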
Pages: 50
Related papers
50 items in total
  • [21] Pre-training interventions to counteract seductive details in virtual reality training programs
    Howard, Matt C.
    Lee, Juseob
    HUMAN RESOURCE DEVELOPMENT QUARTERLY, 2020, 31 (01) : 13 - 29
  • [22] Rethinking Resource Management in Edge Learning: A Joint Pre-Training and Fine-Tuning Design Paradigm
    Lyu, Zhonghao
    Li, Yuchen
    Zhu, Guangxu
    Xu, Jie
    Poor, H. Vincent
    Cui, Shuguang
    IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, 2025, 24 (02) : 1584 - 1601
  • [23] The Role of the Lifelong Learning in Logistics 4.0
    Wrobel-Lachowska, Magdalena
    Wisniewski, Zbigniew
    Polak-Sopinska, Aleksandra
    ADVANCES IN HUMAN FACTORS IN TRAINING, EDUCATION, AND LEARNING SCIENCES, AHFE 2017, 2018, 596 : 402 - 409
  • [24] Lifelong Learning in Psychiatry and the Role of Certification
    Anzia, Joan M.
    PSYCHIATRIC CLINICS OF NORTH AMERICA, 2021, 44 (02) : 309 - 316
  • [25] Pre-Training, Transfer Learning and Pretext Learning for a Convolutional Neural Network Applied to Automated Assessment of Clinical PET Image Quality
    Hopson, Jessica B.
    Neji, Radhouene
    Dunn, Joel T.
    McGinnity, Colm J.
    Flaus, Anthime
    Reader, Andrew J.
    Hammers, Alexander
    IEEE TRANSACTIONS ON RADIATION AND PLASMA MEDICAL SCIENCES, 2023, 7 (04) : 372 - 381
  • [26] CPR-CLIP: Multimodal Pre-Training for Composite Error Recognition in CPR Training
    Wang, Shunli
    Yang, Dingkang
    Zhai, Peng
    Zhang, Lihua
    IEEE SIGNAL PROCESSING LETTERS, 2024, 31 : 211 - 215
  • [27] Learning Depth Representation From RGB-D Videos by Time-Aware Contrastive Pre-Training
    He, Zongtao
    Wang, Liuyi
    Dang, Ronghao
    Li, Shu
    Yan, Qingqing
    Liu, Chengju
    Chen, Qijun
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (06) : 4143 - 4158
  • [28] The changing role of Librarians in lifelong learning and distance learning
    Malan, Daniel Jacobus Roelof
    IMSCI '08: 2ND INTERNATIONAL MULTI-CONFERENCE ON SOCIETY, CYBERNETICS AND INFORMATICS, VOL IV, PROCEEDINGS, POST CONFERENCE ISSUE, 2008 : 66 - 70
  • [29] DeepVulSeeker: A novel vulnerability identification framework via code graph structure and pre-training mechanism
    Wang, Jin
    Xiao, Hui
    Zhong, Shuwen
    Xiao, Yinhao
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2023, 148 : 15 - 26
  • [30] Ensuring the continuum of learning: The role of assessment for lifelong learning
    Su, Yahui
    INTERNATIONAL REVIEW OF EDUCATION, 2015, 61 (01) : 7 - 20