An Empirical Investigation of the Role of Pre-training in Lifelong Learning

Cited by: 0
Authors
Mehta, Sanket Vaibhav [1 ]
Patil, Darshan [2 ]
Chandar, Sarath [3 ]
Strubell, Emma [1 ]
Affiliations
[1] Carnegie Mellon Univ, Sch Comp Sci, Pittsburgh, PA 15213 USA
[2] Univ Montreal, Mila Quebec AI Inst, Montreal, PQ H3T 1J4, Canada
[3] Ecole Polytech Montreal, Mila Quebec AI Inst, Canada CIFAR AI Chair, Montreal, PQ H3T 1J4, Canada
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
Lifelong Learning; Continual Learning; Pre-training; Flat Minima; Sharpness
DOI
Not available
CLC Number
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
The lifelong learning paradigm in machine learning is an attractive alternative to the more prominent isolated learning scheme, not only because of its resemblance to biological learning but also because of its potential to reduce energy waste by obviating excessive model re-training. A key challenge to this paradigm is the phenomenon of catastrophic forgetting. With the increasing popularity and success of pre-trained models in machine learning, we pose the question: What role does pre-training play in lifelong learning, specifically with respect to catastrophic forgetting? We investigate existing methods in the context of large, pre-trained models and evaluate their performance on a variety of text and image classification tasks, including a large-scale study using a novel data set of 15 diverse NLP tasks. Across all settings, we observe that generic pre-training implicitly alleviates the effects of catastrophic forgetting when learning multiple tasks sequentially, compared to randomly initialized models. We then further investigate why pre-training alleviates forgetting in this setting. We study this phenomenon by analyzing the loss landscape, finding that pre-trained weights appear to ease forgetting by leading to wider minima. Based on this insight, we propose jointly optimizing for the current task loss and loss-basin sharpness to explicitly encourage wider basins during sequential fine-tuning. We show that this optimization approach outperforms several state-of-the-art task-sequential continual learning algorithms across multiple settings, in some cases even without retaining a memory that scales in size with the number of tasks.
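Jointly optimizing the current task loss and loss-basin sharpness, as the abstract describes, amounts to an objective of roughly min_w max_{||eps||_2 <= rho} L_task(w + eps). The following is a minimal PyTorch-style sketch of one such update, following the generic two-step sharpness-aware minimization (SAM) recipe; it is an illustration of the idea under those assumptions, not the authors' implementation, and the names (sharpness_aware_step, rho, loss_fn) are ours.

import torch

def sharpness_aware_step(model, loss_fn, inputs, targets, base_optimizer, rho=0.05):
    # Pass 1: gradient of the task loss at the current weights.
    loss_fn(model(inputs), targets).backward()

    # Climb to the (approximately) worst-case point within an L2 ball
    # of radius rho around the current weights: w <- w + rho * g / ||g||.
    eps = {}
    with torch.no_grad():
        grads = [p.grad for p in model.parameters() if p.grad is not None]
        grad_norm = torch.norm(torch.stack([g.norm(p=2) for g in grads]), p=2)
        for p in model.parameters():
            if p.grad is None:
                continue
            e = p.grad * (rho / (grad_norm + 1e-12))
            p.add_(e)
            eps[p] = e
    model.zero_grad()

    # Pass 2: the gradient taken at the perturbed point approximates the
    # gradient of the sharpness-penalized objective.
    loss_fn(model(inputs), targets).backward()

    # Undo the perturbation, then descend with the sharpness-aware gradient.
    with torch.no_grad():
        for p, e in eps.items():
            p.sub_(e)
    base_optimizer.step()
    base_optimizer.zero_grad()

In sequential fine-tuning, a step like this would replace the usual backward/step on each task's batches, steering optimization toward the wider loss basins that the abstract associates with reduced forgetting.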
Pages: 50
Related Papers
(showing 10 of 50 records)
  • [1] The role of pre-training interventions in learning: A meta-analysis and integrative review
    Mesmer-Magnus, Jessica
    Viswesvaran, Chockalingam
    HUMAN RESOURCE MANAGEMENT REVIEW, 2010, 20 (04) : 261 - 282
  • [2] Channelling employability perceptions through lifelong learning: an empirical investigation
    Nimmi, P. M.
    Zakkariya, K. A.
    Rahul, P. R.
    EDUCATION AND TRAINING, 2021, 63 (05) : 763 - 776
  • [3] Contrastive Learning With Enhancing Detailed Information for Pre-Training Vision Transformer
    Liang, Zhuomin
    Bai, Liang
    Fan, Jinyu
    Yang, Xian
    Liang, Jiye
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2025, 35 (01) : 219 - 231
  • [4] Continual pre-training mitigates forgetting in language and vision
    Cossu, Andrea
    Carta, Antonio
    Passaro, Lucia
    Lomonaco, Vincenzo
    Tuytelaars, Tinne
    Bacciu, Davide
    NEURAL NETWORKS, 2024, 179
  • [5] Omni-Training: Bridging Pre-Training and Meta-Training for Few-Shot Learning
    Shu, Yang
    Cao, Zhangjie
    Gao, Jinghan
    Wang, Jianmin
    Yu, Philip S.
    Long, Mingsheng
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (12) : 15275 - 15291
  • [6] Unsupervised Pre-Training for Detection Transformers
    Dai, Zhigang
    Cai, Bolun
    Lin, Yugeng
    Chen, Junying
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (11) : 12772 - 12782
  • [7] Lifelong Learning - Key Role for Education and Training. Initiatives in the Romanian Higher Education
    Prepelita-Raileanu, Brandusa
    LATEST TRENDS ON ENGINEERING EDUCATION, 2010 : 104+
  • [8] Learning From Incorrectness: Active Learning With Negative Pre-Training and Curriculum Querying for Histological Tissue Classification
    Hu, Wentao
    Cheng, Lianglun
    Huang, Guoheng
    Yuan, Xiaochen
    Zhong, Guo
    Pun, Chi-Man
    Zhou, Jian
    Cai, Muyan
    IEEE TRANSACTIONS ON MEDICAL IMAGING, 2024, 43 (02) : 625 - 637
  • [9] An Online Reinforcement Learning Method for Multi-Zone Ventilation Control With Pre-Training
    Cui, Can
    Li, Chunxiao
    Li, Ming
    IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS, 2023, 70 (07) : 7163 - 7172
  • [10] Privacy-Preserving Split Learning for Large-Scaled Vision Pre-Training
    Wang, Zhousheng
    Yang, Geng
    Dai, Hua
    Rong, Chunming
    IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, 2023, 18 : 1539 - 1553