An Empirical Investigation of the Role of Pre-training in Lifelong Learning

Cited by: 0
Authors
Mehta, Sanket Vaibhav [1]
Patil, Darshan [2]
Chandar, Sarath [3]
Strubell, Emma [1]
Affiliations
[1] Carnegie Mellon Univ, Sch Comp Sci, Pittsburgh, PA 15213 USA
[2] Univ Montreal, Mila Quebec AI Inst, Montreal, PQ H3T 1J4, Canada
[3] Ecole Polytech Montreal, Mila Quebec AI Inst, Canada CIFAR AI Chair, Montreal, PQ H3T 1J4, Canada
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
Lifelong Learning; Continual Learning; Pre-training; Flat Minima; Sharpness;
DOI
Not available
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
The lifelong learning paradigm in machine learning is an attractive alternative to the more prominent isolated learning scheme not only due to its resemblance to biological learning but also its potential to reduce energy waste by obviating excessive model re-training. A key challenge to this paradigm is the phenomenon of catastrophic forgetting. With the increasing popularity and success of pre-trained models in machine learning, we pose the question: What role does pre-training play in lifelong learning, specifically with respect to catastrophic forgetting? We investigate existing methods in the context of large, pre-trained models and evaluate their performance on a variety of text and image classification tasks, including a large-scale study using a novel data set of 15 diverse NLP tasks. Across all settings, we observe that generic pre-training implicitly alleviates the effects of catastrophic forgetting when learning multiple tasks sequentially compared to randomly initialized models. We then further investigate why pre-training alleviates forgetting in this setting. We study this phenomenon by analyzing the loss landscape, finding that pre-trained weights appear to ease forgetting by leading to wider minima. Based on this insight, we propose jointly optimizing for current task loss and loss basin sharpness to explicitly encourage wider basins during sequential fine-tuning. We show that this optimization approach outperforms several state-of-the-art task-sequential continual learning algorithms across multiple settings, occasionally even without retaining a memory that scales in size with the number of tasks.
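The joint optimization of task loss and basin sharpness mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not name the exact procedure, so the code below assumes one common sharpness-aware formulation: perturb the weights toward the approximate worst case inside a small L2 ball, then descend using the gradient taken at that perturbed point. The toy loss and the names `loss`, `grad`, `sam_step`, `lr`, and `rho` are hypothetical and are not taken from the paper.

```python
import math

def loss(w):
    # Toy quadratic loss (assumption): sharp along w[0], flat along w[1].
    return 0.5 * (10.0 * w[0] ** 2 + 0.1 * w[1] ** 2)

def grad(w):
    # Analytic gradient of the toy loss above.
    return [10.0 * w[0], 0.1 * w[1]]

def sam_step(w, lr=0.05, rho=0.05):
    """One sharpness-aware update (illustrative sketch, not the paper's code).

    1. Move to the approximate worst-case point within an L2 ball of
       radius rho by stepping along the normalized gradient.
    2. Update the original weights with the gradient evaluated there.
    """
    g = grad(w)
    norm = math.sqrt(sum(gi * gi for gi in g)) + 1e-12  # avoid division by zero
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    g_adv = grad(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]

w = [1.0, 1.0]
initial_loss = loss(w)
for _ in range(100):
    w = sam_step(w)
final_loss = loss(w)
```

With `rho=0.0` the update reduces to plain gradient descent, which makes the sharpness term easy to ablate in a sketch like this.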
Pages: 50
Related papers
50 in total
  • [41] Code Smell Detection Research Based on Pre-training and Stacking Models
    Zhang, Dongwen
    Song, Shuai
    Zhang, Yang
    Liu, Haiyang
    Shen, Gaojie
    IEEE LATIN AMERICA TRANSACTIONS, 2024, 22 (01) : 22 - 30
  • [42] A Simple Yet Effective Layered Loss for Pre-Training of Network Embedding
    Chen, Junyang
    Li, Xueliang
    Li, Yuanman
    Li, Paul
    Wang, Mengzhu
    Zhang, Xiang
    Gong, Zhiguo
    Wu, Kaishun
    Leung, Victor C. M.
    IEEE TRANSACTIONS ON NETWORK SCIENCE AND ENGINEERING, 2022, 9 (03) : 1827 - 1837
  • [43] THE ROLE OF LIFELONG LEARNING IN THE DEVELOPMENT OF ENTREPRENEURSHIP IN CROATIA
    Malesevic, Zrinka
    Lerga, Lucija
    10TH INTERNATIONAL SCIENTIFIC SYMPOSIUM REGION ENTREPRENEURSHIP DEVELOPMENT (RED 2021), 2021, : 453 - 464
  • [44] Lifelong learning: what role for e-learning 2.0?
    Calvani, Antonio
    Bonaiuti, Giovanni
    Fini, Antonio
    JOURNAL OF E-LEARNING AND KNOWLEDGE SOCIETY, 2008, 4 (01) : 179 - 187
  • [45] A New Role of Lifelong Learning Support System
    Zhou, Wei
    Yasuda, Takami
    Yokoi, Shigeki
    TOWARDS SUSTAINABLE AND SCALABLE EDUCATIONAL INNOVATIONS INFORMED BY LEARNING SCIENCES, 2005, 133 : 954 - 957
  • [46] Supporting lifelong learning: The role of the public library
    Liu, CH
    CITY DEVELOPMENT AND LIBRARY SERVICES, 2004, : 24 - 29
  • [47] The role of SEE University in lifelong learning in Macedonia
    Abdullai, Jonuz
    Ramadani, Kujtim
    Tresi, Afrim
    Ademi, Arben
    5TH WORLD CONFERENCE ON EDUCATIONAL SCIENCES, 2014, 116 : 3106 - 3109
  • [48] Cultivating Lifelong Learning: Pre-Service Teachers and their MOOCs
    Batchelor, Jacqueline
    Lautenbach, Geoffrey
    2015 IST-AFRICA CONFERENCE, 2015
  • [49] THE ROLE OF INFORMATION LITERACY IN OVERCOMING OBSTACLES TO LEARNING AND LIFELONG LEARNING
    Rahanu, Harjinder
    Khan, Nawaz
    Georgiadou, Elli
    Siakas, Kerstin
    EDULEARN15: 7TH INTERNATIONAL CONFERENCE ON EDUCATION AND NEW LEARNING TECHNOLOGIES, 2015, : 1184 - 1194
  • [50] MuralDiff: Diffusion for Ancient Murals Restoration on Large-Scale Pre-Training
    Xu, Zishan
    Zhang, Xiaofeng
    Chen, Wei
    Liu, Jueting
    Xu, Tingting
    Wang, Zehua
    IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE, 2024, 8 (03) : 2169 - 2181