Iterative Network Pruning with Uncertainty Regularization for Lifelong Sentiment Classification

Cited by: 13
Authors
Geng, Binzong [1,2]
Yang, Min [2]
Yuan, Fajie [3,6]
Wang, Shupeng [2]
Ao, Xiang [4]
Xu, Ruifeng [5]
Affiliations
[1] Univ Sci & Technol China, Hefei, Peoples R China
[2] Chinese Acad Sci, Shenzhen Inst Adv Technol, Shenzhen, Peoples R China
[3] Westlake Univ, Hangzhou, Peoples R China
[4] Chinese Acad Sci, Inst Comp Technol, Key Lab Intelligent Informat Proc, Beijing, Peoples R China
[5] Harbin Inst Technol Shenzhen, Shenzhen, Peoples R China
[6] Tencent, Shenzhen, Peoples R China
Source
SIGIR '21 - PROCEEDINGS OF THE 44TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL | 2021
Funding
National Natural Science Foundation of China
Keywords
Lifelong learning; sentiment classification; network pruning; uncertainty regularization;
DOI
10.1145/3404835.3462902
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
Lifelong learning capabilities are crucial for sentiment classifiers to process continuous streams of opinionated information on the Web. However, lifelong learning is non-trivial for deep neural networks: continual training on incrementally available data inevitably results in catastrophic forgetting or interference. In this paper, we propose IPRLS, a novel iterative network pruning method with uncertainty regularization for lifelong sentiment classification, which leverages the principles of network pruning and weight regularization. By performing network pruning with uncertainty regularization in an iterative manner, IPRLS adapts a single BERT model to continuously arriving data from multiple domains while avoiding catastrophic forgetting and interference. Specifically, we use iterative pruning to remove redundant parameters from the large network so that the freed-up capacity can be used to learn new tasks, tackling the catastrophic forgetting problem. Instead of keeping old-task weights fixed when learning new tasks, we also apply an uncertainty regularization term, derived from the Bayesian online learning framework, to constrain updates of old-task weights in BERT. This enables positive backward transfer, i.e., learning new tasks improves performance on past tasks while protecting old knowledge from being lost. In addition, we propose a task-specific low-dimensional residual function in parallel to each layer of BERT, which makes IPRLS less prone to losing the knowledge stored in the base BERT network when learning a new task. Extensive experiments on 16 popular review corpora demonstrate that IPRLS significantly outperforms strong baselines for lifelong sentiment classification. For reproducibility, we release the code and data at: https://github.com/siat-nlp/IPRLS.
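The abstract describes three mechanisms: iterative magnitude pruning that frees capacity for new tasks, an uncertainty-style quadratic penalty on old-task weights, and a task-specific low-dimensional residual function placed in parallel to each shared layer. The sketch below is a minimal PyTorch illustration of how these pieces could fit together; the helper names (magnitude_prune_mask, uncertainty_penalty, ParallelResidualAdapter), the free_fraction value, and the constant precision are illustrative assumptions, not the authors' implementation (see the linked repository for that).

```python
# Minimal, illustrative sketch of the ideas summarized in the abstract above.
# All names and hyperparameters here are hypothetical; the official code is
# at https://github.com/siat-nlp/IPRLS.
import torch
import torch.nn as nn


def magnitude_prune_mask(weight: torch.Tensor, free_fraction: float) -> torch.Tensor:
    """Return a 0/1 mask that zeroes the smallest-magnitude weights,
    releasing roughly `free_fraction` of the parameters for future tasks."""
    k = int(weight.numel() * free_fraction)
    if k == 0:
        return torch.ones_like(weight)
    threshold = weight.abs().flatten().kthvalue(k).values
    return (weight.abs() > threshold).float()


def uncertainty_penalty(new_w: torch.Tensor, old_w: torch.Tensor,
                        precision: torch.Tensor) -> torch.Tensor:
    """Quadratic penalty in the spirit of Bayesian online learning: weights
    that were important (high precision) for old tasks are harder to move."""
    return (precision * (new_w - old_w) ** 2).sum()


class ParallelResidualAdapter(nn.Module):
    """A small task-specific residual function applied in parallel to a
    shared (BERT-like) sub-layer, so new-task knowledge does not have to be
    written entirely into the shared weights."""

    def __init__(self, base_layer: nn.Module, hidden_size: int, bottleneck: int = 64):
        super().__init__()
        self.base_layer = base_layer  # shared layer, e.g. a BERT feed-forward block
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base_layer(x) + self.up(torch.relu(self.down(x)))


if __name__ == "__main__":
    hidden = 768
    shared = nn.Linear(hidden, hidden)

    # Prune 40% of the shared weights; the zeroed slots are free for new tasks.
    mask = magnitude_prune_mask(shared.weight.data, free_fraction=0.4)
    shared.weight.data.mul_(mask)

    # Wrap the pruned shared layer with a task-specific parallel adapter.
    layer = ParallelResidualAdapter(shared, hidden_size=hidden, bottleneck=64)
    out = layer(torch.randn(2, 16, hidden))

    # Regularize updates of the surviving old-task weights.
    old_weights = shared.weight.data.clone()
    precision = torch.ones_like(old_weights)  # stands in for a learned precision
    reg = uncertainty_penalty(shared.weight, old_weights, precision)
    print(out.shape, float(reg))
```

In a full lifelong-learning loop, the released (zeroed) positions would be tracked with per-task binary masks so that each new task trains only its own free slots and its adapter, while the uncertainty penalty governs how far the shared, previously learned weights may drift; the sketch omits that per-task bookkeeping for brevity.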
Pages: 1229-1238
Number of pages: 10