Fine-Tuning of Distil-BERT for Continual Learning in Text Classification: An Experimental Analysis

Cited by: 0
Authors
Shah, Sahar [1 ]
Manzoni, Sara Lucia [1 ]
Zaman, Farooq [2 ]
Es Sabery, Fatima [3 ]
Epifania, Francesco [4 ]
Zoppis, Italo Francesco [1 ]
Affiliations
[1] Univ Milano Bicocca, Dept Informat Syst & Commun, Milan, Italy
[2] Informat Technol Univ, Dept Comp Sci, Lahore, Pakistan
[3] Hassan II Univ, Lab Econ & Logist Performance, Fac Law Econ & Social Sci Mohammedia, Casablanca, Morocco
[4] Social Things srl, Milan, Italy
Source
IEEE ACCESS | 2024, Vol. 12
Keywords
Continual learning; natural language processing; text classification; fine-tuning; Distil-BERT
DOI
10.1109/ACCESS.2024.3435537
Chinese Library Classification (CLC) number
TP [Automation technology; computer technology]
Subject classification code
0812
Abstract
Continual learning (CL) with bidirectional encoder representations from transformers (BERT) and its distilled variant, Distil-BERT, has shown remarkable performance in various natural language processing (NLP) tasks such as text classification (TC). However, degrading factors such as catastrophic forgetting (CF), reduced accuracy, and task-dependent architectures limit its suitability for complex and intelligent tasks. This article proposes an approach to address the challenges of CL in TC tasks. The objectives are to enable the model to learn continuously without forgetting previously acquired knowledge and thereby to avoid CF. To achieve this, a task-independent model architecture is introduced that allows multiple tasks to be trained on the same model, improving overall performance in CL scenarios. The framework incorporates two auxiliary tasks, next sentence prediction and task identifier prediction, to capture both task-generic and task-specific contextual information. The Distil-BERT model, extended with two linear layers, projects the output representation into a task-generic space and a task-specific space. The proposed methodology is evaluated on a diverse set of TC tasks: Yahoo, Yelp, Amazon, DB-Pedia, and AG-News. The experimental results demonstrate strong performance across tasks in terms of F1 score, accuracy, evaluation loss, learning rate, and training loss. For the Yahoo task, the proposed model achieved an F1 score of 96.84%, an accuracy of 95.85%, an evaluation loss of 0.06, and a learning rate of 0.00003144. On the Yelp task, it achieved an F1 score of 96.66%, an accuracy of 97.66%, and an evaluation loss of 0.06, with training loss similarly minimized at a learning rate of 0.00003189. For the Amazon task, the F1 score was 95.82%, the accuracy 97.83%, and the evaluation loss 0.06, with training loss effectively minimized at a learning rate of 0.00003144. On the DB-Pedia task, the model achieved an F1 score of 96.20%, an accuracy of 95.21%, and an evaluation loss of 0.08 at a learning rate of 0.0001972, with training loss decreasing rapidly owing to the limited number of epochs and instances. On the AG-News task, the model obtained an F1 score of 94.78%, an accuracy of 92.76%, and an evaluation loss of 0.06, with the learning rate fixed at 0.0001511. These results highlight the strong performance of the model across TC tasks, with a gradual reduction in training loss over time indicating effective learning and retention of knowledge.
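As a rough illustration of the architecture summarized in the abstract, the sketch below (assuming PyTorch and the Hugging Face transformers library) shows a shared Distil-BERT encoder whose pooled representation is split by two linear layers into a task-generic space and a task-specific space, with auxiliary heads for next sentence prediction and task identifier prediction. The class name, hidden size, and head layout are illustrative assumptions and do not reproduce the authors' released implementation.

```python
# Minimal sketch of a task-independent Distil-BERT classifier with two linear
# projections (task-generic / task-specific) and two auxiliary heads.
# Assumptions: PyTorch + Hugging Face transformers; names and dimensions are
# illustrative only, not the authors' code.
import torch
import torch.nn as nn
from transformers import DistilBertModel


class ContinualDistilBert(nn.Module):
    def __init__(self, num_labels: int, num_tasks: int, hidden: int = 768):
        super().__init__()
        self.encoder = DistilBertModel.from_pretrained("distilbert-base-uncased")
        # Two linear layers split the encoded representation into the two spaces.
        self.generic_proj = nn.Linear(hidden, hidden)    # task-generic space
        self.specific_proj = nn.Linear(hidden, hidden)   # task-specific space
        # Main text-classification head plus the two auxiliary heads.
        self.classifier = nn.Linear(2 * hidden, num_labels)
        self.nsp_head = nn.Linear(hidden, 2)             # next sentence prediction
        self.task_id_head = nn.Linear(hidden, num_tasks) # task identifier prediction

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]                # [CLS]-position representation
        generic = torch.tanh(self.generic_proj(cls))
        specific = torch.tanh(self.specific_proj(cls))
        logits = self.classifier(torch.cat([generic, specific], dim=-1))
        return {
            "logits": logits,                            # main TC prediction
            "nsp_logits": self.nsp_head(generic),        # auxiliary, task-generic signal
            "task_logits": self.task_id_head(specific),  # auxiliary, task-specific signal
        }
```

In such a setup, the classification loss would typically be combined with the two auxiliary losses during sequential training so that a single set of shared parameters serves all tasks; the exact loss weighting and training schedule are not specified here.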
Pages: 104964-104982
Number of pages: 19