Fine-Tuning of Distil-BERT for Continual Learning in Text Classification: An Experimental Analysis

Cited: 0
Authors
Shah, Sahar [1 ]
Manzoni, Sara Lucia [1 ]
Zaman, Farooq [2 ]
Es Sabery, Fatima [3 ]
Epifania, Francesco [4 ]
Zoppis, Italo Francesco [1 ]
Affiliations
[1] Univ Milano Bicocca, Dept Informat Syst & Commun, Milan, Italy
[2] Informat Technol Univ, Dept Comp Sci, Lahore, Pakistan
[3] Hassan II Univ, Lab Econ & Logist Performance, Fac Law Econ & Social Sci Mohammedia, Casablanca, Morocco
[4] Social Things srl, Milan, Italy
Source
IEEE ACCESS | 2024 / Vol. 12
Keywords
Continual learning; natural language processing; text classification; fine-tuning; Distil-BERT;
DOI
10.1109/ACCESS.2024.3435537
Chinese Library Classification (CLC)
TP [Automation technology, computer technology];
Discipline code
0812;
Abstract
Continual learning (CL) with bidirectional encoder representations from transformers (BERT) and its variant Distil-BERT has shown remarkable performance on various natural language processing (NLP) tasks, such as text classification (TC). However, degrading factors such as catastrophic forgetting (CF), accuracy loss, and task-dependent architectures limit its suitability for complex and intelligent tasks. This article proposes an approach that addresses these challenges of CL in TC. The objectives are to enable the model to learn continuously without forgetting previously acquired knowledge and thereby to avoid CF. To achieve this, a task-independent model architecture is introduced that allows multiple tasks to be trained on the same model, improving overall performance in CL scenarios. The framework incorporates two auxiliary tasks, next sentence prediction and task identifier prediction, to capture both task-generic and task-specific contextual information. The Distil-BERT model, extended with two linear layers, projects the output representation into a task-generic space and a task-specific space. The proposed methodology is evaluated on a diverse set of TC tasks: Yahoo, Yelp, Amazon, DB-Pedia, and AG-News. The experimental results demonstrate strong performance across tasks in terms of F1 score, accuracy, evaluation loss, learning rate, and training loss. For the Yahoo task, the proposed model achieved an F1 score of 96.84%, an accuracy of 95.85%, an evaluation loss of 0.06, and a learning rate of 0.00003144. On the Yelp task, it achieved an F1 score of 96.66%, an accuracy of 97.66%, an evaluation loss of 0.06, and likewise minimized training loss at a learning rate of 0.00003189. For the Amazon task, the F1 score was 95.82%, the accuracy 97.83%, the evaluation loss 0.06, and training loss was effectively minimized at a learning rate of 0.00003144. On the DB-Pedia task, the model achieved an F1 score of 96.20%, an accuracy of 95.21%, and an evaluation loss of 0.08 at a learning rate of 0.0001972, with training loss minimized rapidly owing to the limited number of epochs and instances. On the AG-News task, the model obtained an F1 score of 94.78%, an accuracy of 92.76%, and an evaluation loss of 0.06 at a learning rate of 0.0001511. These results highlight the strong performance of the model across TC tasks, with a gradual reduction in training loss over time indicating effective learning and retention of knowledge.
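The abstract describes the architecture only at a high level: a Distil-BERT encoder whose output representation is split by two linear layers into a task-generic and a task-specific space, with next sentence prediction and task identifier prediction as auxiliary objectives alongside the main classification head. The PyTorch/Hugging Face sketch below illustrates one plausible wiring of these components; the class name, projection dimension, head layout, and the task/class counts in the usage example are illustrative assumptions, not the authors' published implementation.

    # Minimal sketch of a task-independent Distil-BERT classifier with two
    # projection layers (task-generic / task-specific) and two auxiliary heads
    # (next-sentence prediction, task-identifier prediction).
    # Names, dimensions, and head layout are illustrative assumptions only.
    import torch
    import torch.nn as nn
    from transformers import DistilBertModel, DistilBertTokenizerFast

    class ContinualDistilBert(nn.Module):
        def __init__(self, num_classes: int, num_tasks: int, proj_dim: int = 128):
            super().__init__()
            self.encoder = DistilBertModel.from_pretrained("distilbert-base-uncased")
            hidden = self.encoder.config.dim  # 768 for distilbert-base

            # Two linear layers split the [CLS]-position representation into
            # a task-generic and a task-specific subspace.
            self.generic_proj = nn.Linear(hidden, proj_dim)
            self.specific_proj = nn.Linear(hidden, proj_dim)

            # Main text-classification head uses both subspaces; the auxiliary
            # heads attach to the generic (NSP) and specific (task-id) spaces.
            self.classifier = nn.Linear(2 * proj_dim, num_classes)
            self.nsp_head = nn.Linear(proj_dim, 2)
            self.task_id_head = nn.Linear(proj_dim, num_tasks)

        def forward(self, input_ids, attention_mask):
            out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
            cls = out.last_hidden_state[:, 0]  # first-token representation
            generic = torch.tanh(self.generic_proj(cls))
            specific = torch.tanh(self.specific_proj(cls))
            logits = self.classifier(torch.cat([generic, specific], dim=-1))
            return logits, self.nsp_head(generic), self.task_id_head(specific)

    # Usage example (hypothetical setup: 5 tasks, 10 classes per task).
    tokenizer = DistilBertTokenizerFast.from_pretrained("distilbert-base-uncased")
    model = ContinualDistilBert(num_classes=10, num_tasks=5)
    batch = tokenizer(["example review text"], return_tensors="pt",
                      padding=True, truncation=True)
    cls_logits, nsp_logits, task_logits = model(batch["input_ids"],
                                                batch["attention_mask"])

During training, one would typically sum cross-entropy losses from the classification, next-sentence, and task-identifier heads, which is how the auxiliary objectives inject task-generic and task-specific signal into the shared encoder across sequentially trained tasks.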
Pages: 104964 - 104982
Number of pages: 19
Related papers
50 results in total
  • [1] Compressing BERT for Binary Text Classification via Adaptive Truncation before Fine-Tuning
    Zhang, Xin
    Fan, Jing
    Hei, Mengzhe
    APPLIED SCIENCES-BASEL, 2022, 12 (23):
  • [2] EEBERT: An Emoji-Enhanced BERT Fine-Tuning on Amazon Product Reviews for Text Sentiment Classification
    Narejo, Komal Rani
    Zan, Hongying
    Dharmani, Kheem Parkash
    Zhou, Lijuan
    Alahmadi, Tahani Jaser
    Assam, Muhammad
    Sehito, Nabila
    Ghadi, Yazeed Yasin
    IEEE ACCESS, 2024, 12 : 131954 - 131967
  • [3] DCFT: Dependency-aware continual learning fine-tuning for sparse LLMs
    Wang, Yanzhe
    Wang, Yizhen
    Yin, Baoqun
    NEUROCOMPUTING, 2025, 636
  • [4] DLCFT: Deep Linear Continual Fine-Tuning for General Incremental Learning
    Shon, Hyounguk
    Lee, Janghyeon
    Kim, Seung Hwan
    Kim, Junmo
    COMPUTER VISION - ECCV 2022, PT XXXIII, 2022, 13693 : 513 - 529
  • [5] Enhanced Discriminative Fine-Tuning of Large Language Models for Chinese Text Classification
    Song, Jinwang
    Zan, Hongying
    Zhang, Kunli
    2024 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING, IALP 2024, 2024, : 168 - 174
  • [6] Exploring Public Attitude Towards Children by Leveraging Emoji to Track Out Sentiment Using Distil-BERT a Fine-Tuned Model
    Saha, Uchchhwas
    Mahmud, Md. Shihab
    Keya, Mumenunnessa
    Lucky, Effat Ara Easmin
    Khushbu, Sharun Akter
    Noori, Sheak Rashed Haider
    Syed, Muntaser Mansur
    THIRD INTERNATIONAL CONFERENCE ON IMAGE PROCESSING AND CAPSULE NETWORKS (ICIPCN 2022), 2022, 514 : 332 - 346
  • [7] Efficient Fine-Tuning of BERT Models on the Edge
    Vucetic, Danilo
    Tayaranian, Mohammadreza
    Ziaeefard, Maryam
    Clark, James J.
    Meyer, Brett H.
    Gross, Warren J.
    2022 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS 22), 2022, : 1838 - 1842
  • [8] ConFit: Contrastive Fine-Tuning of Text-to-Text Transformer for Relation Classification
    Duan, Jiaxin
    Lu, Fengyu
    Liu, Junfei
    NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, PT II, NLPCC 2024, 2025, 15360 : 16 - 29
  • [9] Research on fine-tuning strategies for text classification in the aquaculture domain by combining deep learning and large language models
    Li, Zhenglin
    Zhang, Sijia
    Cao, Peirong
    Zhang, Jiaqi
    An, Zongshi
    AQUACULTURE INTERNATIONAL, 2025, 33 (4)
  • [10] A Comparative Analysis of Instruction Fine-Tuning Large Language Models for Financial Text Classification
    Fatemi, Sorouralsadat
    Hu, Yuheng
    Mousavi, Maryam
    ACM TRANSACTIONS ON MANAGEMENT INFORMATION SYSTEMS, 2025, 16 (01)