DeepCKID: A Multi-Head Attention-Based Deep Neural Network Model Leveraging Classwise Knowledge to Handle Imbalanced Textual Data

Cited by: 2
Authors
Sah, Amit Kumar [1]
Abulaish, Muhammad [1]
Affiliations
[1] South Asian Univ, Dept Comp Sci, New Delhi, India
Source
MACHINE LEARNING WITH APPLICATIONS | 2024, Vol. 17
Keywords
Class imbalance; Text classification; Transformers; Deep learning; Multi-Head Attention; Pre-trained Language Models
DOI
10.1016/j.mlwa.2024.100575
CLC number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents DeepCKID, a Multi-Head Attention (MHA)-based deep learning model that exploits statistical and semantic knowledge about documents across the different classes of a dataset to improve the detection of minority-class instances in imbalanced text classification. For each document, DeepCKID extracts (i) word-level statistical and semantic knowledge, namely the class correlation and class similarity of each word, based on its association with the different classes in the dataset, and (ii) class-level knowledge from the document, using n-grams and relation triplets corresponding to the classwise keywords present, identified via cosine similarity over embeddings from Transformers-based Pre-trained Language Models (PLMs). DeepCKID encodes the word-level and class-level features using deep convolutional networks, which learn meaningful patterns from them. DeepCKID first combines the semantically meaningful Sentence-BERT document embedding with the word-level feature matrix to form the final document representation, which it then fuses with the different classwise encoded representations to strengthen feature propagation. It then passes the document representation and its classwise representations through an MHA layer to identify important features at different positions of the feature subspaces, yielding a latent dense vector that accentuates the document's association with a particular class. Finally, DeepCKID passes this latent vector to a softmax layer to predict the class label. We evaluate DeepCKID on six publicly available Amazon reviews datasets using four Transformers-based PLMs, comparing it against three existing approaches and four ablation-like baselines. In most cases, DeepCKID outperforms all comparison approaches, including the baselines.
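To make the fusion-and-attention step concrete, below is a minimal PyTorch sketch of an MHA-based classwise fusion head in the spirit of the abstract. The module name ClasswiseMHAFusionHead, all dimensions, the additive document-to-class fusion, and the mean pooling over the attended sequence are illustrative assumptions, not the authors' published implementation.

import torch
import torch.nn as nn

class ClasswiseMHAFusionHead(nn.Module):
    """Illustrative head: fuses a document embedding with per-class
    knowledge vectors via multi-head self-attention, then classifies."""

    def __init__(self, doc_dim=384, class_dim=128, hidden_dim=256,
                 num_classes=2, num_heads=4):
        super().__init__()
        # Project the document and classwise vectors into a shared space.
        self.doc_proj = nn.Linear(doc_dim, hidden_dim)
        self.class_proj = nn.Linear(class_dim, hidden_dim)
        # Multi-head attention over the (document + classwise) sequence.
        self.mha = nn.MultiheadAttention(hidden_dim, num_heads,
                                         batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, doc_emb, class_reps):
        # doc_emb:    (batch, doc_dim), e.g. a Sentence-BERT embedding
        # class_reps: (batch, num_classes, class_dim), encoded classwise knowledge
        doc = self.doc_proj(doc_emb).unsqueeze(1)    # (B, 1, H)
        cls = self.class_proj(class_reps)            # (B, C, H)
        # Fuse the document vector into each classwise representation
        # (additive here) to strengthen feature propagation.
        fused = torch.cat([doc, cls + doc], dim=1)   # (B, 1+C, H)
        attended, _ = self.mha(fused, fused, fused)  # self-attention
        # Pool the attended sequence into one latent dense vector.
        latent = attended.mean(dim=1)                # (B, H)
        return self.classifier(latent)               # logits; softmax in the loss

# Toy usage: batch of 8 documents, 2 classes.
head = ClasswiseMHAFusionHead()
logits = head(torch.randn(8, 384), torch.randn(8, 2, 128))
print(logits.shape)  # torch.Size([8, 2])

The sketch returns raw logits and leaves the softmax to a cross-entropy loss, the usual PyTorch convention for the abstract's final softmax classification layer.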
Pages: 18