Development of a Text Classification Framework using Transformer-based Embeddings

Cited by: 0
Authors
Yeasmin, Sumona [1]
Afrin, Nazia [1]
Saif, Kashfia [1]
Huq, Mohammad Rezwanul [1]
Affiliations
[1] East West Univ, Dept Comp Sci & Engn, Dhaka, Bangladesh
Keywords
Natural Language Processing; Machine Learning; Classification; Transformer-based Embedding; Contextual Similarity
DOI
10.5220/0011268000003269
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Traditional text document classification methods represent documents with non-contextualized word embeddings and vector space models. Recent techniques for text classification often rely on word embeddings as a transfer learning component. We first explored the existing text document classification methodologies and evaluated their strengths and limitations, starting with models based on Bag-of-Words and then shifting towards transformer-based architectures. We conclude that transformer-based embeddings are necessary to capture contextual meaning. BERT, one of the transformer-based embedding architectures, produces robust word embeddings by analyzing text in both directions (left to right and right to left) and thereby capturing the proper context. This research introduces a novel text classification framework based on BERT embeddings of text documents. Several classification algorithms have been applied to the word embeddings of the pre-trained state-of-the-art BERT model. Experiments show that the random forest classifier obtains higher accuracy than the decision tree and k-nearest neighbor (KNN) algorithms. Furthermore, the obtained results have been compared with existing work and show up to 50% improvement in accuracy. In the future, this work can be extended by building a hybrid recommender system that combines content-based documents with similar features and user-centric interests. This study shows promising results and validates the proposed methodology as viable for text classification.
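
The embed-then-classify idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it covers only the KNN classifier (not the random forest or decision tree), it assumes documents are represented by mean-pooling their token embeddings (the abstract does not state the pooling strategy), it assumes cosine similarity as the distance measure, and it uses hand-written 3-dimensional toy vectors in place of real BERT embeddings. The names `mean_pool`, `cosine`, `knn_predict`, and `TRAIN` are hypothetical.

```python
import math
from collections import Counter


def mean_pool(token_vectors):
    """Average a list of token vectors into one document vector (assumed pooling)."""
    dim = len(token_vectors[0])
    count = len(token_vectors)
    return [sum(vec[i] for vec in token_vectors) / count for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(train_vecs, train_labels, query, k=3):
    """Label the query by majority vote among its k most similar training vectors."""
    ranked = sorted(
        range(len(train_vecs)),
        key=lambda i: cosine(train_vecs[i], query),
        reverse=True,
    )
    votes = Counter(train_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy stand-ins for per-token BERT embeddings (hypothetical values, 3 dimensions).
TRAIN = [
    ("sports", [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]]),
    ("sports", [[0.7, 0.0, 0.2], [0.9, 0.1, 0.1]]),
    ("finance", [[0.1, 0.9, 0.0], [0.0, 0.8, 0.2]]),
    ("finance", [[0.2, 0.7, 0.1], [0.1, 0.9, 0.1]]),
]

train_vecs = [mean_pool(tokens) for _, tokens in TRAIN]
train_labels = [label for label, _ in TRAIN]

# A new document whose token vectors resemble the "sports" examples.
query = mean_pool([[0.85, 0.15, 0.05], [0.75, 0.05, 0.1]])
predicted = knn_predict(train_vecs, train_labels, query, k=3)
print(predicted)
```

In a real pipeline the toy vectors would be replaced by embeddings taken from a pre-trained BERT model, and the same document vectors could be passed to a random forest or decision tree for comparison.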
Pages: 74-82
Page count: 9