Development of a Text Classification Framework using Transformer-based Embeddings

Authors
Yeasmin, Sumona [1]
Afrin, Nazia [1]
Saif, Kashfia [1]
Huq, Mohammad Rezwanul [1]
Affiliations
[1] East West Univ, Dept Comp Sci & Engn, Dhaka, Bangladesh
Keywords
Natural Language Processing; Machine Learning; Classification; Transformer-based Embedding; Contextual Similarity
DOI
10.5220/0011268000003269
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Traditional text document classification methods represent documents with non-contextualized word embeddings and vector space models. Recent text classification techniques often rely on word embeddings as a transfer learning component. We first survey existing text document classification methodologies and evaluate their strengths and limitations, starting with Bag-of-Words models and moving towards transformer-based architectures. We conclude that transformer-based embeddings are necessary to capture contextual meaning. BERT, one of the transformer-based embedding architectures, produces robust word embeddings by analyzing text both left to right and right to left, and so captures the proper context. This research introduces a novel text classification framework based on BERT embeddings of text documents. Several classification algorithms are applied to the word embeddings of the pre-trained, state-of-the-art BERT model. Experiments show that the random forest classifier achieves higher accuracy than the decision tree and k-nearest neighbor (KNN) algorithms. Furthermore, compared with existing work, the obtained results show up to a 50% improvement in accuracy. In the future, this work can be extended by building a hybrid recommender system that combines content-based documents with similar features and user-centric interests. This study shows promising results and indicates that the proposed methodology is viable for text classification.
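The idea of running a conventional classifier over document embeddings can be illustrated with a small sketch. This is not the authors' implementation: the 4-dimensional vectors below are hypothetical stand-ins for BERT document embeddings (real BERT-base vectors have 768 dimensions), the labels are invented, and cosine similarity is assumed as the distance measure for the KNN step. The helper names `cosine_similarity` and `knn_predict` are illustrative only.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def knn_predict(train_vectors, train_labels, query, k=3):
    """Return the majority label among the k training vectors most similar to query."""
    scored = sorted(
        zip(train_vectors, train_labels),
        key=lambda pair: cosine_similarity(pair[0], query),
        reverse=True,
    )
    top_labels = [label for _, label in scored[:k]]
    return Counter(top_labels).most_common(1)[0][0]


# Hypothetical toy vectors standing in for BERT document embeddings.
train_vectors = [
    [0.9, 0.1, 0.0, 0.2],
    [0.8, 0.2, 0.1, 0.1],
    [0.7, 0.0, 0.2, 0.3],
    [0.1, 0.9, 0.8, 0.0],
    [0.0, 0.8, 0.9, 0.1],
    [0.2, 0.7, 0.7, 0.2],
]
# Invented class labels, one per training vector.
train_labels = ["sports", "sports", "sports", "politics", "politics", "politics"]

# Classify an unseen "document embedding" that lies close to the first group.
prediction = knn_predict(train_vectors, train_labels, [0.85, 0.15, 0.05, 0.2])
print(prediction)
```

In a fuller pipeline, the toy vectors would be replaced by embeddings extracted from a pre-trained BERT model, and the same vectors could be passed to a random forest or decision tree classifier for comparison.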
Pages: 74-82
Number of pages: 9