A comprehensive survey of text classification techniques and their research applications: Observational and experimental insights

Cited by: 4
Authors
Taha, Kamal [1]
Yoo, Paul D. [2]
Yeun, Chan [3]
Homouz, Dirar [4]
Taha, Aya [5]
Affiliations
[1] Khalifa Univ, Dept Comp Sci, Abu Dhabi, U Arab Emirates
[2] Univ London, Birkbeck Coll, Dept Comp Sci & Informat Syst, London, England
[3] Khalifa Univ, Ctr Cyber Phys Syst, Dept Elect Engn & Comp Sci, Abu Dhabi, U Arab Emirates
[4] Khalifa Univ, Dept Phys, Abu Dhabi, U Arab Emirates
[5] Brighton Coll, Dubai, U Arab Emirates
Keywords
Text classification; Text data mining; Data science; Artificial intelligence; Deep learning; Neural network; Model
DOI
10.1016/j.cosrev.2024.100664
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The exponential growth of textual data presents substantial challenges in management and analysis, notably due to high storage and processing costs. Text classification, a vital aspect of text mining, provides robust solutions by enabling efficient categorization and organization of text data. These techniques allow individuals, researchers, and businesses to derive meaningful patterns and insights from large volumes of text. This survey paper introduces a comprehensive taxonomy specifically designed for text classification based on research fields. The taxonomy is structured into hierarchical levels: research field-based category, research field-based sub-category, methodology-based technique, methodology sub-technique, and research field applications. We employ a dual evaluation approach: empirical and experimental. Empirically, we assess text classification techniques across four critical criteria. Experimentally, we compare and rank the methodology sub-techniques within the same methodology technique and within the same overall research field sub-category. This structured taxonomy, coupled with thorough evaluations, provides a detailed and nuanced understanding of text classification algorithms and their applications, empowering researchers to make informed decisions based on precise, field-specific insights.
Pages: 21