Twenty Years of Machine-Learning-Based Text Classification: A Systematic Review

Cited by: 18
Authors
Palanivinayagam, Ashokkumar [1]
El-Bayeh, Claude Ziad [2]
Damasevicius, Robertas [3]
Affiliations
[1] Sri Ramachandra Inst Higher Educ & Res, Sri Ramachandra Fac Engn & Technol, Chennai 600116, India
[2] Bayeh Inst, Dept Elect Engn, Amchit 4307, Lebanon
[3] Kaunas Univ Technol, Dept Software Engn, LT-44249 Kaunas, Lithuania
Keywords
machine learning; text classification; natural language processing; spam detection; sentiment analysis; rating summarization; SUPPORT VECTOR MACHINE; SENTIMENT CLASSIFICATION; AUTOMATIC CLASSIFICATION; FEATURE-SELECTION; DEEP; FEATURES; MODEL; CLASSIFIERS; EXTRACTION; FREQUENCY
DOI
10.3390/a16050236
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Machine-learning-based text classification is one of the leading research areas and has a wide range of applications, including spam detection, hate speech identification, reviews, rating summarization, sentiment analysis, and topic modelling. Machine-learning-based studies differ in the datasets, training methods, performance evaluation, and comparison methods they use. In this paper, we survey 224 papers published between 2003 and 2022 that employed machine learning for text classification. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement serves as the guideline for the systematic review process. The differences in the literature are analyzed in terms of six aspects: datasets, machine learning models, best accuracy, performance evaluation metrics, training and testing splitting methods, and comparisons among machine learning models. Furthermore, we highlight the limitations and research gaps in the literature. Although the research works included in the survey perform well in terms of text classification, improvement is required in many areas. We believe that this survey paper will be useful for researchers in the field of text classification.
Pages: 28