Text Classification Algorithms: A Survey

Cited by: 739
Authors
Kowsari, Kamran [1, 2]
Meimandi, Kiana Jafari [1]
Heidarysafa, Mojtaba [1]
Mendu, Sanjana [1]
Barnes, Laura [1, 2, 3]
Brown, Donald [1, 3]
Affiliations
[1] Univ Virginia, Dept Syst & Informat Engn, Charlottesville, VA 22904 USA
[2] Univ Virginia, Sensing Syst Hlth Lab, Charlottesville, VA 22911 USA
[3] Univ Virginia, Sch Data Sci, Charlottesville, VA 22904 USA
Keywords
text classification; text mining; text representation; text categorization; text analysis; document classification; ROC CURVE; DIMENSIONALITY REDUCTION; LOGISTIC-REGRESSION; COMPONENT ANALYSIS; NEURAL-NETWORK; BAYES THEOREM; NAIVE BAYES; AREA; MODELS; TREE
DOI
10.3390/info10040150
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
In recent years, there has been an exponential growth in the number of complex documents and texts that require a deeper understanding of machine learning methods to be able to accurately classify texts in many applications. Many machine learning approaches have achieved surpassing results in natural language processing. The success of these learning algorithms relies on their capacity to understand complex models and non-linear relationships within data. However, finding suitable structures, architectures, and techniques for text classification is a challenge for researchers. In this paper, a brief overview of text classification algorithms is discussed. This overview covers different text feature extractions, dimensionality reduction methods, existing algorithms and techniques, and evaluation methods. Finally, the limitations of each technique and their application in real-world problems are discussed.
Pages: 68
Related papers
50 in total
  • [41] A survey of multiple types of text summarization with their satellite contents based on swarm intelligence optimization algorithms
    Mosa, Mohamed Atef
    Anwar, Arshad Syed
    Hamouda, Alaa
    KNOWLEDGE-BASED SYSTEMS, 2019, 163 : 518 - 532
  • [42] Multidimensional Text Warehousing for Automated Text Classification
    Kim, Jiyun
    Kim, Han-joon
    JOURNAL OF INFORMATION TECHNOLOGY RESEARCH, 2018, 11 (02) : 168 - 183
  • [43] A Comparative Text Classification Study with Deep Learning-Based Algorithms
    Koksal, Omer
    Akgul, Ozlem
    2022 9TH INTERNATIONAL CONFERENCE ON ELECTRICAL AND ELECTRONICS ENGINEERING (ICEEE 2022), 2022 : 387 - 391
  • [44] A DATA-DRIVEN TEXT SIMILARITY MEASURE BASED ON CLASSIFICATION ALGORITHMS
    Cho, Su Gon
    Kim, Seoung Bum
    INTERNATIONAL JOURNAL OF INDUSTRIAL ENGINEERING-THEORY APPLICATIONS AND PRACTICE, 2017, 24 (03) : 328 - 339
  • [45] Clinical Text Classification with Word Representation Features and Machine Learning Algorithms
    Almazaydeh, Laiali
    Abuhelaleh, Mohammed
    Al Tawil, Arar
    Elleithy, Khaled
    INTERNATIONAL JOURNAL OF ONLINE AND BIOMEDICAL ENGINEERING, 2023, 19 (04) : 65 - 76
  • [46] Human-annotated rationales and explainable text classification: a survey
    Herrewijnen, Elize
    Nguyen, Dong
    Bex, Floris
    van Deemter, Kees
    FRONTIERS IN ARTIFICIAL INTELLIGENCE, 2024, 7
  • [47] Survey on supervised machine learning techniques for automatic text classification
    Ammar Ismael Kadhim
    Artificial Intelligence Review, 2019, 52 : 273 - 292
  • [48] Survey on supervised machine learning techniques for automatic text classification
    Kadhim, Ammar Ismael
    ARTIFICIAL INTELLIGENCE REVIEW, 2019, 52 (01) : 273 - 292
  • [49] Text Classification Using Machine Learning Methods-A Survey
    Agarwal, Basant
    Mittal, Namita
    PROCEEDINGS OF THE SECOND INTERNATIONAL CONFERENCE ON SOFT COMPUTING FOR PROBLEM SOLVING (SOCPROS 2012), 2014, 236 : 701 - 709
  • [50] Improving automated Turkish text classification with learning-based algorithms
    Koksal, Omer
    Yilmaz, Eyup Halit
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2022, 34 (11)