Text Classification Algorithms: A Survey

Cited by: 739
Authors
Kowsari, Kamran [1, 2]
Meimandi, Kiana Jafari [1]
Heidarysafa, Mojtaba [1]
Mendu, Sanjana [1]
Barnes, Laura [1, 2, 3]
Brown, Donald [1, 3]
Affiliations
[1] Univ Virginia, Dept Syst & Informat Engn, Charlottesville, VA 22904 USA
[2] Univ Virginia, Sensing Syst Hlth Lab, Charlottesville, VA 22911 USA
[3] Univ Virginia, Sch Data Sci, Charlottesville, VA 22904 USA
Keywords
text classification; text mining; text representation; text categorization; text analysis; document classification; ROC CURVE; DIMENSIONALITY REDUCTION; LOGISTIC-REGRESSION; COMPONENT ANALYSIS; NEURAL-NETWORK; BAYES THEOREM; NAIVE BAYES; AREA; MODELS; TREE;
DOI
10.3390/info10040150
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
In recent years, there has been an exponential growth in the number of complex documents and texts that require a deeper understanding of machine learning methods to be able to accurately classify texts in many applications. Many machine learning approaches have achieved surpassing results in natural language processing. The success of these learning algorithms relies on their capacity to understand complex models and non-linear relationships within data. However, finding suitable structures, architectures, and techniques for text classification is a challenge for researchers. In this paper, a brief overview of text classification algorithms is discussed. This overview covers different text feature extractions, dimensionality reduction methods, existing algorithms and techniques, and evaluation methods. Finally, the limitations of each technique and their application in real-world problems are discussed.
Pages: 68
Related papers
50 in total
  • [21] A survey on dimension reduction techniques in text classification
    Wang, Zhi Juan
    Zhou, Ruo Song
    COMPUTING, CONTROL, INFORMATION AND EDUCATION ENGINEERING, 2015, : 633 - 635
  • [22] Graph neural networks for text classification: a survey
    Wang, Kunze
    Ding, Yihao
    Han, Soyeon Caren
    ARTIFICIAL INTELLIGENCE REVIEW, 2024, 57 (08)
  • [23] Machine learning algorithms in Arabic Text Classification: A Review
    Aboalnaser, Sara A.
    12TH INTERNATIONAL CONFERENCE ON THE DEVELOPMENTS IN ESYSTEMS ENGINEERING (DESE 2019), 2019, : 290 - 295
  • [24] PERFORMANCE ANALYSIS OF HEURISTIC SEARCH ALGORITHMS IN TEXT CLASSIFICATION
    Haltas, Ahmet
    Alkan, Ahmet
    Karabulut, Mustafa
    JOURNAL OF THE FACULTY OF ENGINEERING AND ARCHITECTURE OF GAZI UNIVERSITY, 2015, 30 (03): : 417 - 427
  • [25] Text mining: A survey of Arabic root extraction algorithms
    Hamza, Manar Ahmed Mohammed
    Ahmed, Tarig Mohamed
    Hilal, Anwer Mustafa Mohamedsalih
    INTERNATIONAL JOURNAL OF ADVANCED AND APPLIED SCIENCES, 2021, 8 (01): : 11 - 19
  • [26] Combination of rough sets and genetic algorithms for text classification
    Bai, Rujiang
    Wang, Xiaoyue
    Liao, Junhua
    AUTONOMOUS INTELLIGENT SYSTEMS: AGENTS AND DATA MINING, PROCEEDINGS, 2007, 4476 : 256 - +
  • [27] Text Classification for Organizational Researchers: A Tutorial
    Kobayashi, Vladimer B.
    Mol, Stefan T.
    Berkers, Hannah A.
    Kismihok, Gabor
    Den Hartog, Deanne N.
    ORGANIZATIONAL RESEARCH METHODS, 2018, 21 (03) : 766 - 799
  • [28] Learning label smoothing for text classification
    Ren H.
    Zhao Y.
    Zhang Y.
    Sun W.
    PeerJ Computer Science, 2024, 10
  • [29] Text based classification of companies in CrunchBase
    Batista, Fernando
    Carvalho, Joao Paulo
    2015 IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS (FUZZ-IEEE 2015), 2015,
  • [30] Automated Classification of Variants of Norwegian by Means of Text Mining of Unannotated Text
    Overland, Fartein Th
    STUDIA UNIVERSITATIS BABES-BOLYAI PHILOLOGIA, 2020, 65 (03): : 107 - 124