Text Classification Algorithms: A Survey

Cited by: 739
Authors
Kowsari, Kamran [1, 2]
Meimandi, Kiana Jafari [1]
Heidarysafa, Mojtaba [1]
Mendu, Sanjana [1]
Barnes, Laura [1, 2, 3]
Brown, Donald [1, 3]
Affiliations
[1] Univ Virginia, Dept Syst & Informat Engn, Charlottesville, VA 22904 USA
[2] Univ Virginia, Sensing Syst Hlth Lab, Charlottesville, VA 22911 USA
[3] Univ Virginia, Sch Data Sci, Charlottesville, VA 22904 USA
Keywords
text classification; text mining; text representation; text categorization; text analysis; document classification; ROC curve; dimensionality reduction; logistic regression; component analysis; neural network; Bayes theorem; naive Bayes; area; models; tree
DOI
10.3390/info10040150
Chinese Library Classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
In recent years, there has been an exponential growth in the number of complex documents and texts that require a deeper understanding of machine learning methods to be able to accurately classify texts in many applications. Many machine learning approaches have achieved surpassing results in natural language processing. The success of these learning algorithms relies on their capacity to understand complex models and non-linear relationships within data. However, finding suitable structures, architectures, and techniques for text classification is a challenge for researchers. In this paper, a brief overview of text classification algorithms is discussed. This overview covers different text feature extractions, dimensionality reduction methods, existing algorithms and techniques, and evaluation methods. Finally, the limitations of each technique and their application in real-world problems are discussed.
Pages: 68
Related papers
50 in total
  • [31] A comprehensive survey of text classification techniques and their research applications: Observational and experimental insights
    Taha, Kamal
    Yoo, Paul D.
    Yeun, Chan
    Homouz, Dirar
    Taha, Aya
    COMPUTER SCIENCE REVIEW, 2024, 54
  • [32] Semantic text classification: A survey of past and recent advances
    Altinel, Berna
    Ganiz, Murat Can
    INFORMATION PROCESSING & MANAGEMENT, 2018, 54 (06) : 1129 - 1153
  • [33] A Survey on Text Classification Techniques for Sentiment Polarity Detection
    Arunachalam, N.
    Sneka, Josephine S.
    MadhuMathi, G.
    2017 INNOVATIONS IN POWER AND ADVANCED COMPUTING TECHNOLOGIES (I-PACT), 2017
  • [34] Performance Analysis of Supervised Machine Learning Algorithms for Text Classification
    Mishu, Sadia Zaman
    Rafiuddin, S. M.
    PROCEEDINGS OF THE 2016 19TH INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION TECHNOLOGY (ICCIT), 2016, : 409 - 413
  • [35] Application of improved distributed naive Bayesian algorithms in text classification
    Gao, Hongyi
    Zeng, Xi
    Yao, Chunhua
    The Journal of Supercomputing, 2019, 75 : 5831 - 5847
  • [36] A Survey on Text Classification: From Traditional to Deep Learning
    Li, Qian
    Peng, Hao
    Li, Jianxin
    Xia, Congying
    Yang, Renyu
    Sun, Lichao
    Yu, Philip S.
    He, Lifang
    ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY, 2022, 13 (02)
  • [37] Analysis of Supervised Text Classification Algorithms on Corporate Sustainability Reports
    Shahi, Amir Mohammad
    Issac, Biju
    Modapothala, Jashua Rajesh
    2011 INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND NETWORK TECHNOLOGY (ICCSNT), VOLS 1-4, 2012, : 96 - 100
  • [38] Different Classification Algorithms Based on Arabic Text Classification: Feature Selection Comparative Study
    Raho, Ghazi
    Al-Shalabi, Riyad
    Kanaan, Ghassan
    Nassar, Asma'a
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2015, 6 (02) : 192 - 195
  • [39] An Improved Similarity Measure for Text Clustering and Classification
    Reddy, G. Suresh
    Kanth, T. V. Rajini
    Rao, A. Ananda
    ADVANCED SCIENCE LETTERS, 2015, 21 (11) : 3583 - 3590
  • [40] Enhanced sparse representation classifier for text classification
    Unnikrishnan, P.
    Govindan, V. K.
    Kumar, S. D. Madhu
    EXPERT SYSTEMS WITH APPLICATIONS, 2019, 129 : 260 - 272