Text Classification Algorithms: A Survey

被引:813
作者
Kowsari, Kamran [1 ,2 ]
Meimandi, Kiana Jafari [1 ]
Heidarysafa, Mojtaba [1 ]
Mendu, Sanjana [1 ]
Barnes, Laura [1 ,2 ,3 ]
Brown, Donald [1 ,3 ]
机构
[1] Univ Virginia, Dept Syst & Informat Engn, Charlottesville, VA 22904 USA
[2] Univ Virginia, Sensing Syst Hlth Lab, Charlottesville, VA 22911 USA
[3] Univ Virginia, Sch Data Sci, Charlottesville, VA 22904 USA
关键词
text classification; text mining; text representation; text categorization; text analysis; document classification; ROC CURVE; DIMENSIONALITY REDUCTION; LOGISTIC-REGRESSION; COMPONENT ANALYSIS; NEURAL-NETWORK; BAYES THEOREM; NAIVE BAYES; AREA; MODELS; TREE;
D O I
10.3390/info10040150
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
In recent years, there has been an exponential growth in the number of complex documents and texts that require a deeper understanding of machine learning methods to be able to accurately classify texts in many applications. Many machine learning approaches have achieved surpassing results in natural language processing. The success of these learning algorithms relies on their capacity to understand complex models and non-linear relationships within data. However, finding suitable structures, architectures, and techniques for text classification is a challenge for researchers. In this paper, a brief overview of text classification algorithms is discussed. This overview covers different text feature extractions, dimensionality reduction methods, existing algorithms and techniques, and evaluations methods. Finally, the limitations of each technique and their application in real-world problems are discussed.
引用
收藏
页数:68
相关论文
共 253 条
[1]   Principal component analysis [J].
Abdi, Herve ;
Williams, Lynne J. .
WILEY INTERDISCIPLINARY REVIEWS-COMPUTATIONAL STATISTICS, 2010, 2 (04) :433-459
[2]  
Aggarwal C C., 2016, Recommender Systems, P139
[3]  
Aggarwal C. C., 2018, MACHINE LEARNING TEX
[4]  
Aggarwal C. C., 2012, MINING TEXT DATA, DOI 10.1007/978-1-4614-3223-4_6
[5]  
Aggarwal CharuC., 2012, MINING TEXT DATA, DOI 10.1007/978-1-4614-3223-4_6
[6]   Towards a Supervised Rocchio-based Semantic Classification of Web Pages [J].
Albitar, Shereen ;
Espinasse, Bernard ;
Fournier, Sebastien .
ADVANCES IN KNOWLEDGE-BASED AND INTELLIGENT INFORMATION AND ENGINEERING SYSTEMS, 2012, 243 :460-469
[7]  
[Anonymous], 2008, INTRO INFORM RETRIEV
[8]  
[Anonymous], DATA MINING APPROACH
[9]  
[Anonymous], 2018, LANDSLIDE DYNAMICS I
[10]  
[Anonymous], 2010, P 2010 C EMPIRICAL M