A recent overview of the state-of-the-art elements of text classification

Times cited: 192
Authors
Mironczuk, Marcin Michal [1]
Protasiewicz, Jaroslaw [1]
Affiliation
[1] National Information Processing Institute, Al Niepodleglosci 188 B, PL-00608 Warsaw, Poland
Keywords
Text classification; Document classification; Text classification overview; Document classification overview; FEATURE-SELECTION METHOD; LINGUAL SENTIMENT CLASSIFICATION; COMBINING MULTIPLE CLASSIFIERS; PROPAGATION NEURAL-NETWORK; TERM WEIGHTING SCHEMES; DOCUMENT CLASSIFICATION; NAIVE BAYES; AUTOMATIC CLASSIFICATION; DIMENSION REDUCTION; INSTANCE SELECTION
DOI
10.1016/j.eswa.2018.03.058
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The aim of this study is to provide an overview of the state-of-the-art elements of text classification. For this purpose, we first select and investigate the primary and recent studies and objectives in this field. Next, we examine the state-of-the-art elements of text classification. We then analyse the related works qualitatively and quantitatively. Herein, we describe six baseline elements of text classification: data collection, data analysis for labelling, feature construction and weighting, feature selection and projection, training of a classification model, and solution evaluation. This study will help readers acquire the necessary information about these elements and their associated techniques. We therefore believe that it will assist other researchers and professionals in proposing new studies in the field of text classification. (C) 2018 The Authors. Published by Elsevier Ltd.
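The six baseline elements named in the abstract can be illustrated with a minimal, self-contained Python sketch. Everything in it is an illustrative assumption, not the authors' method. The toy spam/ham corpus is hypothetical. The sketch substitutes common textbook choices: smoothed TF-IDF weighting (the `ln((1+n)/(1+df)) + 1` variant, which is also the default in scikit-learn's `TfidfVectorizer`), document-frequency feature selection, and a nearest-centroid (Rocchio-style) classifier. Feature projection (for example, LSA) is omitted. The function names `fit_idf`, `select_terms`, `train_centroids`, and `run_pipeline` are hypothetical helpers.

```python
import math
from collections import Counter

# 1. Data collection: a tiny hypothetical labelled corpus standing in for real data.
TRAIN = [
    ("cheap pills buy now", "spam"),
    ("win money now click", "spam"),
    ("buy cheap watches now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("project report due monday", "ham"),
    ("lunch meeting with the team", "ham"),
]
TEST = [
    ("click now to buy cheap pills", "spam"),
    ("team report for the meeting", "ham"),
]


def tokenize(text):
    """Lower-case whitespace tokenisation (a deliberately simple assumption)."""
    return text.lower().split()


# 2. Data analysis for labelling: inspect how the labels are distributed.
def label_distribution(data):
    return Counter(label for _, label in data)


# 3. Feature construction and weighting: smoothed TF-IDF, L2-normalised.
def fit_idf(docs):
    """Return (idf table, document frequencies) computed on the training docs."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log((1 + n) / (1 + d)) + 1.0 for term, d in df.items()}
    return idf, df


def tfidf(doc, idf, vocab):
    """Sparse TF-IDF vector (dict) restricted to the selected vocabulary."""
    tf = Counter(t for t in doc if t in vocab)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
    return {t: w / norm for t, w in vec.items()}


# 4. Feature selection: keep the k terms with the highest document frequency.
def select_terms(df, k):
    return set(sorted(df, key=lambda t: (-df[t], t))[:k])


# 5. Training: nearest-centroid (Rocchio-style) model with unit-length centroids.
def train_centroids(vectors, labels):
    sums = {}
    for vec, label in zip(vectors, labels):
        acc = sums.setdefault(label, Counter())
        for t, w in vec.items():
            acc[t] += w
    centroids = {}
    for label, acc in sums.items():
        norm = math.sqrt(sum(w * w for w in acc.values())) or 1.0
        centroids[label] = {t: w / norm for t, w in acc.items()}
    return centroids


def predict(vec, centroids):
    """Pick the class whose centroid has the highest cosine similarity."""
    def cosine(centroid):
        return sum(w * centroid.get(t, 0.0) for t, w in vec.items())
    return max(centroids, key=lambda label: cosine(centroids[label]))


# 6. Solution evaluation: accuracy on the held-out test set.
def accuracy(y_true, y_pred):
    return sum(a == b for a, b in zip(y_true, y_pred)) / len(y_true)


def run_pipeline(k=8):
    train_docs = [tokenize(text) for text, _ in TRAIN]
    train_labels = [label for _, label in TRAIN]
    idf, df = fit_idf(train_docs)
    vocab = select_terms(df, k)
    centroids = train_centroids([tfidf(d, idf, vocab) for d in train_docs], train_labels)
    preds = [predict(tfidf(tokenize(text), idf, vocab), centroids) for text, _ in TEST]
    return preds, accuracy([label for _, label in TEST], preds)


if __name__ == "__main__":
    print(label_distribution(TRAIN))
    print(run_pipeline())
```

With the default `k=8`, both toy test sentences receive their expected labels. On real data, each stage would typically be replaced by a stronger technique of the kinds the survey reviews, and evaluation would use a proper held-out split or cross-validation.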
Pages: 36-54
Number of pages: 19