Efficient English text classification using selected Machine Learning Techniques

Cited by: 111
Authors
Luo, Xiaoyu [1]
Affiliation
[1] Hunan Univ Technol & Business, 569 Yuelu Rd, Changsha 411104, Hunan, Peoples R China
Keywords
Text classification; English language; Machine Learning; Text mining; Support Vector Machines; NETWORKS; SVM;
DOI
10.1016/j.aej.2021.02.009
Chinese Library Classification (CLC) number
T [Industrial Technology];
Discipline classification code
08;
Abstract
Text classification (TC) is an approach for deciding whether a document of any kind belongs to a target category or not. In this paper, we implemented the Support Vector Machines (SVM) model to classify English text and documents. We conducted two analytical experiments to evaluate the selected classifiers on English documents. Experimental results on a set of 1033 text documents show that the Rocchio classifier provides the best performance when the feature set is small, while SVM outperforms the other classifiers. From the experimental analysis, we observed that the classification rate exceeds 90% when more than 4000 features are used. (C) 2021 THE AUTHOR. Published by Elsevier BV on behalf of Faculty of Engineering, Alexandria University. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Pages: 3401-3409
Number of pages: 9