Multi-category news classification using Support Vector Machine based classifiers

Cited by: 24
Authors
Saigal, Pooja [1]
Khanna, Vaibhav [2]
Affiliations
[1] Vivekananda Inst Profess Studies, Sch Informat Technol, New Delhi, India
[2] AXA AL, Appl Al & Data Sci, Gurugram, Haryana, India
Source
SN APPLIED SCIENCES | 2020 / Vol. 2 / Issue 03
Keywords
SVM; Text categorization; LS-SVM; TWSVM; LS-TWSVM; TF-IDF; Tokenization; Stemming; IMPROVEMENTS;
DOI
10.1007/s42452-020-2266-6
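Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code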
07 ; 0710 ; 09 ;
Abstract
Support Vector Machine (SVM) and its variants are gaining momentum in the machine learning community. In this paper, we present a quantitative comparison of established SVM-based classifiers on the multi-category text classification problem. We are particularly interested in the behaviour of Least-squares Support Vector Machine (LS-SVM), Twin Support Vector Machine (TWSVM) and Least-squares Twin Support Vector Machine (LS-TWSVM) classifiers on news data. Since all of these are binary classifiers, they are extended with the One-Against-All approach to handle multi-category data. The dataset is first converted into the required format by preprocessing, which involves tokenization and removal of irrelevant data. The feature set is constructed as a Term Frequency-Inverse Document Frequency (TF-IDF) matrix, so that a representative vector is obtained for each document. We compare the performance of the classification algorithms experimentally on the benchmark UCI news datasets Reuters and 20 Newsgroups. The results show that LS-TWSVM is the best of the three in terms of both accuracy and time complexity (training and testing).
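The pipeline described in the abstract (tokenization, a TF-IDF document matrix, and binary classifiers extended with One-Against-All) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The toy documents, the function names, the unsmoothed weighting idf(t) = ln(N / df(t)), and the hyperparameters are all assumptions made for illustration. The binary learner is a regularized least-squares linear classifier trained by gradient descent, used here as a simple stand-in for an SVM-type learner; it is not LS-SVM, TWSVM or LS-TWSVM.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphanumeric runs (a simple stand-in tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def fit_tfidf(docs):
    """Learn the vocabulary and idf weights, idf(t) = ln(N / df(t)), from training docs."""
    tokenized = [tokenize(d) for d in docs]
    df = Counter(t for toks in tokenized for t in set(toks))
    vocab = sorted(df)
    idf = [math.log(len(docs) / df[t]) for t in vocab]
    return vocab, idf


def tfidf_vector(doc, vocab, idf):
    """Map one document to its TF-IDF vector; terms outside the vocabulary are ignored."""
    toks = tokenize(doc)
    counts = Counter(toks)
    total = max(len(toks), 1)
    return [counts[t] / total * w for t, w in zip(vocab, idf)]


def score(model, x):
    """Decision value w.x + b of a linear model."""
    w, b = model
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def fit_binary(X, y, lam=0.01, lr=0.5, epochs=500):
    """Regularized least-squares linear classifier on +1/-1 targets.

    Illustrative stand-in for an SVM-type binary learner (assumption);
    lam, lr and epochs are hand-picked for the toy data below.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        resid = [score((w, b), x) - yi for x, yi in zip(X, y)]
        grad_w = [lam * w[j] + sum(r * x[j] for r, x in zip(resid, X)) / n
                  for j in range(d)]
        grad_b = sum(resid) / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def fit_one_against_all(X, labels):
    """Train one binary model per class: that class is +1, every other class is -1."""
    return {c: fit_binary(X, [1.0 if l == c else -1.0 for l in labels])
            for c in sorted(set(labels))}


def predict(models, x):
    """Assign the class whose binary model gives the largest decision value."""
    return max(models, key=lambda c: score(models[c], x))


# Hypothetical toy corpus, not taken from Reuters or 20 Newsgroups.
docs = [
    "stocks fall as markets slide",
    "markets rally on strong earnings",
    "team wins the final match",
    "coach praises team after match",
    "new phone chip boosts speed",
    "chip maker unveils new phone",
]
labels = ["business", "business", "sport", "sport", "tech", "tech"]

vocab, idf = fit_tfidf(docs)
X = [tfidf_vector(d, vocab, idf) for d in docs]
models = fit_one_against_all(X, labels)
train_pred = [predict(models, x) for x in X]
new_pred = predict(models, tfidf_vector("team wins match", vocab, idf))
```

One-Against-All needs only one binary model per category, so any of the binary classifiers compared in the paper could in principle replace `fit_binary` without changing the rest of the sketch.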
Pages: 12