Spam Detection Using Clustering-Based SVM

Cited by: 0
Author
Pandya, Darshit [1]
Affiliation
[1] Indus Univ, Dept Comp Engn, Ahmadabad 382115, Gujarat, India
Source
PROCEEDINGS OF THE 2019 2ND INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND MACHINE INTELLIGENCE (MLMI 2019) | 2019
Keywords
Text Classification; SVM; Clustering;
DOI
10.1145/3366750.3366754
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104; 0812; 0835; 1405;
Abstract
Spam detection has become more important than before because of the growing use of messaging and mailing services. Classifying such a wide variety of messages efficiently is a comparatively onerous task. A variety of machine learning algorithms are used for spam detection, one of which is the Support Vector Machine (SVM). Although SVM is widely used to classify text-based documents, its performance in spam classification is not the best because of the uneven density of the training data. To improve the efficiency of SVM, I introduce a clustering-based SVM method. The training data is pre-processed using clustering algorithms, and the SVM classifier is then applied to the processed dataset. The method is intended to improve performance by overcoming the uneven distribution of the training data. The experimental results show improved performance compared with SVM alone.
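The two-stage idea described in the abstract (cluster the training data first, then train an SVM on the processed set) can be sketched roughly as follows. This is a minimal illustration and not the paper's procedure: the abstract does not name the clustering algorithm or say how the clusters are used, so the choice of k-means, the per-class clustering, the per-cluster cap, the helper name `even_out_density`, and the toy messages are all assumptions. The sketch uses scikit-learn.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC


def even_out_density(X, y, n_clusters=2, cap=2, seed=0):
    """Hypothetical helper: cluster each class separately and keep at most
    `cap` samples per cluster, so dense regions do not dominate training."""
    rng = np.random.default_rng(seed)
    keep = []
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)
        k = min(n_clusters, len(idx))
        clusters = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X[idx])
        for c in range(k):
            members = idx[clusters == c]
            if len(members) > cap:
                # Randomly thin out over-represented clusters.
                members = rng.choice(members, size=cap, replace=False)
            keep.extend(members.tolist())
    keep = np.sort(np.array(keep))
    return X[keep], y[keep]


# Toy data (invented for illustration): 1 = spam, 0 = not spam.
texts = [
    "win a free prize now",
    "free prize claim now",
    "win free cash now",
    "cheap loans approved today",
    "meeting moved to monday",
    "lunch at noon tomorrow",
    "see you at the meeting",
    "project report attached",
]
labels = np.array([1, 1, 1, 1, 0, 0, 0, 0])

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts)

# Stage 1: clustering-based pre-processing of the training data.
X_bal, y_bal = even_out_density(X, labels)

# Stage 2: train the SVM classifier on the processed dataset.
clf = LinearSVC().fit(X_bal, y_bal)
pred = clf.predict(vectorizer.transform(["claim your free prize"]))
```

Capping the number of samples per cluster is only one way to even out density; oversampling sparse clusters or training on cluster centroids would be other options under the same reading of the abstract.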
Pages: 12-15
Page count: 4