An ontology enhanced parallel SVM for scalable spam filter training

Cited by: 36
Authors
Caruana, Godwin [1]
Li, Maozhen [1, 2]
Liu, Yang [1]
Affiliations
[1] Brunel Univ, Sch Engn & Design, Uxbridge UB8 3PH, Middx, England
[2] Tongji Univ, Minist Educ, Key Lab Embedded Syst & Serv Comp, Shanghai, Peoples R China
Keywords
Spam filtering; Support vector machine; Parallel computing; Classification; MapReduce; CLASSIFICATION ALGORITHMS; SUPPORT
DOI
10.1016/j.neucom.2012.12.001
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Spam, in a variety of shapes and forms, continues to inflict increasing damage. Various approaches, including Support Vector Machine (SVM) techniques, have been proposed for spam filter training and classification. However, SVM training is a computationally intensive process. This paper presents a MapReduce-based parallel SVM algorithm for scalable spam filter training. By distributing, processing and optimizing subsets of the training data across multiple participating computer nodes, the parallel SVM reduces the training time significantly. Ontology semantics are employed to minimize the accuracy degradation that arises when the training data is distributed among a number of SVM classifiers. Experimental results show that ontology-based augmentation improves the accuracy of the parallel SVM beyond that of the original sequential counterpart. © 2012 Elsevier B.V. All rights reserved.
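The partition-train-merge idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a linear SVM trained with a Pegasos-style sub-gradient method, runs the "map" calls sequentially in one process rather than on MapReduce nodes, keeps the smallest-margin points of each chunk as approximate support vectors, and leaves out the ontology step entirely. All function names (`train_linear_svm`, `map_train`, `reduce_train`, `parallel_svm`, `toy_data`) and the synthetic two-cluster data are hypothetical.

```python
import random

def decision(w, x):
    # Linear decision value; the last weight is the bias term.
    return sum(wj * xj for wj, xj in zip(w, x)) + w[-1]

def train_linear_svm(samples, lam=0.01, epochs=50, seed=0):
    """Pegasos-style sub-gradient descent on the regularized hinge loss."""
    rng = random.Random(seed)
    w = [0.0] * (len(samples[0][0]) + 1)
    order = list(range(len(samples)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            x, y = samples[i]
            violated = y * decision(w, x) < 1
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularization shrink
            if violated:
                xb = list(x) + [1.0]  # constant feature for the bias
                w = [wj + eta * y * xj for wj, xj in zip(w, xb)]
    return w

def map_train(chunk, keep=5):
    """Map step: train on one chunk, emit its `keep` smallest-margin points per class."""
    w = train_linear_svm(chunk)
    kept = []
    for label in (-1, 1):
        same = [s for s in chunk if s[1] == label]
        same.sort(key=lambda s: s[1] * decision(w, s[0]))
        kept.extend(same[:keep])
    return kept

def reduce_train(candidate_lists):
    """Reduce step: merge the emitted points and train one final classifier."""
    merged = [s for part in candidate_lists for s in part]
    return train_linear_svm(merged), merged

def parallel_svm(samples, n_chunks=4):
    # Round-robin partitioning; each map_train call could run on its own node.
    chunks = [samples[i::n_chunks] for i in range(n_chunks)]
    return reduce_train([map_train(c) for c in chunks])

def toy_data(n=200, seed=42):
    """Two well-separated Gaussian clusters standing in for spam / non-spam features."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        y = rng.choice((-1, 1))
        data.append(((rng.gauss(2.0 * y, 0.5), rng.gauss(2.0 * y, 0.5)), y))
    return data

def accuracy(w, samples):
    hits = sum(1 for x, y in samples if y * decision(w, x) > 0)
    return hits / len(samples)

if __name__ == "__main__":
    data = toy_data()
    w, merged = parallel_svm(data)
    print("points passed to the reduce step:", len(merged))
    print("training accuracy:", accuracy(w, data))
```

Because each chunk only sees part of the data, the points it emits may differ from the support vectors of a classifier trained on the full set; this is the kind of accuracy degradation the abstract says the ontology-based augmentation is meant to offset.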
Pages: 45-57
Number of pages: 13