Improvising the accuracy in Classification of Spam emails through Outlier Detection and Classification techniques

Cited by: 0
Authors
Nancy, P. [1]
Ramani, R. Geetha [1]
Jacob, Shomona Gracia [1]
Affiliations
[1] Rajalakshmi Engn Coll, Dept Comp Sci & Engn, Madras, Tamil Nadu, India
Source
2012 INTERNATIONAL CONFERENCE ON FUTURE COMMUNICATION AND COMPUTER TECHNOLOGY (ICFCCT 2012) | 2012
Keywords
Data Mining; Spam; Outliers; Classification Algorithms;
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203; 0835;
Abstract
Electronic mail is a common method of exchanging digital messages among people. Everyone who uses email experiences the problem of spam, so it is essential that spam email be classified correctly. Data mining, a powerful technology with great potential to help companies focus on the most important information in their data warehouses, can be used to classify spam. The Spambase dataset obtained from the UCI repository is used in this work. Various classification algorithms (C4.5, C-RT, ID3, Random Tree, etc.) were applied to the dataset to classify whether an email is spam or normal. The accuracy of the classification algorithms was found to increase after the detection and removal of outliers. Univariate outlier detection with Grubbs' test and the sigma rule was applied to the dataset. Nearly 117 instances were detected as outliers and removed. Some of the classification algorithms (Multilayer Perceptron, Naive Bayes Cont, PLS-DA, PLS-LDA, Random Tree) provided good results after the removal of outliers. The Random Tree classification algorithm gave 99.99% accuracy, and the rules obtained were used to predict whether an email is spam or normal. The precision of the classifier was verified with a test dataset.
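The univariate outlier step named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the tool, the threshold, or the significance level used. The function names, the cutoff `k=3.0`, and the row layout below are assumptions made for illustration. The sketch covers the sigma rule and the Grubbs' test statistic G = max|x_i - mean| / s. It does not compute the Grubbs critical value, which requires a Student's t quantile.

```python
from statistics import mean, stdev


def sigma_rule_outliers(values, k=3.0):
    """Return indices of values more than k sample standard deviations from the mean.

    k=3.0 is an assumed cutoff; the abstract does not state the one used.
    """
    m = mean(values)
    s = stdev(values)
    if s == 0:
        return []  # constant column: nothing can be flagged
    return [i for i, v in enumerate(values) if abs(v - m) > k * s]


def grubbs_statistic(values):
    """Return (G, index) for the most extreme value, where G = max|x_i - mean| / s.

    Deciding whether G is significant requires comparing it with a critical
    value derived from the Student's t distribution, which is omitted here.
    """
    m = mean(values)
    s = stdev(values)
    idx = max(range(len(values)), key=lambda i: abs(values[i] - m))
    return abs(values[idx] - m) / s, idx


def drop_rows(rows, column, k=3.0):
    """Remove rows whose value in the given column is flagged by the sigma rule."""
    flagged = set(sigma_rule_outliers([r[column] for r in rows], k))
    return [r for i, r in enumerate(rows) if i not in flagged]
```

Applied attribute by attribute, `drop_rows` removes the flagged instances before a classifier is trained. Because the most extreme value in a sample of size n can lie at most (n - 1) / sqrt(n) sample standard deviations from the mean, a cutoff of 3 can only flag points when the sample has more than ten values.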
Pages: 173-179
Page count: 7