E-mail Spam Filtering By a New Hybrid Feature Selection Method Using Chi2 as Filter and Random Tree as Wrapper

Cited by: 4
Authors
Pourhashemi, Seyed Mostafa [1]
Affiliations
[1] Islamic Azad Univ, Dept Comp, Dezful Branch, Dezful, Iran
Source
ENGINEERING JOURNAL-THAILAND | 2014 / Vol. 18 / No. 3
Keywords
feature extraction; feature selection; classification; spam filtering; machine learning
DOI
10.4186/ej.2014.18.3.123
Chinese Library Classification number
T [Industrial Technology]
Discipline classification code
08
Abstract
The purpose of this research is to present a machine learning approach that improves the accuracy of automatic spam detection and filtering, separating spam from legitimate messages. To reduce the error rate and increase efficiency, a hybrid feature selection architecture is used. The features used by the system are taken from the body text of messages. The proposed system combines two feature selection models, filter and wrapper, using a Chi-squared (Chi2) filter and a Random Tree wrapper as feature selectors. Multinomial Naive Bayes (MNB), Discriminative Multinomial Naive Bayes (DMNB), Support Vector Machine (SVM), and Random Forest classifiers are used for classification. Finally, the outputs of these classifiers and feature selection methods are examined, the best design is selected, and it is compared with other similar works across different parameters. The best accuracy achieved by the proposed system is 99%.
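The abstract does not give implementation details, so the following is only a minimal sketch of the filter-then-wrapper idea it describes, not the author's method. It assumes binary term-presence features, a chi-squared filter that keeps the top `k` terms, and a greedy forward-selection wrapper. The wrapper's evaluator here is a small Bernoulli Naive Bayes scored by leave-one-out accuracy, swapped in as a stand-in for the Random Tree named in the abstract. All function names and the toy messages are hypothetical.

```python
import math

def chi2_scores(docs, labels):
    """Chi-squared statistic of each term's presence against the spam label (1 = spam)."""
    n, n_spam = len(docs), sum(labels)
    scores = {}
    for term in {t for d in docs for t in d}:
        a = sum(1 for d, y in zip(docs, labels) if term in d and y == 1)  # spam, term present
        b = sum(1 for d, y in zip(docs, labels) if term in d and y == 0)  # ham, term present
        c, e = n_spam - a, (n - n_spam) - b                               # spam / ham, term absent
        denom = (a + b) * (c + e) * (a + c) * (b + e)
        scores[term] = n * (a * e - b * c) ** 2 / denom if denom else 0.0
    return scores

def nb_predict(train_docs, train_labels, doc, features):
    """Bernoulli Naive Bayes with add-one smoothing, restricted to `features`."""
    best_label, best_logp = None, -math.inf
    for label in (0, 1):
        class_docs = [set(d) for d, y in zip(train_docs, train_labels) if y == label]
        logp = math.log((len(class_docs) + 1) / (len(train_docs) + 2))
        for term in features:
            p = (sum(term in d for d in class_docs) + 1) / (len(class_docs) + 2)
            logp += math.log(p if term in doc else 1 - p)
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label

def loo_accuracy(docs, labels, features):
    """Leave-one-out accuracy of the stand-in classifier on the given feature subset."""
    hits = 0
    for i in range(len(docs)):
        rest_docs, rest_labels = docs[:i] + docs[i + 1:], labels[:i] + labels[i + 1:]
        hits += nb_predict(rest_docs, rest_labels, docs[i], features) == labels[i]
    return hits / len(docs)

def hybrid_select(docs, labels, k):
    """Filter stage: keep the k best chi2 terms. Wrapper stage: greedy forward selection."""
    scores = chi2_scores(docs, labels)
    candidates = sorted(scores, key=lambda t: (-scores[t], t))[:k]
    selected, best = [], loo_accuracy(docs, labels, set())
    improved = True
    while improved:
        improved, best_term = False, None
        for term in candidates:
            if term in selected:
                continue
            acc = loo_accuracy(docs, labels, set(selected) | {term})
            if acc > best:  # keep the single best addition of this round
                best, best_term, improved = acc, term, True
        if improved:
            selected.append(best_term)
    return selected, best

# Toy, made-up messages (tokenised bodies); 1 = spam, 0 = legitimate.
docs = [
    ["win", "free", "prize", "now"],
    ["free", "offer", "click", "now"],
    ["cheap", "prize", "click", "free"],
    ["meeting", "schedule", "tomorrow", "team"],
    ["project", "report", "team", "review"],
    ["lunch", "tomorrow", "team", "free"],
]
labels = [1, 1, 1, 0, 0, 0]

selected, accuracy = hybrid_select(docs, labels, k=4)
print(selected, accuracy)
```

The filter stage is cheap and prunes the vocabulary before the wrapper runs, which matters because the wrapper retrains its evaluator for every candidate subset. On a real corpus the leave-one-out loop would normally be replaced by k-fold cross-validation.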
Pages: 123-134
Page count: 12