An evaluation of naive Bayesian anti-spam filtering techniques

Times cited: 17
Authors
Deshpande, Vikas P. [1]
Erbacher, Robert F. [2]
Harris, Chris [2]
Affiliations
[1] Utah State Univ, Comp Sci, Logan, UT 84322 USA
[2] Utah State Univ, Dept Comp Sci, Logan, UT 84322 USA
Source
2007 IEEE INFORMATION ASSURANCE WORKSHOP | 2007
Keywords
spam filter; naive Bayesian; evaluation;
DOI
10.1109/IAW.2007.381951
Chinese Library Classification
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
An efficient anti-spam filter that blocks all spam without blocking any legitimate messages is a growing need. To address this problem, we examine the effectiveness of statistically based Naive Bayesian anti-spam filters, which are content-based and self-learning (adaptive) in nature. Additionally, we designed a derivative filter based on relative numbers of tokens. We train the filters using a large corpus of legitimate messages and spam, and we test them using new incoming personal messages. More specifically, we evaluate four filtering techniques available for a Naive Bayesian filter. We examine the effectiveness of the techniques and evaluate different threshold values in order to find an optimal anti-spam filter configuration. Based on cost-sensitive measures, we conclude that additional safety precautions are needed before a Bayesian anti-spam filter can be put into practice. However, our technique can make a positive contribution as a first-pass filter.
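The sketch below makes the general idea concrete. It shows a Naive Bayes spam filter with an adjustable decision threshold and a cost-sensitive score that penalizes blocked legitimate mail more heavily than missed spam. It is not the authors' implementation, and it does not reproduce their four techniques or their token-ratio filter.

Everything specific in the code is an illustrative assumption:
- the whitespace tokenizer,
- the multinomial model with add-one smoothing,
- the names `NaiveBayesFilter` and `weighted_accuracy`,
- the cost factor `lam`.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer (assumption): lowercase and split on whitespace.
    return text.lower().split()


class NaiveBayesFilter:
    """Illustrative multinomial Naive Bayes spam filter (a sketch, not the paper's code)."""

    def __init__(self, threshold=0.9):
        # A message is labeled spam only if P(spam | message) >= threshold.
        self.threshold = threshold
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    def train(self, text, label):
        # label must be "spam" or "ham"; both classes need at least one message.
        self.counts[label].update(tokenize(text))
        self.docs[label] += 1

    def spam_probability(self, text):
        vocab = set(self.counts["spam"]) | set(self.counts["ham"])
        total_docs = self.docs["spam"] + self.docs["ham"]
        log_score = {}
        for label in ("spam", "ham"):
            total_tokens = sum(self.counts[label].values())
            score = math.log(self.docs[label] / total_docs)  # log prior
            for tok in tokenize(text):
                if tok not in vocab:
                    continue  # tokens never seen in training are ignored
                # Add-one (Laplace) smoothing over the combined vocabulary.
                score += math.log((self.counts[label][tok] + 1) / (total_tokens + len(vocab)))
            log_score[label] = score
        # P(spam | x) = sigmoid(log_spam - log_ham), computed without overflow.
        d = log_score["spam"] - log_score["ham"]
        if d >= 0:
            return 1.0 / (1.0 + math.exp(-d))
        e = math.exp(d)
        return e / (1.0 + e)

    def is_spam(self, text):
        return self.spam_probability(text) >= self.threshold


def weighted_accuracy(y_true, y_pred, lam=9.0):
    # Cost-sensitive accuracy (assumed measure): each legitimate ("ham") message
    # counts lam times, so blocking legitimate mail costs more than missing spam.
    n_ham = sum(1 for t in y_true if t == "ham")
    n_spam = len(y_true) - n_ham
    ham_ok = sum(1 for t, p in zip(y_true, y_pred) if t == p == "ham")
    spam_ok = sum(1 for t, p in zip(y_true, y_pred) if t == p == "spam")
    return (lam * ham_ok + spam_ok) / (lam * n_ham + n_spam)


if __name__ == "__main__":
    # Toy corpus (made up for illustration only).
    f = NaiveBayesFilter()
    for text, label in [("cheap pills buy now", "spam"), ("win money now", "spam"),
                        ("meeting notes for the project", "ham"),
                        ("lunch with the project team", "ham")]:
        f.train(text, label)
    tests = [("buy cheap money now", "spam"), ("project meeting at lunch", "ham")]
    # Sweep the decision threshold and score each setting.
    for t in (0.5, 0.9, 0.99):
        f.threshold = t
        preds = ["spam" if f.is_spam(x) else "ham" for x, _ in tests]
        print(t, weighted_accuracy([y for _, y in tests], preds))
```

On this toy corpus, the spam test message scores about 0.976. It is caught at thresholds 0.5 and 0.9 but passes as legitimate at 0.99. This illustrates the trade-off that threshold tuning has to balance against the cost of false positives.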
Pages: 333 / +
Number of pages: 2