An evaluation of naive Bayesian anti-spam filtering techniques

Cited by: 17

Authors:

Deshpande, Vikas P. [1]

Erbacher, Robert F. [2]

Harris, Chris [2]

Affiliations:

[1] Utah State Univ, Comp Sci, Logan, UT 84322 USA

[2] Utah State Univ, Dept Comp Sci, Logan, UT 84322 USA

Source:

2007 IEEE Information Assurance Workshop | 2007

Keywords:

spam filter; naive Bayesian; evaluation

DOI:

10.1109/IAW.2007.381951

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

An efficient anti-spam filter that blocks all spam without blocking any legitimate messages is a growing need. To address this problem, we examine the effectiveness of a statistically based approach, the Naive Bayesian anti-spam filter, which is content-based and self-learning (adaptive) in nature. Additionally, we designed a derivative filter based on relative numbers of tokens. We train the filters using a large corpus of legitimate messages and spam, and we test them using new incoming personal messages. More specifically, four filtering techniques available for a Naive Bayesian filter are evaluated. We look at the effectiveness of each technique, and we evaluate different threshold values in order to find an optimal anti-spam filter configuration. Based on cost-sensitive measures, we conclude that additional safety precautions are needed for a Bayesian anti-spam filter to be put into practice. However, our technique can make a positive contribution as a first-pass filter.
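
To make the abstract's idea of a token-based Naive Bayesian filter with an adjustable threshold concrete, here is a minimal sketch in Python. It is not the authors' implementation and does not reproduce any of the four techniques they evaluate. The function names (`train`, `spam_probability`, `is_spam`), the whitespace tokenizer, the add-one smoothing, and the default threshold of 0.9 are all assumptions made for illustration.

```python
import math
from collections import Counter


def train(messages):
    """Count tokens per class. `messages` is a list of (text, is_spam) pairs."""
    counts = {True: Counter(), False: Counter()}
    totals = {True: 0, False: 0}
    for text, spam in messages:
        counts[spam].update(text.lower().split())  # naive whitespace tokenizer
        totals[spam] += 1
    return counts, totals


def spam_probability(text, counts, totals):
    """Estimate P(spam | tokens), treating tokens as conditionally independent."""
    vocab = set(counts[True]) | set(counts[False])
    n_spam = sum(counts[True].values())
    n_ham = sum(counts[False].values())
    # Start from the prior odds of spam vs. legitimate mail, in log space.
    log_odds = math.log(totals[True]) - math.log(totals[False])
    for token in text.lower().split():
        if token not in vocab:
            continue  # ignore tokens never seen in training
        # Add-one (Laplace) smoothing avoids zero probabilities.
        p_spam = (counts[True][token] + 1) / (n_spam + len(vocab))
        p_ham = (counts[False][token] + 1) / (n_ham + len(vocab))
        log_odds += math.log(p_spam) - math.log(p_ham)
    # Numerically stable conversion from log-odds to a probability.
    if log_odds >= 0:
        return 1 / (1 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1 + e)


def is_spam(text, counts, totals, threshold=0.9):
    """Flag a message only when the spam probability exceeds the threshold."""
    return spam_probability(text, counts, totals) > threshold


corpus = [
    ("win cash now", True),
    ("cheap cash prize", True),
    ("meeting at noon", False),
    ("lunch at noon tomorrow", False),
]
counts, totals = train(corpus)
print(is_spam("win cash prize", counts, totals))
print(is_spam("lunch at noon", counts, totals))
```

Raising `threshold` makes the sketch more conservative: fewer legitimate messages are blocked, at the price of letting more spam through. That is the kind of trade-off a cost-sensitive evaluation weighs.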

Citation

Pages: 333 / +

Page count: 2
