Spam filtering using statistical data compression models

Cited by: 0

Authors:

Department of Intelligent Systems, Jožef Stefan Institute, Jamova 39, Ljubljana, SI-1000, Slovenia [1]

Unknown [2]

Unknown [3]

Affiliations:

Source:

J. Mach. Learn. Res. | 2006 | pp. 2673-2698

Keywords:

Adaptive filtering - Classification (of information) - Data compression - Electronic mail - Learning algorithms - Markov processes - Text processing

DOI:

Not available

Chinese Library Classification number:

Subject classification number:

Abstract:

Spam filtering poses a special problem in text categorization, of which the defining characteristic is that filters face an active adversary, which constantly attempts to evade filtering. Since spam evolves continuously and most practical applications are based on online user feedback, the task calls for fast, incremental and robust learning algorithms. In this paper, we investigate a novel approach to spam filtering based on adaptive statistical data compression models. The nature of these models allows them to be employed as probabilistic text classifiers based on character-level or binary sequences. By modeling messages as sequences, tokenization and other error-prone preprocessing steps are omitted altogether, resulting in a method that is very robust. The models are also fast to construct and incrementally updateable. We evaluate the filtering performance of two different compression algorithms: dynamic Markov compression and prediction by partial matching. The results of our empirical evaluation indicate that compression models outperform currently established spam filters, as well as a number of methods proposed in previous studies.
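
The idea of using a compression model as a character-level classifier can be illustrated with a minimal sketch. This is not the paper's dynamic Markov compression or prediction by partial matching implementation: it swaps in a much simpler fixed-order character model with add-one smoothing, and the names `CharModel`, `update`, `bits` and `classify`, the context order of 3 and the alphabet size of 256 are all assumptions made for illustration. One model is trained per class, and a message is assigned to the class whose model would encode it in fewer bits.

```python
import math
from collections import defaultdict


class CharModel:
    """Hypothetical fixed-order character model (a stand-in, not PPM or DMC)."""

    def __init__(self, order=3, alphabet_size=256):
        self.order = order                  # assumed context length
        self.alphabet_size = alphabet_size  # assumed symbol count for smoothing
        # counts[context][char] = how often char followed context
        self.counts = defaultdict(lambda: defaultdict(int))
        self.totals = defaultdict(int)

    def _pairs(self, text):
        # Pad with spaces so the first characters also have a full context.
        padded = " " * self.order + text
        for i in range(self.order, len(padded)):
            yield padded[i - self.order:i], padded[i]

    def update(self, text):
        """Incrementally add one message to the model's statistics."""
        for ctx, ch in self._pairs(text):
            self.counts[ctx][ch] += 1
            self.totals[ctx] += 1

    def bits(self, text):
        """Estimated code length of text under this model, in bits."""
        total = 0.0
        for ctx, ch in self._pairs(text):
            seen = self.counts.get(ctx, {}).get(ch, 0)
            ctx_total = self.totals.get(ctx, 0)
            # Add-one smoothing keeps unseen characters at nonzero probability.
            p = (seen + 1) / (ctx_total + self.alphabet_size)
            total -= math.log2(p)
        return total


def classify(message, spam_model, ham_model):
    """Pick the class whose model encodes the message in fewer bits."""
    if spam_model.bits(message) < ham_model.bits(message):
        return "spam"
    return "ham"


if __name__ == "__main__":
    spam, ham = CharModel(), CharModel()
    spam.update("buy cheap pills now, cheap pills online")
    ham.update("meeting agenda attached, see you at the meeting tomorrow")
    print(classify("cheap pills", spam, ham))
    print(classify("see you at the meeting", spam, ham))
```

Because the sketch works directly on characters, no tokenizer is involved, and calling `update` on each newly labeled message mirrors the incremental, feedback-driven setting the abstract describes.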
