Spam filtering using statistical data compression models

Cited by: 0
Authors
Department of Intelligent Systems, Jožef Stefan Institute, Jamova 39, Ljubljana, SI-1000, Slovenia [1]
Unknown [2]
Unknown [3]
Affiliations
Source
J. Mach. Learn. Res. | 2006 / pp. 2673-2698
Keywords
Adaptive filtering - Classification (of information) - Data compression - Electronic mail - Learning algorithms - Markov processes - Text processing
DOI
Not available
Abstract
Spam filtering poses a special problem in text categorization, of which the defining characteristic is that filters face an active adversary, which constantly attempts to evade filtering. Since spam evolves continuously and most practical applications are based on online user feedback, the task calls for fast, incremental and robust learning algorithms. In this paper, we investigate a novel approach to spam filtering based on adaptive statistical data compression models. The nature of these models allows them to be employed as probabilistic text classifiers based on character-level or binary sequences. By modeling messages as sequences, tokenization and other error-prone preprocessing steps are omitted altogether, resulting in a method that is very robust. The models are also fast to construct and incrementally updatable. We evaluate the filtering performance of two different compression algorithms: dynamic Markov compression and prediction by partial matching. The results of our empirical evaluation indicate that compression models outperform currently established spam filters, as well as a number of methods proposed in previous studies.
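The general idea in the abstract, classifying a message by which class model would compress it into fewer bits, can be illustrated with a minimal sketch. This is not the paper's implementation: it swaps in a fixed-order character Markov model with add-one smoothing in place of dynamic Markov compression or prediction by partial matching, and the names `CharModel`, `update`, `code_length`, and `classify`, as well as the toy messages, are hypothetical and chosen only for illustration.

```python
import math


class CharModel:
    """Fixed-order character model with add-one smoothing.

    A simplified stand-in for an adaptive compression model; it is
    NOT PPM or DMC, only an illustration of the same principle.
    """

    def __init__(self, order=2, alphabet_size=256):
        self.order = order
        self.alphabet_size = alphabet_size  # assumed symbol alphabet size
        self.counts = {}  # context -> {symbol: count}
        self.totals = {}  # context -> total symbols seen after it

    def _pairs(self, text):
        # Pad with spaces so the first characters also have a context.
        padded = " " * self.order + text
        for i in range(self.order, len(padded)):
            yield padded[i - self.order:i], padded[i]

    def update(self, text):
        """Incrementally add one message to the model."""
        for ctx, sym in self._pairs(text):
            row = self.counts.setdefault(ctx, {})
            row[sym] = row.get(sym, 0) + 1
            self.totals[ctx] = self.totals.get(ctx, 0) + 1

    def code_length(self, text):
        """Bits an ideal coder would need for text under this model."""
        bits = 0.0
        for ctx, sym in self._pairs(text):
            seen = self.counts.get(ctx, {}).get(sym, 0)
            total = self.totals.get(ctx, 0)
            p = (seen + 1) / (total + self.alphabet_size)
            bits -= math.log2(p)
        return bits


def classify(message, spam_model, ham_model):
    """Pick the class whose model gives the shorter code length."""
    spam_bits = spam_model.code_length(message)
    ham_bits = ham_model.code_length(message)
    return "spam" if spam_bits < ham_bits else "ham"


# Toy training data, invented for this example.
spam_model, ham_model = CharModel(), CharModel()
for text in ["buy cheap pills now", "cheap pills, buy now!!!"]:
    spam_model.update(text)
for text in ["meeting moved to monday", "see you at the monday meeting"]:
    ham_model.update(text)

print(classify("cheap pills", spam_model, ham_model))     # spam
print(classify("monday meeting", spam_model, ham_model))  # ham
```

Because the model works directly on characters, no tokenizer is involved, and `update` can be called on each newly labeled message without retraining from scratch, which mirrors the incremental, feedback-driven setting the abstract describes.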
Related papers
50 in total
  • [31] Online Spam Filtering Using Support Vector Machines
    Amayri, Ola
    Bouguila, Nizar
    ISCC: 2009 IEEE SYMPOSIUM ON COMPUTERS AND COMMUNICATIONS, VOLS 1 AND 2, 2009, : 337 - 340
  • [32] Adaptive spam filtering using dynamic feature spaces
    Zhou, Yan
    Mulekar, Madhuri S.
    Nerellapalli, Praveen
    INTERNATIONAL JOURNAL ON ARTIFICIAL INTELLIGENCE TOOLS, 2007, 16 (04) : 627 - 646
  • [33] Using LPP and LS-SVM For Spam Filtering
    Sun, Xia
    Zhang, Qingzhou
    Wang, Ziqiang
    2009 ISECS INTERNATIONAL COLLOQUIUM ON COMPUTING, COMMUNICATION, CONTROL, AND MANAGEMENT, VOL II, 2009, : 451 - 454
  • [34] Using Live Spam Beater (LiSB) Framework for Spam Filtering during SMTP Transactions
    Gomez-Meire, Silvana
    Gabriel Marquez, Cesar
    Patricia Aray-Cappello, Eliana
    Mendez, Jose R.
    APPLIED SCIENCES-BASEL, 2022, 12 (20):
  • [35] Image spam filtering using convolutional neural networks
    Fan Aiwan
    Yang Zhaofeng
    PERSONAL AND UBIQUITOUS COMPUTING, 2018, 22 (5-6) : 1029 - 1037
  • [36] Using visual features for anti-SPAM filtering
    Wu, CT
    Cheng, KT
    Zhu, Q
    Wu, KL
    2005 INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), VOLS 1-5, 2005, : 2925 - 2928
  • [37] Ending Spam-Bayesian Content Filtering and the Art of Statistical Language Classification
    Webster, Craig S.
    PROMETHEUS, 2006, 24 (01) : 121 - 124
  • [38] Image spam filtering using convolutional neural networks
    Fan Aiwan
    Yang Zhaofeng
    Personal and Ubiquitous Computing, 2018, 22 : 1029 - 1037
  • [39] Efficient spam email filtering using adaptive ontology
    Youn, Seongwook
    McLeod, Dennis
    INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY, PROCEEDINGS, 2007, : 249 - +
  • [40] PSSF: A novel statistical approach for personalized service-side spam filtering
    Junejo, Khurum Nazir
    Karim, Asim
    PROCEEDINGS OF THE IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE: WI 2007, 2007, : 228 - 234