Collective Classification for Spam Filtering

Cited by: 0
Authors
Laorden, Carlos [1]
Sanz, Borja [1]
Santos, Igor [1]
Galan-Garcia, Patxi [1]
Bringas, Pablo G. [1]
Affiliations
[1] Univ Deusto, DeustoTech Comp S3Lab, Bilbao 48007, Spain
Source
COMPUTATIONAL INTELLIGENCE IN SECURITY FOR INFORMATION SYSTEMS | 2011 / Vol. 6694
Keywords
Spam filtering; collective classification; semi-supervised learning; MODEL
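DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract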
Spam has become a major issue in computer security because it is a channel for threats such as computer viruses, worms and phishing. Many solutions use machine-learning algorithms trained on statistical representations of the terms that usually appear in e-mails. These methods, however, require a training step with labelled data. When labelled training instances are scarce, progress on filtering systems slows and spammers gain an advantage. Many current approaches therefore focus on Semi-Supervised Learning (SSL). SSL sits halfway between supervised and unsupervised learning: in addition to unlabelled data, it receives some supervision information, such as the target labels for a subset of the examples. Collective Classification for Text Classification is a promising method for optimising the classification of partially-labelled data. We therefore propose here, for the first time, Collective Classification algorithms for spam filtering, to cope with the volume of unclassified e-mails that are sent every day.
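The abstract's description of SSL (a few labelled examples plus a pool of unlabelled ones) can be illustrated with a minimal sketch. This is plain self-training over a toy naive Bayes scorer, chosen only because it is short; it is not the Collective Classification algorithms the paper proposes. All function names, the `threshold` value and the `rounds` limit are hypothetical.

```python
import math
from collections import Counter

CLASSES = ("spam", "ham")


def train(labelled):
    """Fit per-class word counts from (text, label) pairs."""
    counts = {c: Counter() for c in CLASSES}
    docs = Counter()
    for text, label in labelled:
        counts[label].update(text.lower().split())
        docs[label] += 1
    return counts, docs


def log_odds(model, text):
    """Log-odds of spam versus ham (naive Bayes, add-one smoothing)."""
    counts, docs = model
    vocab = set(counts["spam"]) | set(counts["ham"])
    # Denominators include one extra slot for unseen words.
    totals = {c: sum(counts[c].values()) + len(vocab) + 1 for c in CLASSES}
    score = math.log((docs["spam"] + 1) / (docs["ham"] + 1))
    for word in text.lower().split():
        p_spam = (counts["spam"][word] + 1) / totals["spam"]
        p_ham = (counts["ham"][word] + 1) / totals["ham"]
        score += math.log(p_spam / p_ham)
    return score


def self_train(labelled, unlabelled, threshold=1.0, rounds=5):
    """Self-training: repeatedly absorb confidently scored unlabelled texts.

    threshold and rounds are illustrative assumptions, not tuned values.
    """
    labelled = list(labelled)
    pool = list(unlabelled)
    for _ in range(rounds):
        model = train(labelled)
        scored = [(text, log_odds(model, text)) for text in pool]
        confident = [(t, s) for t, s in scored if abs(s) >= threshold]
        if not confident:
            break  # nothing left that the model is sure about
        for text, score in confident:
            labelled.append((text, "spam" if score > 0 else "ham"))
            pool.remove(text)
    return train(labelled)
```

Calling `self_train` with two labelled messages and a handful of unlabelled ones returns a model whose `log_odds` can differ from that of `train` on the labelled pair alone, because vocabulary from the confidently labelled messages is folded back into the counts.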
Pages: 1-8
Page count: 8