A NONNEGATIVE SPARSITY INDUCED SIMILARITY MEASURE WITH APPLICATION TO CLUSTER ANALYSIS OF SPAM IMAGES

Cited by: 8
Authors
Gao, Yan [1]
Choudhary, Alok [1]
Hua, Gang [2]
Affiliations
[1] Northwestern Univ, Dept EECS, Evanston, IL 60208 USA
[2] Nokia Res Ctr, Los Angeles, CA 90009 USA
Source
2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING | 2010
Keywords
Nonnegative sparse representation; Image spam filtering; Cluster analysis
DOI
10.1109/ICASSP.2010.5495246
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Image spam is email spam that embeds its text content in graphical images to bypass traditional spam filters. Most previous approaches focus on filtering image spam on the client side. To detect spammers' attack activities effectively and trace spam back to its sources quickly, it is also essential to apply cluster analysis to image emails on the server side. In this paper, we present a nonnegative sparsity induced similarity measure for cluster analysis of spam images. The measure rests on the assumption that a spam image should be well represented by a nonnegative linear combination of a small number of spam images in the same cluster. This assumption follows from the observation that spammers generate a large number of variants from a single source image using different image processing and manipulation techniques. Experiments on a spam image dataset collected from our department email server demonstrate the advantages of the proposed approach.
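The abstract states the idea behind the similarity measure but not its formulation. The following is a minimal sketch of one way such a measure could be computed, not the authors' method. Everything specific in it is an assumption made for illustration: the objective (least squares plus an L1 penalty weighted by `lam`), the projected gradient solver, the unit-norm scaling of feature vectors, and the symmetrisation of the coefficients into a similarity matrix. The function names are hypothetical.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))


def unit(v):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(dot(v, v))
    return [p / norm for p in v] if norm > 0 else list(v)


def nonneg_sparse_code(x, atoms, lam=0.05, iters=500):
    """Approximately minimise 0.5*||x - sum_j w[j]*atoms[j]||^2 + lam*sum(w)
    subject to w >= 0, using projected gradient descent (assumed solver)."""
    n = len(atoms)
    gram = [[dot(a, b) for b in atoms] for a in atoms]
    corr = [dot(a, x) for a in atoms]
    # The trace of the Gram matrix bounds its largest eigenvalue,
    # so 1/trace is a safe (conservative) step size.
    step = 1.0 / max(sum(gram[j][j] for j in range(n)), 1e-12)
    w = [0.0] * n
    for _ in range(iters):
        grad = [dot(gram[j], w) - corr[j] + lam for j in range(n)]
        # Gradient step followed by projection onto the nonnegative orthant.
        w = [max(0.0, w[j] - step * grad[j]) for j in range(n)]
    return w


def similarity_matrix(features, lam=0.05, iters=500):
    """Code every image over all the others and symmetrise the coefficients
    (the averaging rule is an assumption, not taken from the paper)."""
    feats = [unit(f) for f in features]
    n = len(feats)
    coef = [[0.0] * n for _ in range(n)]
    for i in range(n):
        others = [j for j in range(n) if j != i]
        w = nonneg_sparse_code(feats[i], [feats[j] for j in others], lam, iters)
        for j, wj in zip(others, w):
            coef[i][j] = wj
    return [[0.5 * (coef[i][j] + coef[j][i]) for j in range(n)] for i in range(n)]
```

Calling `similarity_matrix` on a list of nonnegative feature vectors (for example, colour histograms, one per image) returns a symmetric nonnegative matrix that could be passed to a graph-based clustering method. Under this sketch, images that share no features with a given image receive zero weight in its code, which is the sparsity the abstract relies on.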
Pages: 5594-5597
Page count: 4