Enhancing Multimodal Clustering Framework with Deep Learning to Reveal Image Spam Authorship

Cited by: 3
Authors
Chen, Wei-Bang [1]
Lu, Yongjin [2]
Ailsworth, Zanyah [1]
Wang, Xiaoliang [3]
Zhang, Chengcui [4]
Affiliations
[1] Virginia State Univ, Dept Comp Sci, Petersburg, VA 23806 USA
[2] Virginia State Univ, Dept Math & Econ, Petersburg, VA 23806 USA
[3] Virginia State Univ, Dept Appl Engn Technol, Petersburg, VA 23806 USA
[4] Univ Alabama Birmingham, Dept Comp Sci, Birmingham, AL USA
Source
2021 IEEE 22ND INTERNATIONAL CONFERENCE ON INFORMATION REUSE AND INTEGRATION FOR DATA SCIENCE (IRI 2021) | 2021
Keywords
image spam; clustering; multimodal analysis; botnet; convolutional neural networks (CNNs);
DOI
10.1109/IRI51335.2021.00032
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper introduces a multimodal framework for clustering spam images received in unsolicited emails. Spam images in the same cluster have similar visual and textual content and could have been generated by a common spam source. To perform the clustering task, we first extract three main categories of features: 1) visual features, extracted by pretrained convolutional neural networks (CNNs); 2) layout features, describing the location of illustrations in the spam images; 3) text features, extracted by an optical character recognition (OCR) algorithm. We then use a two-stage hierarchical clustering framework to form clusters based on the pairwise similarity matrices of the extracted features. We evaluate the proposed approach on a dataset of 2,100 spam images collected from three months of emails. The experimental results show that the proposed method achieved satisfactory clustering outcomes in terms of an external entropy-based metric, the V-measure.
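The V-measure named in the abstract is conventionally defined as the harmonic mean of homogeneity and completeness, both computed from conditional entropies between ground-truth classes and predicted clusters. The sketch below shows that standard definition in plain Python. It is illustrative only and is not the authors' evaluation code; the function names and the toy labels are assumptions made for this example.

```python
import math
from collections import Counter


def _entropy(labels):
    """Shannon entropy H(labels), in nats."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def _conditional_entropy(target, given):
    """Conditional entropy H(target | given), from joint label counts."""
    n = len(target)
    joint = Counter(zip(target, given))
    given_counts = Counter(given)
    return -sum(
        (c / n) * math.log(c / given_counts[g]) for (_, g), c in joint.items()
    )


def v_measure(classes, clusters):
    """Harmonic mean of homogeneity and completeness (standard definition).

    classes  : ground-truth labels (hypothetical: one spam source per image)
    clusters : labels assigned by the clustering under evaluation
    """
    h_classes = _entropy(classes)
    h_clusters = _entropy(clusters)
    # Homogeneity: each cluster contains members of a single class.
    homogeneity = (
        1.0 if h_classes == 0
        else 1.0 - _conditional_entropy(classes, clusters) / h_classes
    )
    # Completeness: all members of a class land in the same cluster.
    completeness = (
        1.0 if h_clusters == 0
        else 1.0 - _conditional_entropy(clusters, classes) / h_clusters
    )
    if homogeneity + completeness == 0:
        return 0.0
    return 2 * homogeneity * completeness / (homogeneity + completeness)


# Toy example (made-up labels, not data from the paper).
true_sources = [0, 0, 1, 1]
predicted = [1, 1, 0, 0]  # same grouping, different label names
score = v_measure(true_sources, predicted)
```

Because the metric depends only on how items are grouped, renaming cluster labels does not change the score, which is why it suits clustering evaluation where cluster identifiers are arbitrary.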
Pages: 193-200
Page count: 8
Related papers
21 in total
  • [1] Malicious Spam Emails Developments and Authorship Attribution
    Alazab, Mamoun
    Layton, Robert
    Broadhurst, Roderic
    Bouhours, Brigitte
    [J]. 2013 FOURTH CYBERCRIME AND TRUSTWORTHY COMPUTING WORKSHOP (CTC 2013), 2014, : 58 - +
  • [2] [Anonymous], 2015, P INT C LEARN REPR
  • [3] [Anonymous], 2011, PROC IEEE VEH TECHNO, DOI DOI 10.1109/VETECF.2011.6092961
  • [4] Carreras X., 2001, P 4 INT C RECENT ADV, P58
  • [5] Chengcui Zhang, 2009, Journal of Multimedia, V4, P313
  • [6] Chih-Chin Lai, 2004, Fourth International Conference on Hybrid Intelligent Systems, P44, DOI 10.1109/ICHIS.2004.21
  • [7] Clark J, 2003, IEEE/WIC INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE, PROCEEDINGS, P702
  • [8] Deng J, 2009, PROC CVPR IEEE, P248, DOI 10.1109/CVPRW.2009.5206848
  • [9] Support vector machines for spam categorization
    Drucker, H
    Wu, DH
    Vapnik, VN
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS, 1999, 10 (05): : 1048 - 1054
  • [10] Gao Y, 2008, INT CONF ACOUST SPEE, P1765