Enhancing Multimodal Clustering Framework with Deep Learning to Reveal Image Spam Authorship

Times cited: 3

Authors:

Chen, Wei-Bang [1]

Lu, Yongjin [2]

Ailsworth, Zanyah [1]

Wang, Xiaoliang [3]

Zhang, Chengcui [4]

Affiliations:

[1] Virginia State Univ, Dept Comp Sci, Petersburg, VA 23806 USA

[2] Virginia State Univ, Dept Math & Econ, Petersburg, VA 23806 USA

[3] Virginia State Univ, Dept Appl Engn Technol, Petersburg, VA 23806 USA

[4] Univ Alabama Birmingham, Dept Comp Sci, Birmingham, AL USA

Source:

2021 IEEE 22nd International Conference on Information Reuse and Integration for Data Science (IRI 2021) | 2021

Keywords:

image spam; clustering; multimodal analysis; botnet; convolutional neural networks (CNNs)

DOI:

10.1109/IRI51335.2021.00032

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

This paper introduces a multimodal framework for clustering spam images received in unsolicited emails. Spam images in the same cluster have similar visual and textual content and may have been generated by a common spam source. To perform the clustering, we first extract three main categories of features: 1) visual features, extracted by pretrained convolutional neural networks (CNNs); 2) layout features, describing the locations of illustrations in the spam images; and 3) text features, extracted by an optical character recognition (OCR) algorithm. We then use a two-stage hierarchical clustering framework to form clusters based on the pairwise similarity matrices of the extracted features. We evaluate the proposed approach on a dataset of 2,100 spam images collected from three months of emails. The experimental results show that the proposed method achieved satisfactory clustering results as measured by the V-measure, an external entropy-based metric.
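The abstract describes the pipeline only at a high level, so the following Python sketch illustrates the general idea under explicit assumptions; it is not the authors' implementation. It assumes feature extraction (CNN, layout, OCR) has already produced one n x n pairwise similarity matrix per modality with values in [0, 1]. The matrices are fused by a weighted average (the fusion rule, the weights, and the merge threshold are all hypothetical), and a single average-linkage agglomerative pass stands in for the paper's two-stage framework, whose stages the abstract does not specify. The `v_measure` function follows the standard definition (Rosenberg and Hirschberg): the harmonic mean of homogeneity and completeness, each derived from conditional entropies of the true and predicted labelings. The helper names `fuse_similarities` and `average_linkage` are illustrative only.

```python
import math
from collections import Counter


def fuse_similarities(matrices, weights):
    """Combine per-modality n x n similarity matrices by a weighted average.

    Weighted averaging is an assumed fusion rule, not necessarily the paper's.
    """
    n = len(matrices[0])
    total = sum(weights)
    return [[sum(w * m[i][j] for m, w in zip(matrices, weights)) / total
             for j in range(n)]
            for i in range(n)]


def average_linkage(sim, threshold):
    """Agglomerative clustering on a similarity matrix (average linkage).

    Repeatedly merges the two clusters with the highest mean pairwise
    similarity; stops when no pair reaches `threshold` (a hypothetical cutoff).
    Returns one integer cluster label per item.
    """
    clusters = [[i] for i in range(len(sim))]

    def avg_sim(a, b):
        return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        best, pair = -math.inf, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = avg_sim(clusters[x], clusters[y])
                if s > best:
                    best, pair = s, (x, y)
        if best < threshold:
            break
        x, y = pair
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is unaffected

    labels = [0] * len(sim)
    for k, members in enumerate(clusters):
        for i in members:
            labels[i] = k
    return labels


def _entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def _conditional_entropy(a, b):
    """H(A | B) from the joint counts of two labelings."""
    n = len(a)
    joint = Counter(zip(a, b))
    b_counts = Counter(b)
    return -sum((c / n) * math.log(c / b_counts[bj])
                for (_, bj), c in joint.items())


def v_measure(truth, pred, beta=1.0):
    """V-measure: weighted harmonic mean of homogeneity and completeness."""
    h_c, h_k = _entropy(truth), _entropy(pred)
    homogeneity = 1.0 if h_c == 0 else 1.0 - _conditional_entropy(truth, pred) / h_c
    completeness = 1.0 if h_k == 0 else 1.0 - _conditional_entropy(pred, truth) / h_k
    if homogeneity + completeness == 0:
        return 0.0
    return ((1 + beta) * homogeneity * completeness
            / (beta * homogeneity + completeness))
```

Average linkage is only one common choice; the abstract does not state which linkage criterion or cutoff the authors used. Because the V-measure compares label groupings rather than label names, a predicted clustering that matches the ground-truth groups scores 1.0 even if cluster IDs differ from the true labels.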

Pages: 193-200

Number of pages: 8

Number of references: 21
