ViT-DualAtt: An efficient pornographic image classification method based on Vision Transformer with dual attention

Times Cited: 0
Authors
Cai, Zengyu [1 ]
Xu, Liusen [2 ]
Zhang, Jianwei [2 ,3 ]
Feng, Yuan [4 ]
Zhu, Liang [1 ]
Liu, Fangmei [1 ]
Affiliations
[1] Zhengzhou Univ Light Ind, Sch Comp Sci & Technol, Zhengzhou 450003, Peoples R China
[2] Zhengzhou Univ Light Ind, Sch Software Engn, Zhengzhou 450003, Peoples R China
[3] Zhengzhou Univ Light Ind, Res Inst Ind Technol, Zhengzhou 450003, Peoples R China
[4] Zhengzhou Univ Light Ind, Sch Elect Informat, Zhengzhou 450003, Peoples R China
Source
ELECTRONIC RESEARCH ARCHIVE | 2024, Vol. 32, No. 12
Funding
National Natural Science Foundation of China;
Keywords
pornographic image classification; Vision Transformer; Convolutional Block Attention Module; Multi-Head Attention; Convolutional Neural Network;
DOI
10.3934/era.2024313
CLC Number
O1 [Mathematics];
Subject Classification Code
0701; 070101;
Abstract
Pornographic images not only pollute the internet environment, but also potentially harm societal values and the mental health of young people. Therefore, accurately classifying and filtering pornographic images is crucial to maintaining the safety of the online community. In this paper, we propose a novel pornographic image classification model named ViT-DualAtt. The model adopts a CNN-Transformer hierarchical structure, combining the strengths of Convolutional Neural Networks (CNNs) and Transformers to effectively capture and integrate both local and global features, thereby enhancing the accuracy and diversity of feature representations. Moreover, the model integrates multi-head attention and convolutional block attention mechanisms to further improve classification accuracy. Experiments were conducted on the nsfw_data_scraper dataset published on GitHub by data scientist Alexander Kim. The results show that ViT-DualAtt achieved a classification accuracy of 97.2% ± 0.1% on pornographic image classification tasks, outperforming the current state-of-the-art model (RepVGG-SimAM) by 2.7%. Furthermore, the model achieved a miss rate of only 1.6% for pornographic images, significantly reducing the risk of pornographic image dissemination on internet platforms.
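To make the described design concrete, the following is a minimal PyTorch sketch of a CNN-Transformer classifier that combines a convolutional stem, a CBAM-style block (channel plus spatial attention), and a Transformer encoder with multi-head attention. It is an illustrative approximation rather than the authors' ViT-DualAtt implementation; all layer sizes, the token construction, and the five-class output (matching the categories in the nsfw_data_scraper repository) are assumptions.

```python
# Minimal PyTorch sketch of a CNN-Transformer classifier combining a
# convolutional stem, a CBAM-style block, and multi-head attention.
# Illustrative only: NOT the authors' ViT-DualAtt implementation; layer sizes
# and the 5-class output (the nsfw_data_scraper categories) are assumptions.
import torch
import torch.nn as nn


class CBAM(nn.Module):
    """Convolutional Block Attention Module: channel attention then spatial attention."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels),
        )
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        b, c, _, _ = x.shape
        # Channel attention from average- and max-pooled descriptors.
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)
        # Spatial attention from channel-wise average and max maps.
        s = torch.cat([x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))


class CNNTransformerClassifier(nn.Module):
    """CNN stem -> CBAM -> flatten to tokens -> Transformer encoder -> classifier head."""

    def __init__(self, num_classes: int = 5, dim: int = 128):
        super().__init__()
        self.stem = nn.Sequential(  # local features via strided convolutions
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.BatchNorm2d(128), nn.ReLU(),
            nn.Conv2d(128, dim, 3, stride=2, padding=1), nn.BatchNorm2d(dim), nn.ReLU(),
        )
        self.cbam = CBAM(dim)
        encoder_layer = nn.TransformerEncoderLayer(  # global features via multi-head attention
            d_model=dim, nhead=8, dim_feedforward=4 * dim, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=4)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x):
        x = self.cbam(self.stem(x))             # (B, dim, H/8, W/8)
        tokens = x.flatten(2).transpose(1, 2)   # (B, N, dim) patch-like tokens
        tokens = self.encoder(tokens)
        return self.head(tokens.mean(dim=1))    # mean-pool tokens, then classify


if __name__ == "__main__":
    model = CNNTransformerClassifier()
    logits = model(torch.randn(2, 3, 224, 224))  # dummy batch of 224x224 RGB images
    print(logits.shape)                          # torch.Size([2, 5])
```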
Pages: 6698-6716 (19 pages)