ViT-DualAtt: An efficient pornographic image classification method based on Vision Transformer with dual attention

Times Cited: 0
Authors
Cai, Zengyu [1 ]
Xu, Liusen [2 ]
Zhang, Jianwei [2 ,3 ]
Feng, Yuan [4 ]
Zhu, Liang [1 ]
Liu, Fangmei [1 ]
Affiliations
[1] Zhengzhou Univ Light Ind, Sch Comp Sci & Technol, Zhengzhou 450003, Peoples R China
[2] Zhengzhou Univ Light Ind, Sch Software Engn, Zhengzhou 450003, Peoples R China
[3] Zhengzhou Univ Light Ind, Res Inst Ind Technol, Zhengzhou 450003, Peoples R China
[4] Zhengzhou Univ Light Ind, Sch Elect Informat, Zhengzhou 450003, Peoples R China
Source
ELECTRONIC RESEARCH ARCHIVE | 2024, Vol. 32, No. 12
Funding
National Natural Science Foundation of China;
Keywords
pornographic image classification; Vision Transformer; Convolutional Block Attention Module; Multi-Head Attention; Convolutional Neural Network;
DOI
10.3934/era.2024313
CLC Number
O1 [Mathematics];
Subject Classification Code
0701; 070101;
Abstract
Pornographic images not only pollute the internet environment, but also potentially harm societal values and the mental health of young people. Therefore, accurately classifying and filtering pornographic images is crucial to maintaining the safety of the online community. In this paper, we propose a novel pornographic image classification model named ViT-DualAtt. The model adopts a CNN-Transformer hierarchical structure, combining the strengths of Convolutional Neural Networks (CNNs) and Transformers to effectively capture and integrate both local and global features, thereby enhancing the accuracy and diversity of feature representations. Moreover, the model integrates multi-head attention and convolutional block attention mechanisms to further improve classification accuracy. Experiments were conducted on the nsfw_data_scraper dataset published on GitHub by data scientist Alexander Kim. The results show that ViT-DualAtt achieved a classification accuracy of 97.2% ± 0.1% on pornographic image classification tasks, outperforming the current state-of-the-art model (RepVGG-SimAM) by 2.7%. Furthermore, the model achieved a miss rate of only 1.6% for pornographic images, significantly reducing the risk of pornographic image dissemination on internet platforms.
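To make the described design concrete, the following is a minimal PyTorch sketch of a CNN-Transformer classifier that combines a convolutional stem, a CBAM-style block (channel plus spatial attention), and a Transformer encoder with multi-head attention. It is an illustrative approximation rather than the authors' ViT-DualAtt implementation; all layer sizes, the token construction, and the five-class output (matching the categories in the nsfw_data_scraper repository) are assumptions.

```python
# Minimal PyTorch sketch of a CNN-Transformer classifier combining a
# convolutional stem, a CBAM-style block, and multi-head attention.
# Illustrative only: NOT the authors' ViT-DualAtt implementation; layer sizes
# and the 5-class output (the nsfw_data_scraper categories) are assumptions.
import torch
import torch.nn as nn


class CBAM(nn.Module):
    """Convolutional Block Attention Module: channel attention then spatial attention."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels),
        )
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        b, c, _, _ = x.shape
        # Channel attention from average- and max-pooled descriptors.
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)
        # Spatial attention from channel-wise average and max maps.
        s = torch.cat([x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))


class CNNTransformerClassifier(nn.Module):
    """CNN stem -> CBAM -> flatten to tokens -> Transformer encoder -> classifier head."""

    def __init__(self, num_classes: int = 5, dim: int = 128):
        super().__init__()
        self.stem = nn.Sequential(  # local features via strided convolutions
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.BatchNorm2d(128), nn.ReLU(),
            nn.Conv2d(128, dim, 3, stride=2, padding=1), nn.BatchNorm2d(dim), nn.ReLU(),
        )
        self.cbam = CBAM(dim)
        encoder_layer = nn.TransformerEncoderLayer(  # global features via multi-head attention
            d_model=dim, nhead=8, dim_feedforward=4 * dim, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=4)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x):
        x = self.cbam(self.stem(x))             # (B, dim, H/8, W/8)
        tokens = x.flatten(2).transpose(1, 2)   # (B, N, dim) patch-like tokens
        tokens = self.encoder(tokens)
        return self.head(tokens.mean(dim=1))    # mean-pool tokens, then classify


if __name__ == "__main__":
    model = CNNTransformerClassifier()
    logits = model(torch.randn(2, 3, 224, 224))  # dummy batch of 224x224 RGB images
    print(logits.shape)                          # torch.Size([2, 5])
```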
Pages: 6698-6716 (19 pages)