ViT-PGC: vision transformer for pedestrian gender classification on small-size dataset

Cited by: 3
Authors
Abbas, Farhat [1]
Yasmin, Mussarat [1]
Fayyaz, Muhammad [2]
Asim, Usman [3]
Affiliations
[1] COMSATS Univ Islamabad, Dept Comp Sci, Wah Campus, Wah Cantt 47040, Pakistan
[2] FAST Natl Univ Comp & Emerging Sci NUCES, Dept Comp Sci, Chiniot Faisalabad Campus, Chiniot, Punjab, Pakistan
[3] DeltaX, 3F,24,Namdaemun Ro 9 Gil, Seoul, South Korea
Keywords
Vision transformer; LSA and SPT; Deep CNN models; SS datasets; Pedestrian gender classification; Convolutional neural network; Recognition
DOI
10.1007/s10044-023-01196-2
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Pedestrian gender classification (PGC) is a key task in full-body pedestrian image analysis and has become important in applications such as content-based image retrieval, visual surveillance, smart cities, and demographic data collection. Over the last decade, convolutional neural networks (CNNs) have shown great potential and become a reliable choice for vision tasks such as object classification, recognition, and detection. However, CNNs have a limited local receptive field, which prevents them from learning global context. In contrast, a vision transformer (ViT) is a better alternative to a CNN because it uses a self-attention mechanism to attend to different patches of an input image. In this work, a vision transformer model built on generic and effective modules, namely locality self-attention (LSA) and shifted patch tokenization (SPT), is explored for the PGC task. With these modules, the ViT can learn from scratch even on small-size (SS) datasets and overcome its lack of locality inductive bias. Through extensive experimentation, we found that the proposed ViT model produced better results in terms of overall and mean accuracy. These results confirm that the proposed ViT outperformed state-of-the-art (SOTA) PGC methods.
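The abstract names locality self-attention (LSA) without describing it. As a rough illustration only, the sketch below assumes the commonly described form of LSA: each token's attention to itself is masked out, and the scores are scaled by a temperature that would be learnable in a trained model. The function name `lsa_attention_weights`, the input layout, and the fixed temperature are hypothetical and are not taken from the paper.

```python
import math


def lsa_attention_weights(scores, temperature):
    """Row-wise softmax over token-to-token scores with the diagonal masked.

    scores: square list of lists; scores[i][j] is the similarity between
            query token i and key token j (assumed already computed).
    temperature: positive scalar; assumed learnable in a real model,
                 fixed here for illustration.
    """
    n = len(scores)
    if n < 2:
        raise ValueError("diagonal masking needs at least two tokens")
    weights = []
    for i in range(n):
        # Mask the self-token score so token i cannot attend to itself.
        row = [
            -math.inf if i == j else scores[i][j] / temperature
            for j in range(n)
        ]
        peak = max(row)  # subtract the row maximum for numerical stability
        exps = [math.exp(value - peak) for value in row]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


if __name__ == "__main__":
    demo_scores = [[1.0, 2.0, 3.0], [0.0, 0.0, 0.0], [5.0, 1.0, 1.0]]
    for row in lsa_attention_weights(demo_scores, temperature=0.5):
        print([round(w, 3) for w in row])
```

Under these assumptions, masking the diagonal forces every token to draw on other patches, and a temperature below 1 sharpens the resulting distribution; both are intended to help when training data is limited.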
Pages: 1805-1819
Page count: 15