ViT-PGC: vision transformer for pedestrian gender classification on small-size dataset

Cited by: 3

Authors:

Abbas, Farhat [1]

Yasmin, Mussarat [1]

Fayyaz, Muhammad [2]

Asim, Usman [3]

Affiliations:

[1] COMSATS Univ Islamabad, Dept Comp Sci, Wah Campus, WahCantt 47040, Pakistan

[2] FAST Natl Univ Comp & Emerging Sci NUCES, Dept Comp Sci, Chiniot Faisalabad Campus, Chiniot, Punjab, Pakistan

[3] DeltaX, 3F,24,Namdaemun Ro 9 Gil, Seoul, South Korea

Source:

PATTERN ANALYSIS AND APPLICATIONS | 2023 / Vol. 26 / Issue 04

Keywords:

Vision transformer; LSA and SPT; Deep CNN models; SS datasets; Pedestrian gender classification; CONVOLUTIONAL NEURAL-NETWORK; RECOGNITION;

DOI:

10.1007/s10044-023-01196-2

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Pedestrian gender classification (PGC) is a key task in full-body pedestrian image analysis and has become important in applications such as content-based image retrieval, visual surveillance, smart cities, and demographic collection. Over the last decade, convolutional neural networks (CNNs) have shown great potential and become a reliable choice for vision tasks such as object classification, recognition, and detection. However, CNNs have a limited local receptive field, which prevents them from learning information about the global context. In contrast, a vision transformer (ViT) is a better alternative to a CNN because it uses a self-attention mechanism to attend to different patches of an input image. In this work, a vision transformer model based on two generic and effective modules, locality self-attention (LSA) and shifted patch tokenization (SPT), is explored for the PGC task. With these modules, the ViT is able to learn from scratch even on small-size (SS) datasets and to overcome the lack of locality inductive bias. Through extensive experimentation, we found that the proposed ViT model produced better results in terms of overall and mean accuracies. These results confirm that ViT outperformed state-of-the-art (SOTA) PGC methods.
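
The abstract names shifted patch tokenization (SPT) and locality self-attention (LSA) without describing how they work. As a rough illustration only, the NumPy sketch below follows one common formulation of the two ideas: SPT concatenates the image with four diagonally shifted copies before cutting patches, and LSA masks each token's attention to itself and divides the scores by a temperature. The function names, the zero padding, the half-patch shift, and the fixed scalar temperature (it would be a learned parameter during training) are assumptions made for this sketch and are not taken from the ViT-PGC paper.

```python
import numpy as np


def shifted_patch_tokenize(image, patch_size):
    """Sketch of SPT. image: (H, W, C) array.

    Returns flattened patches of shape
    (num_patches, patch_size * patch_size * 5 * C).
    """
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0
    s = patch_size // 2  # assumed shift: half a patch
    padded = np.pad(image, ((s, s), (s, s), (0, 0)))  # zero padding
    # Four diagonally shifted copies, cropped back to (H, W, C).
    offsets = [(0, 0), (0, 2 * s), (2 * s, 0), (2 * s, 2 * s)]
    shifted = [padded[dy:dy + h, dx:dx + w] for dy, dx in offsets]
    stacked = np.concatenate([image] + shifted, axis=-1)  # (H, W, 5C)
    gh, gw = h // patch_size, w // patch_size
    patches = stacked.reshape(gh, patch_size, gw, patch_size, 5 * c)
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(gh * gw, patch_size * patch_size * 5 * c)


def locality_self_attention(q, k, v, temperature):
    """Sketch of LSA. q, k, v: (n_tokens, dim), n_tokens >= 2.

    temperature is a positive scalar here; in training it would be learned.
    """
    scores = (q @ k.T) / temperature
    np.fill_diagonal(scores, -np.inf)  # diagonal masking: no self-attention
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


# Usage on toy data: an 8x8 RGB image cut into 4x4 patches.
image = np.arange(8 * 8 * 3, dtype=float).reshape(8, 8, 3)
tokens = shifted_patch_tokenize(image, patch_size=4)
rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(4, 8)) for _ in range(3))
out, weights = locality_self_attention(q, k, v, temperature=2.0)
```

In a full model the flattened SPT patches would pass through a normalization and linear projection before entering the transformer; those layers are omitted here to keep the sketch short.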


Pages: 1805-1819

Number of pages: 15

24 items in total

[11] Transforming Alzheimer's Disease Diagnosis: Implementing Vision Transformer (ViT) for MRI Images Classification
Kurniasari, Dian
Pratama, Muhammad Dwi
Junaidi, Akmal
Faisol, Ahmad
JOURNAL OF INFORMATION AND COMMUNICATION TECHNOLOGY-MALAYSIA, 2025, 24 (01) : 130 - 152
[12] MASK-VIT: AN OBJECT MASK EMBEDDING IN VISION TRANSFORMER FOR FINE-GRAINED VISUAL CLASSIFICATION
Su, Tong
Ye, Shuo
Song, Chengqun
Cheng, Jun
2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2022 : 1626 - 1630
[13] Pedestrian gender classification on imbalanced and small sample datasets using deep and traditional features
Fayyaz, Muhammad
Yasmin, Mussarat
Sharif, Muhammad
Iqbal, Tasswar
Raza, Mudassar
Babar, Muhammad Imran
NEURAL COMPUTING & APPLICATIONS, 2023, 35 (16) : 11937 - 11968
[14] AnisotropicBreast-ViT: Breast Cancer Classification in Ultrasound Images Using Anisotropic Filtering and Vision Transformer
Diniz, Joao Otavio Bandeira
Ribeiro, Neilson P.
Dias, Domingos A., Jr.
da Cruz, Luana B.
da Silva, Giovanni L. F.
Gomes, Daniel L., Jr.
de Paiva, Anselmo C.
Silva, Aristofanes C.
INTELLIGENT SYSTEMS, BRACIS 2024, PT III, 2025, 15414 : 95 - 109
[15] SI-ViT: Shuffle instance-based Vision Transformer for pancreatic cancer ROSE image classification
Zhang, Tianyi
Feng, Youdan
Zhao, Yu
Lei, Yanli
Ying, Nan
Song, Fan
He, Yufang
Yan, Zhiling
Feng, Yunlu
Yang, Aiming
Zhang, Guanglei
COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE, 2024, 244
[16] GNViT- An enhanced image-based groundnut pest classification using Vision Transformer (ViT) model
Venkatasaichandrakanth, P.
Iyapparaja, M.
PLOS ONE, 2024, 19 (03)
[17] Patient teacher can impart locality to improve lightweight vision transformer on small dataset
Ling, Jun
Zhang, Xuan
Du, Fei
Li, Linyu
Shang, Weiyi
Gao, Chen
Li, Tong
PATTERN RECOGNITION, 2025, 157
[18] ViT-DexiNet: a vision transformer-based edge detection operator for small object detection in SAR images
Sivapriya, M. S.
Suresh, S.
INTERNATIONAL JOURNAL OF REMOTE SENSING, 2023, 44 (22) : 7057 - 7084
[19] TransMCGC: a recast vision transformer for small-scale image classification tasks
Jian-Wen Xiang
Min-Rong Chen
Pei-Shan Li
Hao-Li Zou
Shi-Da Li
Jun-Jie Huang
Neural Computing and Applications, 2023, 35 : 7697 - 7718
[20] TransMCGC: a recast vision transformer for small-scale image classification tasks
Xiang, Jian-Wen
Chen, Min-Rong
Li, Pei-Shan
Zou, Hao-Li
Li, Shi-Da
Huang, Jun-Jie
NEURAL COMPUTING & APPLICATIONS, 2023, 35 (10) : 7697 - 7718
