Yolov5-Based Attention Mechanism for Gesture Recognition in Complex Environment

Cited by: 0

Authors:

Khare, Deepak Kumar [1]

Bhagat, Amit [1]

Priya, R. Vishnu [2]

Nag, Prashant Kumar [1]

Malviya, Sunil [1]

Affiliations:

[1] Maulana Azad Natl Inst Technol, Dept Math Bioinformat & Comp Applicat, Bhopal, Madhya Pradesh, India

[2] Natl Inst Technol, Dept Comp Applicat, Tiruchirappalli, Tamil Nadu, India

Source:

INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS | 2024 / Vol. 15 / No. 11

Keywords:

Gesture recognition; Yolov5; object detection; attention mechanism; bidirectional feature pyramid

DOI:

10.14569/IJACSA.2024.0151167

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods]

Discipline classification code:

081202

Abstract:

Object detection is a fundamental task in gesture recognition, involving identifying and localising human hand or body gestures within images or videos amidst varying environmental conditions. To address the inadequate recognition rate of gesture detection algorithms in intricate surroundings caused by issues such as inconsistent illumination, background colors resembling skin tones, and diminutive gesture scales, a gesture recognition approach termed HD-YOLOv5s is presented. An adaptive Gamma image enhancement preprocessing technique grounded in Retinex theory is employed to mitigate the effects of lighting variations on gesture recognition efficacy. A feature extraction network incorporating an adaptive convolutional attention mechanism (SKNet) is developed to augment the network's feature extraction efficacy and mitigate background interference in intricate situations. A novel bidirectional feature pyramid architecture is implemented in the feature fusion network to fully leverage low-level features, thereby minimizing the loss of shallow semantic information and enhancing the detection accuracy of small-scale gestures. A cross-level connection strategy is employed to enhance the model's detection efficiency. To assess the efficacy of the suggested technique, experiments were performed on a custom dataset featuring diverse lighting intensity fluctuations and the publicly available NUS-II dataset with intricate backdrops. The recognition rates attained were 99.5% and 98.9%, respectively, with a detection time per frame of about 0.01 to 0.02 seconds.


Pages: 699-711

Page count: 13
