UViT: Efficient and lightweight U-shaped hybrid vision transformer for human pose estimation

Cited by: 0

Authors:

Li B. [1,2]

Tang S. [1]

Li W. [1,2]

Affiliations:

[1] School of Information and Control Engineering, China University of Mining and Technology, Xuzhou

[2] School of Mechanical and Electronic Engineering, Suzhou University, Suzhou

Source:

Journal of Intelligent and Fuzzy Systems | 2024, Vol. 46, No. 4

Keywords:

attention mechanism; context enhancement; lightweight network; multi-branch structure; pose estimation

DOI:

10.3233/JIFS-231440

Abstract:

Pose estimation plays a crucial role in human-centered vision applications and has advanced significantly in recent years. However, prevailing approaches rely on highly complex architectures to achieve high scores on benchmark datasets, which hampers deployment on edge devices. This study investigates efficient and lightweight human pose estimation. The context enhancement module of the U-shaped structure is enhanced to improve multi-scale local modeling capability. Building on the transformer structure, a lightweight transformer block is designed to strengthen local feature extraction and global modeling. Together these yield a lightweight pose estimation network, the U-shaped Hybrid Vision Transformer (UViT). The smallest network, UViT-T, achieves a 3.9% improvement in AP on the COCO validation set with fewer parameters and lower computational complexity than the best-performing V2 version of the MobileNet series. Specifically, with an input size of 384×288, UViT-T achieves an AP of 70.2 on the COCO test-dev set with only 1.52 M parameters and 2.32 GFLOPs. Its inference speed is approximately twice that of general-purpose networks. This study offers an efficient and lightweight design approach for human pose estimation and provides theoretical support for its deployment on edge devices. © 2024-IOS Press. All rights reserved.

Pages: 8345-8359

Page count: 14
