UViT: Efficient and lightweight U-shaped hybrid vision transformer for human pose estimation

Cited by: 0
Authors
Li B. [1, 2]
Tang S. [1]
Li W. [1, 2]
Affiliations
[1] School of Information and Control Engineering, China University of Mining and Technology, Xuzhou
[2] School of Mechanical and Electronic Engineering, Suzhou University, Suzhou
Keywords
attention mechanism; context enhancement; lightweight network; multi-branch structure; pose estimation
DOI
10.3233/JIFS-231440
Abstract
Pose estimation plays a crucial role in human-centered vision applications and has advanced significantly in recent years. However, prevailing approaches rely on extremely complex architectures to achieve high benchmark scores, which hampers deployment on edge devices. This study investigates efficient and lightweight human pose estimation. The context enhancement module of the U-shaped structure is improved to strengthen multi-scale local modeling. A lightweight transformer block is designed to enhance local feature extraction and global modeling. Together, these yield a lightweight pose estimation network, the U-shaped Hybrid Vision Transformer (UViT). The smallest variant, UViT-T, achieves a 3.9% improvement in AP on the COCO validation set over MobileNetV2, the best-performing version of the MobileNet series, with fewer parameters and lower computational complexity. Specifically, with an input size of 384×288, UViT-T achieves an AP of 70.2 on the COCO test-dev set with only 1.52 M parameters and 2.32 GFLOPs. Its inference speed is approximately twice that of general-purpose networks. This study provides an efficient and lightweight design approach for human pose estimation and theoretical support for its deployment on edge devices. © 2024 IOS Press. All rights reserved.
Pages: 8345-8359
Page count: 14