A Local-Global Estimator Based on Large Kernel CNN and Transformer for Human Pose Estimation and Running Pose Measurement

Cited by: 1
Authors
Wu, Qingtian [1]
Wu, Yongfei [2]
Zhang, Yu [1, 3]
Zhang, Liming
Affiliations
[1] Univ Macau, Fac Sci & Technol, Macau, Peoples R China
[2] Taiyuan Univ Technol, Coll Data Sci, Taiyuan 030024, Peoples R China
[3] Shenyang Univ Chem Technol, Comp Sci & Technol Coll, Shenyang 110142, Peoples R China
Keywords
Transformers; Pose estimation; Convolutional neural networks; Feature extraction; Visualization; Task analysis; Kernel; Convolutional neural networks (CNN); human pose estimation (HPE); local-global estimator; running pose measurement; vision transformer (ViT)
DOI
10.1109/TIM.2022.3200438
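Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic and communication technology]
Subject classification code
0808; 0809
Abstract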
A running pose in a crowd can serve as an early warning of many abnormal events (e.g., chasing, fleeing, and robbing), and it can be detected through human behavior analysis based on human pose measurement. Although deep convolutional neural networks (CNNs) have made impressive progress on human pose estimation (HPE), improving the trade-off between estimation accuracy and speed remains an open issue. In this work, we first propose an efficient local-global estimator for HPE, called LGPose. Then, based on the keypoints estimated by LGPose, we define a simple regression model that uses the geometry of the joints to achieve fast and accurate running pose measurement. To model the relationships between human keypoints, a vision transformer (ViT) encoder is adopted to learn the long-range interdependencies between them at the pixel level. However, the transformer encoder operates on sequences: it linearly projects the 2-D image patches to 1-D tokens, which loses important local information. Locality is crucial because it relates to lines, edges, and shapes. To learn locality, we replace the original fully connected network (FCN) in the feedforward module of ViT with effective CNN modules. Experiments on the MPII and COCO Keypoint val2017 datasets show that the proposed LGPose achieves the best accuracy-speed trade-off among the compared state-of-the-art methods. Moreover, we build a lightweight running movement dataset to verify the effectiveness of LGPose. Based on the human pose estimated by LGPose, the proposed regression model measures running pose with an accuracy of 86.4% without training any other classifier. Our source code and running dataset will be made publicly available.
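The abstract does not say which geometric quantities the regression model uses. As an illustration only, the sketch below computes one plausible feature, the angle at a joint formed by three 2-D keypoints (for example hip, knee, and ankle). The function name `joint_angle` and the sample coordinates are hypothetical and are not taken from the paper.

```python
import math


def joint_angle(a, b, c):
    """Return the angle in degrees at keypoint b between segments b->a and b->c.

    Each keypoint is an (x, y) pair, e.g. pixel coordinates from a pose estimator.
    """
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1 = math.hypot(v1[0], v1[1])
    n2 = math.hypot(v2[0], v2[1])
    if n1 == 0.0 or n2 == 0.0:
        raise ValueError("degenerate limb: two keypoints coincide")
    cos_t = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    # Clamp to [-1, 1] to guard against floating-point rounding before acos.
    cos_t = max(-1.0, min(1.0, cos_t))
    return math.degrees(math.acos(cos_t))


# Hypothetical keypoints (assumed values, not from the paper's dataset).
hip, knee, ankle = (100.0, 100.0), (100.0, 200.0), (200.0, 200.0)
knee_angle = joint_angle(hip, knee, ankle)
```

A feature of this kind could feed a simple threshold or regression rule (a strongly bent knee being more typical of running than standing), but the features and model actually used are defined in the paper, not here.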
Pages: 12