A Local-Global Estimator Based on Large Kernel CNN and Transformer for Human Pose Estimation and Running Pose Measurement

Cited by: 1

Authors:

Wu, Qingtian [1]

Wu, Yongfei [2]

Zhang, Yu [1,3]

Zhang, Liming

Affiliations:

[1] Univ Macau, Fac Sci & Technol, Macau, Peoples R China

[2] Taiyuan Univ Technol, Coll Data Sci, Taiyuan 030024, Peoples R China

[3] Shenyang Univ Chem Technol, Comp Sci & Technol Coll, Shenyang 110142, Peoples R China

Source:

IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT | 2022 / Vol. 71

Keywords:

Transformers; Pose estimation; Convolutional neural networks; Feature extraction; Visualization; Task analysis; Kernel; Convolutional neural networks (CNN); human pose estimation (HPE); local-global estimator; running pose measurement; vision transformer (ViT)

DOI:

10.1109/TIM.2022.3200438

Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronics and communication technology]

Discipline classification codes:

0808; 0809

Abstract:

Running poses in a crowd can serve as an early warning of most abnormal events (e.g., chasing, fleeing, and robbing), and they can be detected through human behavior analysis based on human pose measurement. Although deep convolutional neural networks (CNNs) have made impressive progress on human pose estimation (HPE), further improving the trade-off between estimation accuracy and speed remains an open issue. In this work, we first propose an efficient local-global estimator for HPE, called LGPose. Then, based on the keypoints estimated by LGPose, a simple regression model is defined using the geometry of the joints to achieve fast and accurate running pose measurement. To model the relationships between human keypoints, a vision transformer (ViT) encoder is adopted to learn their long-range interdependencies at the pixel level. However, the transformer encoder operates on sequences: it linearly projects 2-D image patches to 1-D tokens, which loses important local information. Locality is nevertheless crucial because it relates to lines, edges, and shapes. To learn locality, we build effective CNN modules, rather than the original fully connected network (FCN), into the feedforward module of ViT. Experiments on the MPII and COCO Keypoint val2017 datasets show that the proposed LGPose achieves the best trade-off among the compared state-of-the-art methods. Moreover, we build a lightweight running movement dataset to verify the effectiveness of LGPose. Based on the human pose estimated by LGPose, we propose a regression model that measures running pose with an accuracy of 86.4% without training any other classifier. Our source code and running dataset will be made publicly available.
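The abstract mentions a model defined "using the geometry of the joints" but does not give its form. As a rough illustration only, the sketch below computes the angle at one joint from three 2-D keypoints and applies a threshold to it. The function names, the choice of the knee joint, and the 120-degree threshold are all assumptions made for this example; they are not taken from the paper and are not the authors' regression model.

```python
import math


def joint_angle(a, b, c):
    """Return the angle in degrees at keypoint b between segments b->a and b->c.

    Each keypoint is an (x, y) pair in image coordinates.
    """
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1 = math.hypot(v1[0], v1[1])
    n2 = math.hypot(v2[0], v2[1])
    if n1 == 0 or n2 == 0:
        raise ValueError("degenerate limb: two keypoints coincide")
    cos_t = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_t = max(-1.0, min(1.0, cos_t))
    return math.degrees(math.acos(cos_t))


def knee_is_strongly_bent(hip, knee, ankle, threshold_deg=120.0):
    """Hypothetical rule: report True when the knee angle is below a threshold.

    The threshold is an illustrative value, not one reported in the paper.
    """
    return joint_angle(hip, knee, ankle) < threshold_deg


if __name__ == "__main__":
    # Made-up keypoints: a straight leg and a leg bent at a right angle.
    straight = ((0.0, 0.0), (0.0, 1.0), (0.0, 2.0))
    bent = ((0.0, 0.0), (0.0, 1.0), (1.0, 1.0))
    print(joint_angle(*straight), knee_is_strongly_bent(*straight))
    print(joint_angle(*bent), knee_is_strongly_bent(*bent))
```

A rule of this kind needs only the estimated keypoints, which is in keeping with the abstract's remark that no additional classifier is trained; how the actual model combines joint geometry is specified in the paper itself.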


Pages: 12
