2D Human Skeleton Action Recognition Based on Depth Estimation

Cited: 1
Authors
Wang, Lei [1 ,2 ]
Yang, Shanmin [3 ]
Zhang, Jianwei [1 ]
Gu, Song [2 ]
Affiliations
[1] Sichuan Univ, Sichuan, Peoples R China
[2] Chengdu Aeronaut Polytech, Chengdu, Peoples R China
[3] Chengdu Univ Informat Technol, Chengdu, Peoples R China
Keywords
action recognition; depth estimation; multi-task learning; graph structure; video surveillance
DOI
10.1587/transinf.2023EDP7223
CLC number
TP [Automation and Computer Technology]
Discipline code
0812
Abstract
Human action recognition (HAR) exhibits limited accuracy in video surveillance due to the 2D information captured by monocular cameras. To address this problem, a depth estimation-based human skeleton action recognition method (SARDE) is proposed in this study, with the aim of transforming 2D human action data into 3D form to uncover action cues hidden in the 2D data. SARDE comprises two tasks, i.e., human skeleton action recognition and monocular depth estimation. The two tasks are integrated in a multi-task manner and trained end-to-end, sharing parameters so that the correlation between action recognition and depth estimation is fully exploited and depth features are learned effectively for human action recognition. In this study, graph-structured networks with inception blocks and skip connections are investigated for depth estimation. The experimental results verify the effectiveness and superiority of the proposed method in skeleton action recognition, showing that it achieves state-of-the-art performance on the evaluated datasets.
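The abstract's core idea, two task heads sharing a common backbone so both losses update the same parameters, can be illustrated with a minimal NumPy sketch. All layer sizes, names, and the single-linear-layer heads below are illustrative assumptions, not details taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sketch of the multi-task layout described in the abstract:
# one shared backbone feeds both an action-recognition head and a
# depth-estimation head, so gradients from either task would update the
# same shared weights. Dimensions are made up for illustration.
N_JOINTS, FEAT, HIDDEN, N_CLASSES = 17, 2, 64, 10

W_shared = rng.normal(scale=0.1, size=(N_JOINTS * FEAT, HIDDEN))  # shared parameters
W_action = rng.normal(scale=0.1, size=(HIDDEN, N_CLASSES))        # task 1: action head
W_depth  = rng.normal(scale=0.1, size=(HIDDEN, N_JOINTS))         # task 2: per-joint depth head

def forward(skeleton_2d):
    """skeleton_2d: (N_JOINTS, 2) array of 2D joint coordinates."""
    h = np.maximum(skeleton_2d.reshape(-1) @ W_shared, 0.0)  # shared ReLU features
    action_logits = h @ W_action   # classification scores over action classes
    joint_depths  = h @ W_depth    # estimated depth value for each joint
    return action_logits, joint_depths

logits, depths = forward(rng.normal(size=(N_JOINTS, 2)))
print(logits.shape, depths.shape)  # → (10,) (17,)
```

Because `W_shared` sits on both tasks' forward paths, a combined loss (e.g., a weighted sum of classification and depth-regression losses) would train it jointly, which is the parameter-sharing benefit the abstract attributes to SARDE.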
Pages: 869-877 (9 pages)