Conditional Directed Graph Convolution for 3D Human Pose Estimation

Cited by: 50
Authors
Hu, Wenbo [1 ,2 ]
Zhang, Changgong [2 ]
Zhan, Fangneng [3 ]
Zhang, Lei [2 ,4 ]
Wong, Tien-Tsin [1 ]
Affiliations
[1] Chinese Univ Hong Kong, Hong Kong, Peoples R China
[2] Alibaba Grp, DAMO Acad, Hangzhou, Peoples R China
[3] Nanyang Technol Univ, Singapore, Singapore
[4] Hong Kong Polytech Univ, Hong Kong, Peoples R China
Source
PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021 | 2021
Keywords
3D human pose; conditional directed graph convolution; NETWORK;
DOI
10.1145/3474085.3475219
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Graph convolutional networks have significantly improved 3D human pose estimation by representing the human skeleton as an undirected graph. However, this representation fails to reflect the articulated characteristic of human skeletons as the hierarchical orders among the joints are not explicitly presented. In this paper, we propose to represent the human skeleton as a directed graph with the joints as nodes and bones as edges that are directed from parent joints to child joints. By so doing, the directions of edges can explicitly reflect the hierarchical relationships among the nodes. Based on this representation, we further propose a spatial-temporal conditional directed graph convolution to leverage varying non-local dependence for different poses by conditioning the graph topology on input poses. Altogether, we form a U-shaped network, named U-shaped Conditional Directed Graph Convolutional Network, for 3D human pose estimation from monocular videos. To evaluate the effectiveness of our method, we conducted extensive experiments on two challenging large-scale benchmarks: Human3.6M and MPI-INF-3DHP. Both quantitative and qualitative results show that our method achieves top performance. Also, ablation studies show that directed graphs can better exploit the hierarchy of articulated human skeletons than undirected graphs, and the conditional connections can yield adaptive graph topologies for different poses.
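The directed-skeleton representation described in the abstract can be illustrated with a minimal sketch. The five-joint skeleton, the joint names, and the helper functions below are illustrative assumptions, not the joint layout or code used in the paper; the sketch only shows how edges directed from parent joints to child joints differ from the undirected adjacency.

```python
# Toy skeleton: joint names and parent indices are illustrative assumptions,
# not the joint layout used in the paper.
JOINTS = ["pelvis", "spine", "head", "left_hip", "left_knee"]
PARENTS = [-1, 0, 1, 0, 3]  # PARENTS[i] is the parent of joint i; -1 marks the root


def directed_adjacency(parents):
    """Return A where A[p][c] == 1 iff a bone is directed from parent p to child c."""
    n = len(parents)
    adj = [[0] * n for _ in range(n)]
    for child, parent in enumerate(parents):
        if parent >= 0:
            adj[parent][child] = 1
    return adj


def undirected_adjacency(parents):
    """Symmetrize the directed adjacency, which discards the parent-child order."""
    directed = directed_adjacency(parents)
    n = len(parents)
    return [[directed[i][j] | directed[j][i] for j in range(n)] for i in range(n)]


A_dir = directed_adjacency(PARENTS)
A_und = undirected_adjacency(PARENTS)
```

In the directed matrix, `A_dir[0][1]` is set (pelvis to spine) while `A_dir[1][0]` is not, so the hierarchy can be read off the edges; the undirected matrix sets both entries and loses that ordering.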
Pages: 602-611
Page count: 10