Conditional Directed Graph Convolution for 3D Human Pose Estimation

Times cited: 62

Authors:

Hu, Wenbo [1,2]

Zhang, Changgong [2]

Zhan, Fangneng [3]

Zhang, Lei [2,4]

Wong, Tien-Tsin [1]

Affiliations:

[1] The Chinese University of Hong Kong, Hong Kong, People's Republic of China

[2] DAMO Academy, Alibaba Group, Hangzhou, People's Republic of China

[3] Nanyang Technological University, Singapore

[4] The Hong Kong Polytechnic University, Hong Kong, People's Republic of China

Source:

Proceedings of the 29th ACM International Conference on Multimedia (MM 2021), 2021

Keywords:

3D human pose; conditional directed graph convolution; network

DOI:

10.1145/3474085.3475219

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Graph convolutional networks have significantly improved 3D human pose estimation by representing the human skeleton as an undirected graph. However, this representation fails to reflect the articulated nature of human skeletons, because the hierarchical order among the joints is not explicitly represented. In this paper, we propose to represent the human skeleton as a directed graph, with joints as nodes and bones as edges directed from parent joints to child joints. In this way, the edge directions explicitly encode the hierarchical relationships among the nodes. Based on this representation, we further propose a spatial-temporal conditional directed graph convolution that exploits non-local dependencies that vary across poses by conditioning the graph topology on the input pose. Combining these components, we build a U-shaped network, named the U-shaped Conditional Directed Graph Convolutional Network, for 3D human pose estimation from monocular videos. To evaluate the effectiveness of our method, we conducted extensive experiments on two challenging large-scale benchmarks: Human3.6M and MPI-INF-3DHP. Both quantitative and qualitative results show that our method achieves top performance. Ablation studies further show that directed graphs exploit the hierarchy of articulated human skeletons better than undirected graphs, and that the conditional connections yield adaptive graph topologies for different poses.
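As a rough illustration of the two ideas in the abstract (a skeleton graph whose edges point from parent joints to child joints, and a graph topology conditioned on the input pose), the NumPy sketch below builds a directed adjacency matrix and applies one spatial graph-convolution layer to a single frame. It is a minimal sketch under stated assumptions, not the paper's layer: the joint ordering in `PARENTS`, the separate weights for incoming and outgoing edges, and the attention-style pose-dependent affinity are choices made for illustration; the temporal dimension and the U-shaped architecture are omitted; and all function and parameter names are hypothetical.

```python
import numpy as np

# Hypothetical 17-joint skeleton in a Human3.6M-style ordering (assumption):
# PARENTS[j] is the parent of joint j; the root (pelvis, joint 0) has -1.
PARENTS = [-1, 0, 1, 2, 0, 4, 5, 0, 7, 8, 9, 8, 11, 12, 8, 14, 15]


def directed_adjacency(parents):
    """A[i, j] = 1 iff there is a bone directed from parent i to child j."""
    n = len(parents)
    a = np.zeros((n, n), dtype=np.float32)
    for child, parent in enumerate(parents):
        if parent >= 0:
            a[parent, child] = 1.0
    return a


def row_normalize(a):
    """Divide each row by its sum; rows without edges stay all-zero."""
    deg = a.sum(axis=1, keepdims=True)
    return np.divide(a, deg, out=np.zeros_like(a), where=deg > 0)


def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def conditional_directed_gconv(x, a, w_self, w_in, w_out, w_cond, w_q, w_k):
    """One hypothetical spatial layer on a single frame.

    x: (joints, in_features); a: directed adjacency from directed_adjacency().
    """
    a_in = row_normalize(a.T)  # row j picks up j's (single) parent: incoming edge
    a_out = row_normalize(a)   # row i averages over i's children: outgoing edges
    # Pose-conditioned topology (assumption): scaled dot-product affinity
    # between joints, so the non-local connections change with the input pose.
    q, k = x @ w_q, x @ w_k
    c = softmax(q @ k.T / np.sqrt(q.shape[1]), axis=1)
    h = (x @ w_self            # each joint's own features
         + a_in @ x @ w_in     # messages from parent to child
         + a_out @ x @ w_out   # messages from children to parent
         + c @ x @ w_cond)     # pose-dependent non-local messages
    return np.maximum(h, 0.0)  # ReLU


# Usage with random inputs and weights, e.g. 2D keypoints of one frame.
rng = np.random.default_rng(0)
n_joints, f_in, f_out, d = len(PARENTS), 2, 16, 8
x = rng.standard_normal((n_joints, f_in))
w_self, w_in, w_out, w_cond = (0.1 * rng.standard_normal((f_in, f_out)) for _ in range(4))
w_q, w_k = (0.1 * rng.standard_normal((f_in, d)) for _ in range(2))
a = directed_adjacency(PARENTS)
y = conditional_directed_gconv(x, a, w_self, w_in, w_out, w_cond, w_q, w_k)
print(y.shape)  # (17, 16)
```

Because `a` is not symmetric, `a.T` and `a` route messages in opposite directions, and giving each direction its own weight matrix lets a joint treat its parent differently from its children, something a single symmetric (undirected) adjacency cannot express.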


Pages: 602-611

Number of pages: 10

References: 58 in total

