GMDN: A lightweight graph-based mixture density network for 3D human pose regression

Cited by: 12
Authors
Zou, Lu [1 ]
Huang, Zhangjin [1 ]
Gu, Naijie [1 ]
Wang, Fangjun [1 ]
Yang, Zhouwang [1 ]
Wang, Guoping [2 ]
Affiliations
[1] Univ Sci & Technol China, Hefei 230026, Peoples R China
[2] Peking Univ, Beijing 100000, Peoples R China
Source
COMPUTERS & GRAPHICS-UK | 2021, Vol. 95
Funding
National Natural Science Foundation of China;
Keywords
3D human pose estimation; Graph convolutional network; Mixture density network;
DOI
10.1016/j.cag.2021.01.010
CLC number
TP31 [Computer Software];
Discipline codes
081202; 0835;
Abstract
3D human pose estimation from 2D detections is an ill-posed problem because multiple solutions may exist due to inherent ambiguity and occlusion. In this paper, we propose a novel graph-based mixture density network (GMDN) to tackle the 2D-to-3D human pose estimation problem. We formulate the 2D joint locations of the human body as a graph, so that the pose estimation task can be recast as a graph regression problem. Additionally, we present a novel graph convolutional operation that incorporates structural knowledge about human body configurations to aid reasoning about the structural relations implicit in human bodies. Furthermore, we employ mixture density networks to model the 3D human poses as a multimodal distribution. The presented GMDN is lightweight with only 0.30M parameters, and the experimental results demonstrate that it achieves state-of-the-art performance. © 2021 Elsevier Ltd. All rights reserved.
Pages: 115-122
Number of pages: 8
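The abstract's two core ingredients, a graph convolution over the body-joint graph and a mixture density head that outputs a multimodal distribution over 3D poses, can be sketched roughly as below. This is a minimal illustrative sketch in PyTorch, not the authors' GMDN implementation: the joint count, the number of mixture components, the isotropic per-component variance, and all class and variable names are assumptions made for illustration.

# Illustrative sketch only (assumed 17-joint skeleton, 5 mixture components,
# isotropic Gaussians); not the GMDN code from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_JOINTS = 17      # Human3.6M-style skeleton (assumption)
NUM_COMPONENTS = 5   # number of Gaussian kernels in the mixture (assumption)


class GraphConv(nn.Module):
    """One graph convolution over body joints using a fixed skeleton adjacency."""

    def __init__(self, in_dim, out_dim, adjacency):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)
        # Row-normalized adjacency with self-loops encodes the body structure.
        A = adjacency + torch.eye(adjacency.size(0))
        self.register_buffer("A_hat", A / A.sum(dim=1, keepdim=True))

    def forward(self, x):  # x: (batch, joints, in_dim)
        # Aggregate features from neighboring joints, then apply a shared linear map.
        return F.relu(torch.matmul(self.A_hat, self.W(x)))


class MDNHead(nn.Module):
    """Predicts a K-component Gaussian mixture over the 3D joint coordinates."""

    def __init__(self, feat_dim, num_joints=NUM_JOINTS, k=NUM_COMPONENTS):
        super().__init__()
        self.k, self.num_joints = k, num_joints
        self.pi = nn.Linear(feat_dim, k)                   # mixture weights
        self.mu = nn.Linear(feat_dim, k * num_joints * 3)  # component means
        self.sigma = nn.Linear(feat_dim, k)                # isotropic std per component

    def forward(self, h):  # h: (batch, feat_dim), pooled joint features
        pi = F.softmax(self.pi(h), dim=-1)
        mu = self.mu(h).view(-1, self.k, self.num_joints, 3)
        sigma = F.softplus(self.sigma(h)) + 1e-6           # keep std strictly positive
        return pi, mu, sigma

In such a formulation the network would typically be trained by minimizing the negative log-likelihood of the ground-truth 3D pose under the predicted Gaussian mixture, and at test time either the most probable component or all components can be reported as pose hypotheses; the paper itself should be consulted for the exact architecture and loss.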