Unsupervised Multi-view Multi-person 3D Pose Estimation Using Reprojection Error

Cited by: 1

Authors:

de Franca Silva, Diogenes Wallis [1]

Do Monte Lima, Joao Paulo Silva [1,2]

Macedo, David [3]

Zanchettin, Cleber [3]

Thomas, Diego Gabriel Francis [4]

Uchiyama, Hideaki [5]

Teichrieb, Veronica [1]

Affiliations:

[1] Univ Fed Pernambuco, Centro Informat, Voxar Labs, Recife, PE, Brazil

[2] Univ Fed Rural Pernambuco, Dept Computacao, Visual Comp Lab, Recife, PE, Brazil

[3] Univ Fed Pernambuco, Centro Informat, Recife, PE, Brazil

[4] Kyushu Univ, Fac Informat Sci & Elect Engn, Fukuoka, Japan

[5] Nara Inst Sci & Technol, Grad Sch Sci & Technol, Nara, Japan

Source:

ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2022, PT III | 2022 / Vol. 13531

Keywords:

3D human pose estimation; Unsupervised learning; Deep learning; Reprojection error

DOI:

10.1007/978-3-031-15934-3_40

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

This work addresses multi-view multi-person 3D pose estimation in synchronized and calibrated camera views. Recent approaches estimate neural network weights in a supervised way; they rely on ground-truth annotated datasets to compute the loss function and optimize the network weights. However, manually labeling ground-truth datasets is labor-intensive, expensive, and error-prone. Consequently, it is preferable not to rely heavily on labeled datasets. This work proposes an unsupervised approach to estimating 3D human poses that requires only an off-the-shelf 2D pose estimation method and the intrinsic and extrinsic camera parameters. Our approach uses reprojection error as the loss function instead of comparing the predicted 3D pose with the ground truth. First, we estimate the 3D pose of each person using the plane sweep stereo approach, in which the depth of each 2D joint of each person is estimated in a selected target view. The estimated 3D pose is then projected onto each of the other views using the camera parameters. Finally, the 2D reprojection error in the image plane is computed by comparing the projected pose with the estimated 2D pose corresponding to the same person. The 2D poses that correspond to the same person are identified using virtual depth planes, where each 3D pose is projected onto the reference view and compared to find the nearest 2D pose. Our proposed method learns to estimate 3D pose in an end-to-end unsupervised manner and does not require any manual parameter tuning, yet it achieves results close to state-of-the-art supervised methods on a public dataset. On the Campus dataset, our method scores only 5.8 percentage points below the fully supervised state-of-the-art method and only 5.1 percentage points below the best geometric approach.
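The geometric steps named in the abstract (lifting a 2D joint to 3D at a hypothesized depth in the target view, projecting it into another calibrated view, measuring the 2D reprojection error, and picking the nearest 2D pose) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the paper trains a network end to end, whereas the sketch brute-forces a small list of candidate depths. The function names (`back_project`, `project`, `reprojection_error`, `sweep_depth`, `match_pose`), the zero-skew pinhole camera, the world-to-camera convention `X_cam = R X + t`, and all numeric values are assumptions made for illustration.

```python
import math

def back_project(pixel, depth, K, R, t):
    """Lift a pixel to a 3D world point at a given depth (zero-skew K assumed)."""
    fx, fy, cx, cy = K[0][0], K[1][1], K[0][2], K[1][2]
    u, v = pixel
    cam = [(u - cx) / fx * depth, (v - cy) / fy * depth, depth]
    d = [cam[i] - t[i] for i in range(3)]
    # World point = R^T (X_cam - t)
    return [sum(R[i][j] * d[i] for i in range(3)) for j in range(3)]

def project(point, K, R, t):
    """Project a 3D world point to pixel coordinates: x ~ K (R X + t)."""
    cam = [sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)]
    img = [sum(K[i][j] * cam[j] for j in range(3)) for i in range(3)]
    return (img[0] / img[2], img[1] / img[2])

def reprojection_error(pose3d, pose2d, K, R, t):
    """Mean pixel distance between projected 3D joints and observed 2D joints."""
    total = 0.0
    for joint3d, (x, y) in zip(pose3d, pose2d):
        u, v = project(joint3d, K, R, t)
        total += math.hypot(u - x, v - y)
    return total / len(pose2d)

def sweep_depth(pixel, observed, depths, K, target_cam, other_cam):
    """Brute-force stand-in for depth estimation: keep the candidate depth
    whose 3D point reprojects closest to the observation in the other view."""
    def error(depth):
        point = back_project(pixel, depth, K, *target_cam)
        return reprojection_error([point], [observed], K, *other_cam)
    return min(depths, key=error)

def match_pose(pose3d, candidates2d, K, R, t):
    """Index of the 2D pose in a view that is nearest to the projected 3D pose."""
    errors = [reprojection_error(pose3d, c, K, R, t) for c in candidates2d]
    return errors.index(min(errors))

# Hypothetical two-camera rig: shared intrinsics, second camera shifted 1 m along x.
K = [[1000.0, 0.0, 500.0], [0.0, 1000.0, 500.0], [0.0, 0.0, 1.0]]
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
target_cam = (I3, [0.0, 0.0, 0.0])
other_cam = (I3, [-1.0, 0.0, 0.0])

depth = sweep_depth((500.0, 500.0), (250.0, 500.0), [2.0, 3.0, 4.0, 5.0],
                    K, target_cam, other_cam)
joint = back_project((500.0, 500.0), depth, K, *target_cam)
person = match_pose([joint], [[(600.0, 500.0)], [(251.0, 500.0)]], K, *other_cam)
print(depth, joint, person)
```

In a learned setting, the mean pixel distance returned by `reprojection_error` would play the role of the training loss, so no 3D ground truth is needed; only 2D detections and camera parameters enter the computation.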

Citation

Pages: 482-494

Page count: 13

20 references in total

[1] 3D Pictorial Structures Revisited: Multiple Human Pose Estimation [J].

Belagiannis, Vasileios ;

Amin, Sikandar ;

Andriluka, Mykhaylo ;

Schiele, Bernt ;

Navab, Nassir ;

Ilic, Slobodan .

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2016, 38 (10) :1929-1942

[2] Multiple Human Pose Estimation with Temporally Consistent 3D Pictorial Structures [J].

Belagiannis, Vasileios ;

Wang, Xinchao ;

Schiele, Bernt ;

Fua, Pascal ;

Ilic, Slobodan ;

Navab, Nassir .

COMPUTER VISION - ECCV 2014 WORKSHOPS, PT I, 2015, 8925 :742-754

[3] 3D Pictorial Structures for Multiple Human Pose Estimation [J].

Belagiannis, Vasileios ;

Amin, Sikandar ;

Andriluka, Mykhaylo ;

Schiele, Bernt ;

Navab, Nassir ;

Ilic, Slobodan .

2014 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2014, :1669-1676

[4] Multi-person 3D Pose Estimation in Crowded Scenes Based on Multi-view Geometry [J].

Chen, He ;

Guo, Pengfei ;

Li, Pengfei ;

Lee, Gim Hee ;

Chirikjian, Gregory .

COMPUTER VISION - ECCV 2020, PT III, 2020, 12348 :541-557

[5] Fast and Robust Multi-Person 3D Pose Estimation from Multiple Views [J].

Dong, Junting ;

Jiang, Wen ;

Huang, Qixing ;

Bao, Hujun ;

Zhou, Xiaowei .

2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, :7784-7793

[6] Multiple human 3D pose estimation from multiview images [J].

Ershadi-Nasab, Sara ;

Noury, Erfan ;

Kasaei, Shohreh ;

Sanaei, Esmaeil .

MULTIMEDIA TOOLS AND APPLICATIONS, 2018, 77 (12) :15573-15601

[7] Hartley R., 2003, MULTIPLE VIEW GEOMET, DOI 10.1017/CBO9780511811685

[8] End-to-end Dynamic Matching Network for Multi-view Multi-person 3D Pose Estimation [J].

Huang, Congzhentao ;

Jiang, Shuai ;

Li, Yang ;

Zhang, Ziyue ;

Traish, Jason ;

Deng, Chen ;

Ferguson, Sam ;

Da Xu, Richard Yi .

COMPUTER VISION - ECCV 2020, PT XXVIII, 2020, 12373 :477-493

[9] Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments [J].

Ionescu, Catalin ;

Papava, Dragos ;

Olaru, Vlad ;

Sminchisescu, Cristian .

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2014, 36 (07) :1325-1339

[10] Weakly-Supervised 3D Human Pose Learning via Multi-view Images in the Wild [J].

Iqbal, Umar ;

Molchanov, Pavlo ;

Kautz, Jan .

2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020, :5242-5251
