Learning to Augment Poses for 3D Human Pose Estimation in Images and Videos

Cited by: 8
Authors
Zhang, Jianfeng [1]
Gong, Kehong [1]
Wang, Xinchao [1]
Feng, Jiashi [2]
Affiliations
[1] National University of Singapore, Singapore 119077, Singapore
[2] ByteDance, Beijing 100086, People's Republic of China
Funding
National Research Foundation, Singapore;
Keywords
Three-dimensional displays; Training; Data models; Pose estimation; Training data; Solid modeling; Videos; Artificial intelligence; computational and artificial intelligence; deep learning; learning systems; machine learning; supervised learning;
DOI
10.1109/TPAMI.2023.3243400
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Existing 3D human pose estimation methods often generalize poorly to new datasets, largely because the 2D-3D pose pairs in their training data have limited diversity. To address this problem, we present PoseAug, a novel auto-augmentation framework that learns to augment the available training poses towards greater diversity and thus improves the generalization of the trained 2D-to-3D pose estimator. Specifically, PoseAug introduces a novel pose augmentor that learns to adjust various geometry factors of a pose through differentiable operations. Because these operations are differentiable, the augmentor can be optimized jointly with the 3D pose estimator and use the estimation error as feedback to generate more diverse and harder poses online. PoseAug is generic and easy to apply to various 3D pose estimation models. It can also be extended to pose estimation from video frames. To demonstrate this, we introduce PoseAug-V, a simple yet effective method that decomposes video pose augmentation into end pose augmentation and conditioned intermediate pose generation. Extensive experiments demonstrate that PoseAug and its extension PoseAug-V bring clear improvements for frame-based and video-based 3D pose estimation on several out-of-domain 3D human pose benchmarks.
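The abstract does not list which geometry factors the augmentor adjusts, so the following is only an illustrative sketch under stated assumptions: it treats bone lengths and a global rotation about the vertical axis as the adjustable factors, uses a hypothetical 5-joint skeleton, and takes the adjustment parameters by hand rather than from a learned network. The names `PARENTS`, `pose_to_bones`, `bones_to_pose`, and `augment_pose` are invented for this example and are not taken from the paper. Every step is a smooth function of the parameters, which is the property that would let an autodiff framework pass the estimator's error back to an augmentor.

```python
import numpy as np

# Hypothetical 5-joint skeleton (an assumption, not the paper's skeleton).
# PARENTS[j] is the parent of joint j; -1 marks the root, which must be joint 0,
# and every parent index must come before its children.
PARENTS = [-1, 0, 1, 0, 3]


def pose_to_bones(pose, parents=PARENTS):
    """Return one bone vector (child minus parent) per non-root joint."""
    return np.stack([pose[j] - pose[p] for j, p in enumerate(parents) if p >= 0])


def bones_to_pose(bones, root, parents=PARENTS):
    """Rebuild joint positions by accumulating bone vectors from the root."""
    pose = np.zeros((len(parents), 3))
    pose[0] = root
    b = 0
    for j, p in enumerate(parents):
        if p < 0:
            continue
        pose[j] = pose[p] + bones[b]
        b += 1
    return pose


def augment_pose(pose, length_ratios, yaw):
    """Scale each bone by its ratio, then rotate the pose about the z axis.

    length_ratios: one multiplicative factor per bone (1.0 leaves it unchanged).
    yaw: rotation angle in radians, applied around the root joint.
    """
    bones = pose_to_bones(pose) * np.asarray(length_ratios)[:, None]
    scaled = bones_to_pose(bones, pose[0])
    c, s = np.cos(yaw), np.sin(yaw)
    rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    root = scaled[0]
    return (scaled - root) @ rot.T + root


if __name__ == "__main__":
    pose = np.array(
        [[0, 0, 0], [1, 0, 0], [2, 0, 0], [0, 1, 0], [0, 2, 0]], dtype=float
    )
    print(augment_pose(pose, np.full(4, 1.5), np.pi / 6))
```

In a learned setting, `length_ratios` and `yaw` would be predicted by a network and trained jointly with the pose estimator; here they are plain inputs so the geometry can be checked in isolation.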
Pages: 10012-10026
Number of pages: 15