Dual Networks Based 3D Multi-Person Pose Estimation From Monocular Video

Cited by: 14
Authors
Cheng, Yu [1]
Wang, Bo [2]
Tan, Robby T. T. [1]
Affiliations
[1] Natl Univ Singapore, Yale NUS Coll, Dept Elect & Comp Engn, Singapore 119077, Singapore
[2] CtrsVision, Brea, CA 92821 USA
Funding
National Research Foundation, Singapore
Keywords
Three-dimensional displays; Pose estimation; Optimization; Training; Testing; Semisupervised learning; Heating systems; 3D multi-person pose estimation; semi-supervised learning; test time optimization
DOI
10.1109/TPAMI.2022.3170353
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Monocular 3D human pose estimation has made progress in recent years. Most methods focus on a single person and estimate the pose in person-centric coordinates, i.e., coordinates centered on the target person. Hence, these methods are inapplicable to multi-person 3D pose estimation, where absolute coordinates (e.g., camera coordinates) are required. Moreover, multi-person pose estimation is more challenging than single-person pose estimation because of inter-person occlusion and close human interactions. Existing top-down multi-person methods rely on human detection, and thus suffer from detection errors and cannot produce reliable pose estimates in multi-person scenes. Meanwhile, existing bottom-up methods, which do not use human detection, are not affected by detection errors, but because they process all persons in a scene at once, they are prone to errors, particularly for persons at small scales. To address these challenges, we propose integrating the top-down and bottom-up approaches to exploit their strengths. Our top-down network estimates the joints of all persons in an image patch rather than of one person, making it robust to possibly erroneous bounding boxes. Our bottom-up network incorporates human-detection-based normalized heatmaps, allowing the network to be more robust to scale variations. Finally, the 3D poses estimated by the top-down and bottom-up networks are fed into our integration network to produce the final 3D poses. To address the common gaps between training and testing data, we perform optimization at test time, refining the estimated 3D human poses using a high-order temporal constraint, a re-projection loss, and bone-length regularization. We also introduce a two-person pose discriminator that enforces natural two-person interactions. In addition, we apply a semi-supervised method to overcome the scarcity of 3D ground-truth data.
Our evaluations demonstrate the effectiveness of the proposed method and its individual components. Our code and pretrained models are publicly available: https://github.com/3dpose/3D-Multi-Person-Pose.
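The abstract names three terms used for test-time refinement (a high-order temporal constraint, a re-projection loss, and bone-length regularization) but does not give their formulas. The sketch below shows one plausible way such an objective could be assembled; it is not the authors' implementation. It assumes a pinhole camera, a second-order finite difference as the "high-order" temporal term, and the variance of each bone length over time as the regularizer. The skeleton `BONES`, all function names, and the weights are hypothetical.

```python
import numpy as np

# Hypothetical 4-joint chain skeleton as (parent, child) index pairs.
BONES = [(0, 1), (1, 2), (2, 3)]

def project(pose3d, focal, center):
    """Pinhole projection of camera-space joints (T, J, 3) to pixels (T, J, 2)."""
    xy = pose3d[..., :2] / pose3d[..., 2:3]
    return focal * xy + center

def reprojection_loss(pose3d, keypoints2d, focal, center):
    """Mean squared pixel distance between projected 3D joints and 2D keypoints."""
    diff = project(pose3d, focal, center) - keypoints2d
    return float(np.mean(np.sum(diff ** 2, axis=-1)))

def bone_length_loss(pose3d, bones=BONES):
    """Assumed form: variance of each bone length across frames, averaged over bones."""
    parents = [p for p, _ in bones]
    children = [c for _, c in bones]
    lengths = np.linalg.norm(pose3d[:, children] - pose3d[:, parents], axis=-1)
    return float(np.mean(np.var(lengths, axis=0)))

def temporal_loss(pose3d, order=2):
    """Assumed form: squared order-th finite difference of joint positions over time."""
    d = np.diff(pose3d, n=order, axis=0)
    return float(np.mean(np.sum(d ** 2, axis=-1)))

def refinement_objective(pose3d, keypoints2d, focal, center,
                         w_reproj=1.0, w_bone=1.0, w_temp=1.0):
    """Weighted sum of the three terms; the weights are illustrative only."""
    return (w_reproj * reprojection_loss(pose3d, keypoints2d, focal, center)
            + w_bone * bone_length_loss(pose3d)
            + w_temp * temporal_loss(pose3d))

if __name__ == "__main__":
    # A static 5-frame sequence about 3 m from the camera (made-up values).
    base = np.array([[0.0, 0.0, 3.0], [0.0, 0.5, 3.0],
                     [0.0, 1.0, 3.0], [0.3, 1.0, 3.0]])
    seq = np.repeat(base[None], 5, axis=0)
    focal, center = 1000.0, np.array([640.0, 360.0])
    kp2d = project(seq, focal, center)
    noisy = seq + 0.05 * np.random.default_rng(0).standard_normal(seq.shape)
    print("clean:", refinement_objective(seq, kp2d, focal, center))
    print("noisy:", refinement_objective(noisy, kp2d, focal, center))
```

In a refinement loop, an objective of this kind would be minimized with respect to the estimated poses (or the network weights) for each test sequence; the repository linked in the abstract is the reference for how the authors actually do this.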
Pages: 1636-1651
Number of pages: 16