Geometric Consistency-Guaranteed Spatio-Temporal Transformer for Unsupervised Multiview 3-D Pose Estimation

Cited by: 1
Authors
Dong, Kaiwen [1]
Riou, Kevin [2]
Zhu, Jingwen [2]
Pastor, Andreas [2]
Subrin, Kevin [2]
Zhou, Yu [3]
Yun, Xiao [3]
Sun, Yanjing [3]
Le Callet, Patrick [2, 4]
Affiliations
[1] China Univ Min & Technol, IOT Percept Mine Res Ctr, Xuzhou 221008, Peoples R China
[2] Nantes Univ, Ecole Cent Nantes, CNRS, LS2N,UMR 6004, F-44300 Nantes, France
[3] China Univ Min & Technol, Sch Informat & Control Engn, Xuzhou 221116, Peoples R China
[4] Inst Univ France IUF, F-75005 Paris, France
Funding
National Natural Science Foundation of China
Keywords
Three-dimensional displays; Pose estimation; Task analysis; Pipelines; Cameras; Transformers; Training; Multiview; pose estimation; transformer
DOI
10.1109/TIM.2024.3440376
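Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract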
Unsupervised 3-D pose estimation has gained prominence due to the challenges in acquiring labeled 3-D data for training. Despite promising progress, unsupervised approaches still lag behind supervised methods in performance. Two factors impede the progress of unsupervised approaches: incomplete geometric constraints and inadequate interaction among spatial, temporal, and multiview features. This article introduces an unsupervised pipeline that uses calibrated camera parameters as geometric constraints across views and coordinate spaces, optimizing the model by minimizing inconsistencies between the 2-D input pose and the reprojection of the predicted 3-D pose. The pipeline uses a novel hierarchical cross transformer (HCT) to encode higher levels of information by enabling interactions among hierarchical features containing different levels of temporal, spatial, and cross-view information. By minimizing the reliance on human-specific parts, the HCT shows potential for adapting to various pose estimation tasks. To validate this adaptability, we build a connection between human pose estimation and scene pose estimation, introducing a dynamic-keypoints-3-D (DKs-3D) dataset tailored for 3-D scene pose estimation in robotic manipulation. Experiments on two 3-D human pose estimation datasets demonstrate our method's new state-of-the-art performance among weakly supervised and unsupervised approaches. The adaptability of our method is confirmed through experiments on DKs-3D, setting the initial benchmark for unsupervised 2-D-to-3-D scene pose lifting.
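The reprojection-consistency idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a plain pinhole camera model with known intrinsics `K` and extrinsics `R`, `t` per view, and the function names `project` and `reprojection_loss`, the mean L2 error, and the toy cameras and joints are all hypothetical choices made for illustration.

```python
import numpy as np

def project(points_3d, K, R, t):
    """Project (J, 3) world-space joints to (J, 2) pixels (assumed pinhole model)."""
    cam = points_3d @ R.T + t          # world -> camera coordinates
    uvw = cam @ K.T                    # camera -> homogeneous image coordinates
    return uvw[:, :2] / uvw[:, 2:3]    # perspective divide

def reprojection_loss(pred_3d, poses_2d, cameras):
    """Mean L2 distance, averaged over views, between each 2-D input pose
    and the reprojection of the predicted 3-D pose into that view."""
    errors = []
    for pose_2d, (K, R, t) in zip(poses_2d, cameras):
        reproj = project(pred_3d, K, R, t)
        errors.append(np.linalg.norm(reproj - pose_2d, axis=1).mean())
    return float(np.mean(errors))

# Toy setup (hypothetical values): two calibrated views of three joints.
K = np.array([[1000.0, 0.0, 500.0],
              [0.0, 1000.0, 500.0],
              [0.0, 0.0, 1.0]])
theta = np.deg2rad(30.0)
R_side = np.array([[np.cos(theta), 0.0, np.sin(theta)],
                   [0.0, 1.0, 0.0],
                   [-np.sin(theta), 0.0, np.cos(theta)]])
cameras = [(K, np.eye(3), np.array([0.0, 0.0, 5.0])),
           (K, R_side, np.array([0.5, 0.0, 5.0]))]
joints = np.array([[0.0, 0.0, 0.0],
                   [0.2, -0.5, 0.1],
                   [-0.3, 0.4, -0.2]])

# The 2-D "input" poses are the projections of the true joints.
poses_2d = [project(joints, K_, R_, t_) for K_, R_, t_ in cameras]

loss_exact = reprojection_loss(joints, poses_2d, cameras)
loss_perturbed = reprojection_loss(joints + 0.05, poses_2d, cameras)
print(loss_exact, loss_perturbed)
```

In this toy setting the loss vanishes when the predicted 3-D pose matches the pose that generated the 2-D inputs, and grows once the prediction is perturbed, which is why such a term can serve as a training signal without 3-D labels.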
Pages: 12