CVML-Pose: Convolutional VAE Based Multi-Level Network for Object 3D Pose Estimation

Cited by: 3
Authors
Zhao, Jianyu [1 ]
Sanderson, Edward [1 ]
Matuszewski, Bogdan J. J. [1 ]
Affiliations
[1] Univ Cent Lancashire, Comp Vis & Machine Learning CVML Grp, Preston PR1 2HE, England
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK
Keywords
3D pose estimation; deep learning; variational autoencoder; synthetic data; 6D pose
DOI
10.1109/ACCESS.2023.3243551
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Most vision-based 3D pose estimation approaches rely on knowledge of the object's 3D model and depth measurements, and often require time-consuming iterative refinement to improve accuracy. These requirements can be seen as limiting factors for broader real-life applications, and the main motivation for this paper is to address them. To this end, a novel Convolutional Variational Auto-Encoder based Multi-Level Network for object 3D pose estimation (CVML-Pose) is proposed. Unlike most other methods, CVML-Pose implicitly learns an object's 3D pose from RGB images alone, encoded in its latent space, without knowledge of the object's 3D model, depth information, or post-refinement. CVML-Pose consists of two main modules: (i) CVML-AE, a convolutional variational autoencoder whose role is to extract features from RGB images, and (ii) Multi-Layer Perceptron and K-Nearest Neighbor regressors that map the latent variables to the object's 3D pose, namely rotation and translation, respectively. CVML-Pose has been evaluated on the LineMod and LineMod-Occlusion benchmark datasets. It outperforms other methods based on latent representations and achieves results comparable to the state of the art, but without using a 3D model or depth measurements. Visualization with the t-Distributed Stochastic Neighbor Embedding algorithm shows that the CVML-Pose latent space successfully represents objects' category and topology. This opens up the prospect of integrated estimation of pose and other attributes (possibly also including surface finish or shape variations), which, with real-time processing due to the absence of iterative refinement, can facilitate various robotic applications. Code is available at https://github.com/JZhao12/CVML-Pose.
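The second module described in the abstract maps latent variables to pose, with a K-Nearest Neighbor regressor handling translation. The following is a minimal sketch of that idea only, not the authors' implementation: the function name `knn_regress`, the Euclidean distance, the uniform averaging, `k=3`, and the toy latent codes and translations are all assumptions made for illustration.

```python
import math


def knn_regress(train_latents, train_targets, query, k=3):
    """Average the targets of the k training latents closest to `query`."""
    if not 1 <= k <= len(train_latents):
        raise ValueError("k must be between 1 and the number of training samples")
    # Rank training samples by Euclidean distance to the query in latent space.
    order = sorted(
        range(len(train_latents)),
        key=lambda i: math.dist(train_latents[i], query),
    )
    nearest = order[:k]
    dim = len(train_targets[0])
    # Uniform (unweighted) mean of the neighbours' targets, per coordinate.
    return [sum(train_targets[i][d] for i in nearest) / k for d in range(dim)]


# Toy data (hypothetical): 2-D latent codes paired with 3-D translations.
latents = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
translations = [[0.0, 0.0, 1.0], [0.2, 0.0, 1.0], [0.0, 0.2, 1.0], [1.0, 1.0, 2.0]]

estimate = knn_regress(latents, translations, query=[0.1, 0.1], k=3)
print(estimate)
```

In a real pipeline the latent codes would come from the trained encoder rather than being written by hand, and the distance metric, neighbour weighting, and `k` would need to be chosen on validation data.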
Pages: 13830-13845
Page count: 16