CVML-Pose: Convolutional VAE Based Multi-Level Network for Object 3D Pose Estimation

Cited by: 3
Authors
Zhao, Jianyu [1]
Sanderson, Edward [1]
Matuszewski, Bogdan J. J. [1]
Affiliations
[1] Univ Cent Lancashire, Comp Vis & Machine Learning CVML Grp, Preston PR1 2HE, England
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK
Keywords
3D pose estimation; deep learning; variational autoencoder; synthetic data; 6D pose
DOI
10.1109/ACCESS.2023.3243551
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Most vision-based 3D pose estimation approaches rely on knowledge of the object's 3D model and depth measurements, and often require time-consuming iterative refinement to improve accuracy. These requirements limit broader real-life applications, and addressing them is the main motivation for this paper. To this end, a novel Convolutional Variational Auto-Encoder based Multi-Level Network for object 3D pose estimation (CVML-Pose) is proposed. Unlike most other methods, CVML-Pose implicitly learns an object's 3D pose from RGB images alone, encoded in its latent space, without knowledge of the object's 3D model or depth information and without post-refinement. CVML-Pose consists of two main modules: (i) CVML-AE, a convolutional variational autoencoder whose role is to extract features from RGB images, and (ii) Multi-Layer Perceptron and K-Nearest Neighbor regressors that map the latent variables to the object's 3D pose, namely rotation and translation, respectively. CVML-Pose has been evaluated on the LineMod and LineMod-Occlusion benchmark datasets. It outperforms other methods based on latent representations and achieves results comparable to the state of the art, but without using a 3D model or depth measurements. Using the t-Distributed Stochastic Neighbor Embedding algorithm, the CVML-Pose latent space is shown to successfully represent objects' category and topology. This opens up the prospect of integrated estimation of pose and other attributes (possibly including surface finish or shape variations), which, with real-time processing made possible by the absence of iterative refinement, can facilitate various robotic applications. Code is available at https://github.com/JZhao12/CVML-Pose.
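The abstract states that a K-Nearest Neighbor regressor maps latent variables to translation. The following is a minimal illustrative sketch of that idea only, not the authors' implementation (see the linked repository for that). The function name `knn_regress`, the unweighted mean, the Euclidean distance, the value k=3, and the toy data are all assumptions made for illustration.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def knn_regress(train_latents: Sequence[Vector],
                train_targets: Sequence[Vector],
                query: Vector,
                k: int = 3) -> List[float]:
    """Return the mean target of the k training latents closest to `query`.

    Hypothetical helper: plain Euclidean distance and an unweighted mean
    are assumptions, not details taken from the paper.
    """
    if not 1 <= k <= len(train_latents):
        raise ValueError("k must be between 1 and the number of training samples")
    # Distance from the query to every stored latent vector, with its index.
    dists = [(math.dist(z, query), i) for i, z in enumerate(train_latents)]
    nearest = sorted(dists)[:k]
    dim = len(train_targets[0])
    # Average each target component over the k nearest neighbours.
    return [sum(train_targets[i][d] for _, i in nearest) / k for d in range(dim)]


# Toy data (made up): 2-D "latent" vectors paired with 3-D translations.
latents = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
translations = [(0.0, 0.0, 1.0), (0.2, 0.0, 1.0), (0.0, 0.2, 1.0), (1.0, 1.0, 2.0)]

estimate = knn_regress(latents, translations, query=(0.1, 0.1), k=3)
```

In this toy setup the distant fourth sample does not contribute, so the estimate is the mean of the first three translations. In practice the latent vectors would come from the trained encoder rather than being written by hand.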
Pages: 13830-13845
Number of pages: 16