CVML-Pose: Convolutional VAE Based Multi-Level Network for Object 3D Pose Estimation

Cited by: 3

Authors:

Zhao, Jianyu [1]

Sanderson, Edward [1]

Matuszewski, Bogdan J. [1]

Affiliations:

[1] Univ Cent Lancashire, Comp Vis & Machine Learning CVML Grp, Preston PR1 2HE, England

Source:

IEEE ACCESS | 2023 / Vol. 11

Funding:

UK Engineering and Physical Sciences Research Council (EPSRC)

Keywords:

3D pose estimation; deep learning; variational autoencoder; synthetic data; 6D pose

DOI:

10.1109/ACCESS.2023.3243551

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology]

Discipline classification code:

0812

Abstract:

Most vision-based 3D pose estimation approaches rely on knowledge of the object's 3D model and depth measurements, and often require time-consuming iterative refinement to improve accuracy. These requirements limit broader real-life application. To address these limitations, a novel Convolutional Variational Auto-Encoder based Multi-Level Network for object 3D pose estimation (CVML-Pose) is proposed. Unlike most other methods, CVML-Pose implicitly learns an object's 3D pose from RGB images alone, encoded in its latent space, without knowledge of the object's 3D model, depth information, or post-refinement. CVML-Pose consists of two main modules: (i) CVML-AE, a convolutional variational autoencoder that extracts features from RGB images, and (ii) Multi-Layer Perceptron and K-Nearest Neighbor regressors that map the latent variables to the object's 3D pose, namely rotation and translation, respectively. CVML-Pose has been evaluated on the LineMod and LineMod-Occlusion benchmark datasets. It outperforms other methods based on latent representations and achieves results comparable to the state of the art, but without the use of a 3D model or depth measurements. Using the t-Distributed Stochastic Neighbor Embedding algorithm, the CVML-Pose latent space is shown to successfully represent objects' category and topology. This opens up the prospect of integrated estimation of pose and other attributes (possibly including surface finish or shape variations), which, with real-time processing enabled by the absence of iterative refinement, can facilitate various robotic applications. Code is available at https://github.com/JZhao12/CVML-Pose.
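
The second module described in the abstract maps latent variables to pose, with a K-Nearest Neighbor regressor used for translation. The following is a minimal sketch of that idea only, not the authors' implementation: the function name `knn_regress`, the 16-dimensional latent size, the value of `k`, and the synthetic data are all assumptions made for illustration. In the actual method the latent codes would come from the trained CVML-AE encoder.

```python
import numpy as np


def knn_regress(train_z, train_t, query_z, k=5):
    """Predict a target for each query latent vector as the mean target
    of its k nearest training latents (Euclidean distance)."""
    # Pairwise squared distances, shape (n_query, n_train).
    d2 = ((query_z[:, None, :] - train_z[None, :, :]) ** 2).sum(axis=-1)
    # Indices of the k nearest training latents for each query.
    nn_idx = np.argsort(d2, axis=1)[:, :k]
    # Average the targets of those neighbours, shape (n_query, n_targets).
    return train_t[nn_idx].mean(axis=1)


# Synthetic stand-ins (hypothetical): 200 latent codes of dimension 16,
# each paired with a 3-D translation target.
rng = np.random.default_rng(0)
train_z = rng.normal(size=(200, 16))
train_t = train_z @ rng.normal(size=(16, 3))

# Query with a few latent codes and obtain one translation per query.
pred_t = knn_regress(train_z, train_t, train_z[:4], k=5)
```

With `k=1`, a query that coincides with a training latent returns that sample's stored translation, which is a quick sanity check for the distance computation.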

Pages: 13830 - 13845

Number of pages: 16

50 items in total

[1] Deep 3D Pose Dictionary: 3D Human Pose Estimation from Single RGB Image Using Deep Convolutional Neural Network
Elbasiony, Reda
Gomaa, Walid
Ogata, Tetsuya
ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2018, PT III, 2018, 11141 : 310 - 320
[2] Deep Manifold Embedding for 3D Object Pose Estimation
Ninomiya, Hiroshi
Kawanishi, Yasutomo
Deguchi, Daisuke
Ide, Ichiro
Murase, Hiroshi
Kobori, Norimasa
Nakano, Yusuke
PROCEEDINGS OF THE 12TH INTERNATIONAL JOINT CONFERENCE ON COMPUTER VISION, IMAGING AND COMPUTER GRAPHICS THEORY AND APPLICATIONS (VISIGRAPP 2017), VOL 5, 2017 : 173 - 178
[3] Object Pose Estimation Method Based on 3D Key Points Voting
Wang T.
Yu E.
Tianjin Daxue Xuebao (Ziran Kexue yu Gongcheng Jishu Ban)/Journal of Tianjin University Science and Technology, 2024, 57 (03) : 291 - 300
[4] MULTI-LEVEL NETWORK FOR HIGH-SPEED MULTI-PERSON POSE ESTIMATION
Huang, Ying
Zhuang, Jiankai
Qin, Zengchang
2019 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2019 : 589 - 593
[5] Domain-Translated 3D Object Pose Estimation
Papaioannidis, Christos
Mygdalis, Vasileios
Pitas, Ioannis
IEEE TRANSACTIONS ON IMAGE PROCESSING, 2020, 29 : 9279 - 9291
[6] RF-based Multi-view Pose Machine for Multi-Person 3D Pose Estimation
Xie, Chunyang
Zhang, Dongheng
Wu, Zhi
Yu, Cong
Hu, Yang
Sun, Qibin
Chen, Yan
2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME, 2023 : 2669 - 2674
[7] RPM 2.0: RF-Based Pose Machines for Multi-Person 3D Pose Estimation
Xie, Chunyang
Zhang, Dongheng
Wu, Zhi
Yu, Cong
Hu, Yang
Chen, Yan
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (01) : 490 - 503
[8] Object Pose Estimation via Viewpoint Matching of 3D Models
Lee, Junha
Ji, Sanghoon
You, Sujeong
2021 21ST INTERNATIONAL CONFERENCE ON CONTROL, AUTOMATION AND SYSTEMS (ICCAS 2021), 2021 : 1546 - 1548
[9] Position constrained network for 3D human pose estimation
Dong, Xiena
Yu, Jun
Zhang, Jian
MULTIMEDIA SYSTEMS, 2023, 29 (02) : 459 - 468