SS-Pose: Self-Supervised 6-D Object Pose Representation Learning Without Rendering

Cited by: 3

Authors:

Mu, Fengjun [1]

Huang, Rui [1]

Zhang, Jingting [1]

Zou, Chaobin [1]

Shi, Kecheng [1]

Sun, Shixiang [1]

Zhan, Huayi [2]

Zhao, Pengbo [3]

Qiu, Jing [1]

Cheng, Hong [1]

Affiliations:

[1] Univ Elect Sci & Technol China, Chengdu 611731, Peoples R China

[2] Sichuan Changhong Elect Grp Co Ltd, Mianyang 621000, Sichuan, Peoples R China

[3] Huawei Shanghai Res Ctr, Shanghai 201206, Peoples R China

Source:

IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS | 2024 / Vol. 20 / No. 12

Funding:

National Natural Science Foundation of China; China Postdoctoral Science Foundation

Keywords:

Industrial perception; object pose estimation; representation learning; self-supervised learning;

DOI:

10.1109/TII.2024.3424591

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology]

Discipline classification code:

0812

Abstract:

Object pose estimation has extensive applications in various industrial scenarios. However, the heavy reliance on dense 6-D annotation and textured object models has become a significant obstacle to the widespread industrial application of 6-D object pose estimation methods. In this work, we present SS-Pose, a self-supervised learning framework for estimating 6-D object poses without annotated 6-D data or textured models. SS-Pose introduces a coordinate system datum reinitializer stage to dynamically establish a sequence-level pose representation datum, and a temporal-spatial constraint resolver module to obtain the self-supervised learning target through interframe constraints. We introduce a one-shot cross-coordinate transformation that establishes the relationship between the 6-D representation and the object poses, which can be further utilized in real-world tasks. We evaluated the proposed SS-Pose on the challenging YCB-Video dataset and the texture-less T-LESS dataset. Our approach achieves competitive performance with significantly lower data dependency, making it suitable for visual perception in industrial applications.

Pages: 13665-13675

Page count: 11
