SS-Pose: Self-Supervised 6-D Object Pose Representation Learning Without Rendering

Cited by: 3
Authors
Mu, Fengjun [1]
Huang, Rui [1]
Zhang, Jingting [1]
Zou, Chaobin [1]
Shi, Kecheng [1]
Sun, Shixiang [1]
Zhan, Huayi [2]
Zhao, Pengbo [3]
Qiu, Jing [1]
Cheng, Hong [1]
Affiliations
[1] Univ Elect Sci & Technol China, Chengdu 611731, Peoples R China
[2] Sichuan Changhong Elect Grp Co Ltd, Mianyang 621000, Sichuan, Peoples R China
[3] Huawei Shanghai Res Ctr, Shanghai 201206, Peoples R China
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation;
Keywords
Industrial perception; object pose estimation; representation learning; self-supervised learning;
DOI
10.1109/TII.2024.3424591
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Object pose estimation has extensive applications in various industrial scenarios. However, the heavy reliance on dense 6-D annotation and textured object models has become a significant obstacle to the widespread industrial application of 6-D object pose estimation methods. In this work, we present SS-Pose, a self-supervised learning framework for estimating 6-D object poses without annotated 6-D data or textured models. SS-Pose introduces a coordinate system datum reinitializer stage to dynamically establish a sequence-level pose representation datum, and a temporal-spatial constraint resolver module to obtain the self-supervised learning target through interframe constraints. We introduce a one-shot cross-coordinate transformation that establishes the relationship between the 6-D representation and the object poses, which can be further utilized in real-world tasks. We evaluated the proposed SS-Pose on the challenging YCB-Video dataset and the texture-less T-LESS dataset. Our approach achieves competitive performance with significantly lower data dependency, making it suitable for visual perception in industrial applications.
Pages: 13665-13675
Number of pages: 11