6D Object Pose Estimation With Compact Generalized Non-Local Operation

Times cited: 0
Authors
Jiang, Changhong [1]
Mu, Xiaoqiao [2]
Zhang, Bingbing [3]
Liang, Chao [4]
Xie, Mujun [1]
Affiliations
[1] Changchun Univ Technol, Sch Elect & Elect Engn, Changchun 130012, Peoples R China
[2] Changchun Univ Technol, Sch Mech & Elect Engn, Changchun 130012, Peoples R China
[3] Dalian Minzu Univ, Sch Comp Sci & Engn, Dalian 116602, Peoples R China
[4] Changchun Univ Technol, Coll Comp Sci & Engn, Changchun 130012, Peoples R China
Source
IEEE ACCESS | 2024 / Vol. 12
Keywords
Pose estimation; Feature extraction; Three-dimensional displays; Training; Correlation; Predictive models; Computational modeling; Accuracy; Solid modeling; YOLO; Correlations; subtle feature; end-to-end; long-range spatiotemporal; fine-grained details; representational power;
DOI
10.1109/ACCESS.2024.3508772
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Real-time object detection and pose estimation are critical in practical applications such as virtual reality, scene understanding, and robotics. In this paper, we propose a compact generalized non-local pose estimation network capable of directly predicting the projection of an object's 3D bounding box vertices onto a 2D image, facilitating the estimation of the object's 6D pose. The network is constructed using the YOLOv5 model, with the integration of an improved non-local module termed the Compact Generalized Non-local Block. This module enhances feature representation by learning the correlations between the positions of all elements across channels, effectively capturing subtle feature cues. The proposed network is end-to-end trainable, producing accurate pose predictions without the need for any post-processing operations. Extensive validation on the LineMod dataset shows that our approach achieves a final accuracy of 46.1% on the average 3D distance of model vertices (ADD) metric, outperforming existing methods by 6.9% and our baseline model by 1.8%, thus underscoring the efficacy of the proposed network.
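The two geometric ideas named in the abstract, projecting an object's 3D bounding box vertices onto the image and scoring a pose with the ADD metric, can be illustrated with a minimal NumPy sketch. This is not the authors' code: the function names, the pinhole intrinsics matrix `K`, and the 10%-of-diameter acceptance threshold (a common LineMod convention that the abstract does not state) are assumptions made for illustration only.

```python
import numpy as np


def bounding_box_corners(model_points):
    """Return the 8 corners (8x3) of the axis-aligned box around the model points."""
    mins = model_points.min(axis=0)
    maxs = model_points.max(axis=0)
    return np.array([[x, y, z]
                     for x in (mins[0], maxs[0])
                     for y in (mins[1], maxs[1])
                     for z in (mins[2], maxs[2])])


def project_points(points, R, t, K):
    """Project Nx3 object-frame points to Nx2 pixels with a pinhole camera.

    R (3x3) and t (3,) map object coordinates to camera coordinates;
    K (3x3) is an assumed intrinsics matrix.
    """
    cam = points @ R.T + t           # object frame -> camera frame
    uvw = cam @ K.T                  # camera frame -> homogeneous pixels
    return uvw[:, :2] / uvw[:, 2:3]  # perspective divide


def add_metric(model_points, R_gt, t_gt, R_pred, t_pred):
    """Average 3D distance between model points under the true and predicted poses."""
    pts_gt = model_points @ R_gt.T + t_gt
    pts_pred = model_points @ R_pred.T + t_pred
    return float(np.linalg.norm(pts_gt - pts_pred, axis=1).mean())


def is_pose_correct(add_value, diameter, threshold=0.1):
    """Assumed convention: accept a pose when ADD is below threshold * object diameter."""
    return add_value < threshold * diameter
```

In a pipeline of this kind, the network would regress the 2D positions that `project_points` produces for the box corners, and a pose would then be recovered from those 2D-3D correspondences; the accuracy figure quoted in the abstract is the fraction of test poses that pass a check like `is_pose_correct`.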
Pages: 178080-178088
Number of pages: 9