6D Object Pose Estimation With Compact Generalized Non-Local Operation

Cited by: 0

Authors:

Jiang, Changhong [1]

Mu, Xiaoqiao [2]

Zhang, Bingbing [3]

Liang, Chao [4]

Xie, Mujun [1]

Affiliations:

[1] Changchun Univ Technol, Sch Elect & Elect Engn, Changchun 130012, Peoples R China

[2] Changchun Univ Technol, Sch Mech & Elect Engn, Changchun 130012, Peoples R China

[3] Dalian Minzu Univ, Sch Comp Sci & Engn, Dalian 116602, Peoples R China

[4] Changchun Univ Technol, Coll Comp Sci & Engn, Changchun 130012, Peoples R China

Source:

IEEE Access | 2024 / Vol. 12

Keywords:

Pose estimation; Feature extraction; Three-dimensional displays; Training; Correlation; Predictive models; Computational modeling; Accuracy; Solid modeling; YOLO; Correlations; subtle feature; end-to-end; long-range spatiotemporal; fine-grained details; representational power;

DOI:

10.1109/ACCESS.2024.3508772

Chinese Library Classification (CLC):

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Real-time object detection and pose estimation are critical in practical applications such as virtual reality, scene understanding, and robotics. In this paper, we propose a compact generalized non-local pose estimation network capable of directly predicting the projection of an object's 3D bounding box vertices onto a 2D image, facilitating the estimation of the object's 6D pose. The network is constructed using the YOLOv5 model, with the integration of an improved non-local module termed the Compact Generalized Non-local Block. This module enhances feature representation by learning the correlations between the positions of all elements across channels, effectively capturing subtle feature cues. The proposed network is end-to-end trainable, producing accurate pose predictions without the need for any post-processing operations. Extensive validation on the LineMod dataset shows that our approach achieves a final accuracy of 46.1% on the average 3D distance of model vertices (ADD) metric, outperforming existing methods by 6.9% and our baseline model by 1.8%, thus underscoring the efficacy of the proposed network.
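The ADD metric named in the abstract is commonly defined as the mean distance between the model vertices transformed by the ground-truth pose and the same vertices transformed by the predicted pose. The sketch below illustrates that definition only; it is not the authors' evaluation code. The function names, the toy cube, and the acceptance threshold of 10% of the model diameter are assumptions (the threshold is a widely used convention that this record does not state).

```python
import math

def transform(R, t, p):
    """Apply the rigid transform R @ p + t to a single 3D point p."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))

def add_metric(points, R_gt, t_gt, R_pred, t_pred):
    """Mean distance between model points under ground-truth and predicted poses."""
    total = 0.0
    for p in points:
        total += math.dist(transform(R_gt, t_gt, p), transform(R_pred, t_pred, p))
    return total / len(points)

def add_accuracy(add_values, diameter, threshold=0.1):
    """Fraction of poses whose ADD is below threshold * diameter (assumed convention)."""
    hits = sum(1 for v in add_values if v < threshold * diameter)
    return hits / len(add_values)

# Toy data (hypothetical): the eight corners of a unit cube stand in for model vertices.
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
CUBE = [(x, y, z) for x in (0.0, 1.0) for y in (0.0, 1.0) for z in (0.0, 1.0)]

# The predicted pose differs from the ground truth by a small translation only.
shifted = add_metric(CUBE, IDENTITY, (0.0, 0.0, 0.0), IDENTITY, (0.03, 0.0, 0.04))
accuracy = add_accuracy([shifted, 0.5], diameter=math.sqrt(3))
```

With a pure translation error, every vertex moves by the same offset, so the ADD value equals the length of that offset. A reported accuracy such as the 46.1% above would then be the share of test poses that fall under the chosen threshold.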


Pages: 178080-178088

Page count: 9
