OTOT: An Online Training and Offline Testing System for 6D Object Pose Estimation

Times cited: 0
Authors
Yuan, Yilin [1 ]
Jiang, Qian [1 ]
Mu, Quan [2 ]
Jia, Wenchao [1 ]
Fu, Boya [1 ]
He, Renzhi [1 ]
Wen, Jian [3 ]
Liu, Fei [1 ]
Mao, Qin [4 ,5 ]
Zhou, Mingliang [6 ]
Affiliations
[1] Chongqing Univ, Sch Mech & Vehicle Engn, State Key Lab Mech Transmiss, Chongqing 400000, Peoples R China
[2] Minist Ecol & Environm, Foreign Environm Cooperat Ctr, Beijing 100035, Peoples R China
[3] Powerchina Sichuan Elect Power Engn Co Ltd, Chengdu 610041, Sichuan, Peoples R China
[4] Qiannan Normal Univ Nationalities, Sch Comp & Informat Technol, Duyun 558000, Peoples R China
[5] Key Lab Complex Syst & Intelligent Optimizat Guizhou, Duyun 558000, Peoples R China
[6] Chongqing Univ, Coll Comp Sci, Chongqing 400000, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Supervised online learning system; object pose estimation; convolutional neural networks; computer vision; ACCURATE; NETWORK;
DOI
10.1142/S0218001423510151
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we introduce a novel system, called the Online Training and Offline Testing (OTOT) system, that assists the training of 6D object pose estimation networks. OTOT is deployed only during training to optimize the network parameters and is not used in the testing stage. It consists of two modules: a feature fusion module and a supervision module. The feature fusion module fuses several feature maps from the pose estimation network in a specified order to obtain a fused feature. The supervision module then uses an encoder-decoder network to implicitly extract useful information from the fused feature and optimizes the pose estimation network online through back-propagation. OTOT can be migrated to any network with an encoder-decoder structure. Trained with OTOT, the network achieves 56.11% accuracy in terms of the VSD metric on the T-LESS dataset using RGB inputs, compared with 46.70% for the original network trained without OTOT. Experiments show that OTOT greatly improves the accuracy of the pose estimation network; because OTOT is not deployed in the testing stage, it adds no parameters at test time and does not affect the original inference speed of the network.
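The abstract describes an auxiliary branch that supervises the pose network only during training and is discarded at test time. Below is a minimal PyTorch sketch of that idea; the module names, channel counts, the resize-and-sum fusion rule, the auxiliary target, the loss weight, and the return_features interface are illustrative assumptions, not details taken from the paper.

# Minimal sketch (assumptions noted above): a training-only auxiliary branch.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FeatureFusion(nn.Module):
    # Fuses several intermediate feature maps in a fixed, specified order.
    def __init__(self, in_channels, fused_channels=256):
        super().__init__()
        self.proj = nn.ModuleList(
            [nn.Conv2d(c, fused_channels, kernel_size=1) for c in in_channels]
        )

    def forward(self, feats):
        # Bring every map to the spatial size of the first one, then sum.
        target = feats[0].shape[-2:]
        fused = 0
        for f, proj in zip(feats, self.proj):
            fused = fused + proj(
                F.interpolate(f, size=target, mode="bilinear", align_corners=False)
            )
        return fused


class SupervisionModule(nn.Module):
    # Small encoder-decoder that maps the fused feature to an auxiliary output.
    def __init__(self, channels=256, out_channels=3):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(channels, channels, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(channels, channels, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(channels, out_channels, 4, stride=2, padding=1),
        )

    def forward(self, fused):
        return self.decoder(self.encoder(fused))


def training_step(pose_net, fusion, supervision, images, pose_target, aux_target):
    # Hypothetical interface: pose_net returns its prediction plus a list of
    # intermediate feature maps when asked to.
    pose_pred, feats = pose_net(images, return_features=True)
    pose_loss = F.l1_loss(pose_pred, pose_target)

    # Gradients from the auxiliary loss flow back into pose_net through feats,
    # which is how the auxiliary branch supervises the pose network online.
    aux_pred = supervision(fusion(feats))
    aux_loss = F.l1_loss(aux_pred, aux_target)
    return pose_loss + 0.1 * aux_loss  # loss weight is an assumption


@torch.no_grad()
def test_step(pose_net, images):
    # At test time the auxiliary branch is simply not called, so it adds no
    # parameters and does not change the network's inference speed.
    return pose_net(images)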
Pages: 19