Self-Supervised Visual Odometry Based on Scene Appearance-Structure Incremental Fusion

Times Cited: 0
Authors
Fu, Fuji [1 ]
Yang, Jinfu [2 ,3 ]
Ma, Jiaqi [1 ]
Zhang, Jiahui [1 ]
Affiliations
[1] Beijing Univ Sci & Technol, Sch Informat Engn, Beijing 100124, Peoples R China
[2] Beijing Univ Technol, Sch Informat Sci & Technol, Beijing 100124, Peoples R China
[3] Beijing Univ Technol, Beijing Key Lab Computat Intelligence & Intelligen, Beijing 100124, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Pose estimation; Depth measurement; Cameras; Visual odometry; Vectors; Self-supervised learning; Training; Robustness; Intelligent transportation systems; Transformers; Self-supervised visual odometry; pose estimation; depth estimation; cross-modal incremental fusion;
DOI
10.1109/TITS.2025.3559077
Chinese Library Classification (CLC)
TU [Building Science];
Discipline Classification Code
0813;
Abstract
Self-supervised visual odometry (VO) offers a notable advantage over supervised methods: it does not rely on annotated ground truth for training. However, most existing self-supervised VO methods, namely scene appearance-based methods, fall short in exploiting the complementary properties of cross-modal information between scene appearance and scene structure. To this end, we propose a novel self-supervised VO method based on a scene appearance-structure incremental fusion scheme. Specifically, a Global-Local Context awareness-based Depth estimation Network (GLC-DN) is designed to introduce scene structural cues, laying the foundation for scene appearance-structure incremental fusion. Then, a Dual-stream Pose estimation Network based on Scene Appearance-Structure Incremental Fusion (SASIF-DPN) is devised, consisting of a Dual Stream Network (DSN) and multiple Cross-Modal Complementary Fusion Modules (CM-CFMs). Each CM-CFM fully leverages the complementary properties of the RGB information and the predicted depth information, and stacking multiple CM-CFMs enables information interaction between the two modalities in an incremental fusion manner. Detailed evaluations of GLC-DN and SASIF-DPN confirm the effectiveness and design rationale of each proposed component. Extensive comparison experiments further verify the superiority of our method over current counterparts.
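The abstract does not specify the internals of the CM-CFM, so the following is only a minimal NumPy sketch of the general idea of gated, incremental cross-modal fusion between an RGB stream and a depth stream; the function names (`cm_cfm`, `incremental_fusion`), the sigmoid gating form, and all shapes are assumptions for illustration, not the authors' implementation:

```python
import numpy as np

def sigmoid(x):
    """Numerically plain logistic gate in [0, 1]."""
    return 1.0 / (1.0 + np.exp(-x))

def cm_cfm(rgb_feat, depth_feat, w_rgb, w_depth):
    """One hypothetical cross-modal complementary fusion step.

    Each stream keeps its own features and admits a gated increment
    from the other modality, so information is exchanged without
    either stream being overwritten.
    """
    gate_r = sigmoid(rgb_feat @ w_rgb)      # how much depth info the RGB stream admits
    gate_d = sigmoid(depth_feat @ w_depth)  # how much RGB info the depth stream admits
    rgb_out = rgb_feat + gate_r * depth_feat
    depth_out = depth_feat + gate_d * rgb_feat
    return rgb_out, depth_out

def incremental_fusion(rgb, depth, num_stages=3, dim=8, seed=0):
    """Stack several fusion steps, mimicking incremental fusion
    across multiple network stages; weights are random placeholders."""
    rng = np.random.default_rng(seed)
    for _ in range(num_stages):
        w_r = rng.normal(scale=0.1, size=(dim, dim))
        w_d = rng.normal(scale=0.1, size=(dim, dim))
        rgb, depth = cm_cfm(rgb, depth, w_r, w_d)
    # Concatenate the two refined streams for a downstream pose head.
    return np.concatenate([rgb, depth], axis=-1)

# Toy features: 4 feature vectors of width 8 per modality.
rgb = np.ones((4, 8))
depth = np.zeros((4, 8))
fused = incremental_fusion(rgb, depth, num_stages=3, dim=8)
```

In this sketch the repeated gated exchange is what makes the fusion "incremental": each stage mixes in a little more of the other modality rather than merging the two streams once.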
Pages: 8006-8020
Number of pages: 15