Self-Supervised Visual Odometry Based on Scene Appearance-Structure Incremental Fusion

Cited by: 0

Authors:

Fu, Fuji [1]

Yang, Jinfu [2,3]

Ma, Jiaqi [1]

Zhang, Jiahui [1]

Affiliations:

[1] Beijing Univ Sci & Technol, Sch Informat Engn, Beijing 100124, Peoples R China

[2] Beijing Univ Technol, Sch Informat Sci & Technol, Beijing 100124, Peoples R China

[3] Beijing Univ Technol, Beijing Key Lab Computat Intelligence & Intelligen, Beijing 100124, Peoples R China

Source:

IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS | 2025 / Vol. 26 / No. 06

Funding:

National Natural Science Foundation of China;

Keywords:

Pose estimation; Depth measurement; Cameras; Visual odometry; Vectors; Self-supervised learning; Training; Robustness; Intelligent transportation systems; Transformers; Self-supervised visual odometry; pose estimation; depth estimation; cross-modal incremental fusion;

DOI:

10.1109/TITS.2025.3559077

Chinese Library Classification (CLC) number:

TU [Building Science];

Subject classification code:

0813 ;

Abstract:

Self-supervised visual odometry (VO) has shown clear benefits over supervised methods because it removes the reliance on annotated ground truth for the training data. However, most existing self-supervised VO methods are scene appearance-based and make limited use of the complementary cross-modal information carried by scene appearance and scene structure. To this end, we propose a novel self-supervised VO method based on a scene appearance-structure incremental fusion scheme. Specifically, a Global-Local Context awareness-based Depth estimation Network (GLC-DN) is designed to introduce scene structural cues, laying the foundation for scene appearance-structure incremental fusion. Then, a Dual stream Pose estimation Network based on Scene Appearance-Structure Incremental Fusion (SASIF-DPN) is devised, which consists of a Dual Stream Network (DSN) and multiple Cross-Modal Complementary Fusion Modules (CM-CFMs). Each CM-CFM exploits the complementary properties of the RGB information and the predicted depth information, and the combination of multiple CM-CFMs lets the two modalities interact in an incremental fusion manner. Detailed evaluations of GLC-DN and SASIF-DPN confirm the effectiveness and design principles of each proposed component. Extensive comparison experiments also show that our method outperforms current counterparts.
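The dual-stream, incremental-fusion idea described in the abstract can be illustrated with a small sketch. The abstract does not specify the internals of the DSN or the CM-CFM, so everything below is an assumption made for illustration: the `linear_relu` stage stands in for a real encoder stage, the `fuse` function uses a simple sigmoid cross-gating as a placeholder for a CM-CFM, and all names, dimensions, and random weights are hypothetical. It only shows the control flow: two streams (appearance from RGB, structure from predicted depth) are processed stage by stage, and a fusion step after every stage feeds the exchanged features into the next stage.

```python
import math
import random


def linear_relu(vec, weights):
    """Stand-in encoder stage (assumption): matrix-vector product, then ReLU."""
    return [max(0.0, sum(w * x for w, x in zip(row, vec))) for row in weights]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fuse(rgb_feat, depth_feat):
    """Hypothetical fusion step: each stream is re-weighted by a gate computed
    from the other stream. A placeholder, not the published CM-CFM."""
    rgb_out = [r + sigmoid(d) * r for r, d in zip(rgb_feat, depth_feat)]
    depth_out = [d + sigmoid(r) * d for r, d in zip(rgb_feat, depth_feat)]
    return rgb_out, depth_out


def dual_stream_pose(rgb_feat, depth_feat, rgb_stages, depth_stages, head):
    """Run both streams stage by stage, fusing after every stage, then
    regress a 6-element pose vector from the concatenated features."""
    for w_rgb, w_depth in zip(rgb_stages, depth_stages):
        rgb_feat = linear_relu(rgb_feat, w_rgb)
        depth_feat = linear_relu(depth_feat, w_depth)
        # Incremental fusion: the fused features are the next stage's input.
        rgb_feat, depth_feat = fuse(rgb_feat, depth_feat)
    joint = rgb_feat + depth_feat  # list concatenation of both streams
    return [sum(w * x for w, x in zip(row, joint)) for row in head]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
dim, n_stages = 8, 3  # hypothetical feature width and number of fusion points
rgb_feat = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
depth_feat = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
rgb_stages = [random_matrix(dim, dim, rng) for _ in range(n_stages)]
depth_stages = [random_matrix(dim, dim, rng) for _ in range(n_stages)]
head = random_matrix(6, 2 * dim, rng)  # 6 outputs: 3 translation, 3 rotation

pose = dual_stream_pose(rgb_feat, depth_feat, rgb_stages, depth_stages, head)
print(len(pose))
```

In this sketch the number of fusion points equals the number of stages; a real network would operate on feature maps rather than flat vectors and would learn its weights from a self-supervised objective.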

Citation

Pages: 8006-8020

Page count: 15
