Monocular Object-Level SLAM Enhanced by Joint Semantic Segmentation and Depth Estimation

Times cited: 0

Authors:

Gao, Ruicheng [1]

Qi, Yue [1, 2]

Affiliations:

[1] Beihang Univ, Sch Comp Sci & Engn, State Key Lab Virtual Real Technol & Syst, Beijing 100191, Peoples R China

[2] Beihang Univ, Qingdao Res Inst, Qingdao 266104, Peoples R China

Source:

SENSORS | 2025 / Vol. 25 / Issue 7

Funding:

National Natural Science Foundation of China

Keywords:

object-level SLAM; semantic segmentation; depth estimation; multi-task learning; tracking

DOI:

10.3390/s25072110

CLC number:

O65 [Analytical chemistry]

Subject classification codes:

070302; 081704

Abstract:

Simultaneous localization and mapping (SLAM) is a fundamental task in mobile robotics and augmented reality (AR), providing localization and mapping within an environment. However, with only RGB images as input, monocular SLAM systems suffer from scale ambiguity and have difficulty tracking in dynamic scenes. Moreover, high-level semantic information can benefit the SLAM process because it resembles the way human vision interprets a scene. To address these problems, we propose a monocular object-level SLAM system enhanced by real-time joint depth estimation and semantic segmentation. The multi-task network, called JSDNet, predicts depth and semantic segmentation simultaneously and makes four contributions: depth discretization, feature fusion, a loss function with learned weights, and semantic consistency optimization. Specifically, feature fusion lets the two tasks share features, while semantic consistency optimization aims to keep semantic segmentation and depth consistent across views. Building on the outputs of JSDNet, we design an object-level system that combines pixel-level and object-level semantics with conventional tracking, mapping, and optimization. In addition, a scale recovery step is integrated into the system to estimate the true scale. Experimental results on NYU Depth V2 demonstrate state-of-the-art depth estimation and competitive segmentation accuracy at real-time speed, while trajectory evaluation on TUM RGB-D shows lower errors than the other SLAM systems compared.
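The abstract names depth discretization, a loss with learned task weights, and scale recovery without specifying how JSDNet implements them. As a rough illustration only, the sketch below implements one common published variant of each: spacing-increasing discretization (SID) as used in DORN, a simplified form of the homoscedastic-uncertainty task weighting of Kendall et al. (2018), and median-ratio scale alignment between up-to-scale SLAM depths and predicted depths. None of these is confirmed to be the authors' method, and all function names (`sid_bin_edges`, `depth_to_bin`, `uncertainty_weighted_loss`, `median_scale`) are hypothetical.

```python
import math
import statistics

# Illustrative sketch only. The abstract does not specify how JSDNet
# discretizes depth, learns loss weights, or recovers scale; each function
# below is a hypothetical stand-in based on a common published technique.


def sid_bin_edges(d_min, d_max, k):
    """Spacing-increasing discretization (SID, as used in DORN):
    k bins whose edges are uniform in log-depth, so nearby depths get
    narrower bins than distant ones."""
    if not (0 < d_min < d_max) or k < 1:
        raise ValueError("need 0 < d_min < d_max and k >= 1")
    ratio = d_max / d_min
    return [d_min * ratio ** (i / k) for i in range(k + 1)]


def depth_to_bin(depth, edges):
    """Index of the bin containing `depth`; out-of-range values are clamped."""
    for i in range(len(edges) - 1):
        if depth < edges[i + 1]:
            return i
    return len(edges) - 2


def uncertainty_weighted_loss(task_losses, log_vars):
    """Learned task weighting in a simplified homoscedastic-uncertainty form
    (after Kendall et al., 2018): sum_i exp(-s_i) * L_i + s_i, where
    s_i = log(sigma_i^2) would be a trainable parameter per task."""
    if len(task_losses) != len(log_vars):
        raise ValueError("need one log-variance per task loss")
    return sum(math.exp(-s) * loss + s for loss, s in zip(task_losses, log_vars))


def median_scale(slam_depths, predicted_depths):
    """Scale factor mapping up-to-scale monocular SLAM depths onto
    network-predicted depths: the median of per-point depth ratios."""
    ratios = [p / s for s, p in zip(slam_depths, predicted_depths) if s > 0 and p > 0]
    if not ratios:
        raise ValueError("no valid depth pairs")
    return statistics.median(ratios)


if __name__ == "__main__":
    edges = sid_bin_edges(0.5, 10.0, 4)
    print([round(e, 3) for e in edges])
    print(depth_to_bin(1.2, edges))
    # Two tasks (depth, segmentation) with example losses and log-variances.
    print(uncertainty_weighted_loss([0.8, 1.5], [0.0, 0.5]))
    print(median_scale([1.0, 2.0, 4.0], [2.1, 3.9, 8.0]))
```

The median is used rather than the mean so that depth pairs on moving objects or in poorly predicted regions do not dominate the scale estimate; a larger learned log-variance `s_i` down-weights that task's loss while the `+ s_i` term keeps it from growing without bound.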

Pages: 15

References: 50

