YOLO-NeRFSLAM: underwater object detection for the visual NeRF-SLAM

Times Cited: 0
Authors
Wang, Zhe [1 ]
Yu, Zhibin [1 ,2 ]
Zheng, Bing [1 ,2 ]
Affiliations
[1] Ocean Univ China, Fac Informat Sci & Engn, Qingdao, Shandong, Peoples R China
[2] Ocean Univ China, Sanya Oceanog Inst, Key Lab Ocean Observat & Informat Hainan Prov, Sanya, Hainan, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
visual SLAM; NeRF-SLAM; underwater SLAM; object detection; novel view reconstruction; TRACKING;
DOI
10.3389/fmars.2025.1582126
CLC Number
X [Environmental Science, Safety Science];
Discipline Code
08; 0830;
Abstract
Accurate and reliable dense mapping is crucial for understanding and utilizing the marine environment in applications such as ecological monitoring, archaeological exploration, and autonomous underwater navigation. However, the underwater environment is highly dynamic: fish and floating debris frequently appear in the field of view, so traditional SLAM is easily disturbed during localization and mapping. In addition, common depth sensors and learning-based depth estimation techniques tend to be impractical or significantly less accurate underwater, failing to meet the demands of dense reconstruction. This paper proposes a new underwater SLAM framework that combines neural radiance fields (NeRF) with a dynamic masking module to address these issues. Through a Marine Motion Fusion (MMF) strategy, which leverages YOLO to detect known marine organisms and integrates optical flow for pixel-level motion analysis, we filter out dynamic objects and thus maintain stable camera pose estimation and pixel-level dense reconstruction without relying on depth data. Further, to cope with severe light attenuation and the dynamic nature of underwater scenes, we introduce specialized loss functions that enable reconstruction of underwater environments with realistic appearance and geometric detail even under high-turbidity conditions. Experimental results on multiple real underwater video datasets show that our method significantly reduces localization drift caused by moving entities, improves dense mapping accuracy, and achieves favorable runtime efficiency, demonstrating its potential in dynamic underwater settings.
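To illustrate the MMF idea described in the abstract, the sketch below combines a YOLO detector (for known marine organisms) with dense optical flow (for pixel-level motion) into a single dynamic-pixel mask. This is a minimal illustration under assumed names and parameters: the `dynamic_mask` helper, the use of Farneback flow, and the `flow_thresh` value are my assumptions, not the authors' implementation.

```python
import cv2
import numpy as np
from ultralytics import YOLO  # assumed detector; any model returning boxes works


def dynamic_mask(prev_gray, curr_gray, curr_bgr, model, flow_thresh=2.0):
    """Sketch of a YOLO + optical-flow dynamic mask (not the paper's MMF code).

    A pixel is marked dynamic if it lies inside a detected object box
    OR its dense optical-flow magnitude exceeds `flow_thresh` pixels.
    """
    h, w = curr_gray.shape
    mask = np.zeros((h, w), dtype=bool)

    # 1) Semantic cue: boxes from a detector trained on marine organisms.
    for box in model(curr_bgr, verbose=False)[0].boxes:
        x1, y1, x2, y2 = map(int, box.xyxy[0].tolist())
        mask[max(y1, 0):min(y2, h), max(x1, 0):min(x2, w)] = True

    # 2) Motion cue: dense Farneback flow between consecutive grayscale frames.
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag = np.linalg.norm(flow, axis=2)
    mask |= mag > flow_thresh

    return mask  # True = dynamic pixel, excluded from tracking and mapping
```

In a SLAM front end, such a mask would typically be used to discard features or rendered rays that fall on dynamic pixels before pose optimization and NeRF training; the exact fusion rule and thresholds used in the paper may differ.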
Pages: 18