HI-SLAM: Hierarchical implicit neural representation for SLAM

Cited by: 0
Authors
Li, Jingbo [1]
Firkat, Eksan [4,5]
Zhu, Jingyu [3]
Zhu, Bin [2]
Zhu, Jihong [3]
Hamdulla, Askar [1]
Affiliations
[1] Xinjiang Univ, Sch Informat Sci & Engn, 666 Shengli Rd, Urumqi, Xinjiang, Peoples R China
[2] Tsinghua Univ, Dept Automat, 33 Shuangqing Rd, Beijing, Peoples R China
[3] Tsinghua Univ, Dept Precis Instrument, 33 Shuangqing Rd, Beijing, Peoples R China
[4] Tsinghua Univ, Tsinghua Shenzhen Int Grad Sch, Shenzhen, Peoples R China
[5] Great Bay Univ, Dongguan, Guangdong, Peoples R China
Keywords
Dense visual SLAM; Neural implicit representations; Localization; RGB-D camera; FEATURE FUSION; VERSATILE;
DOI
10.1016/j.eswa.2025.126487
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Implicit neural representations improve a model's expressive ability and performance by learning representations in a high-dimensional feature space, and they have been applied with strong results in many fields. Dense visual SLAM is one of the beneficiaries of this development. However, current methods are based on simple fully connected network architectures, which results in poor generalization, insufficient real-time performance, and an inability to balance global and local optimization. This paper proposes a hierarchical scene representation that treats color and geometric information as equally important: it encodes geometric and color information into grids of different resolutions and combines them with multiple corresponding multi-layer perceptron (MLP) decoders. The coarse-level grid captures the general shape and structure of the global scene and makes reasonable predictions for unobserved regions. In contrast, the medium-fine-level grid represents geometric details and color information at fine resolution. Encoding geometric and color information in grids of different resolutions yields rich, comprehensive, high-fidelity reconstructions in large-scale scenes. Selective keyframes are used to ensure that local scene information is optimized while reducing the storage of redundant information. Compared with recent dense visual SLAM systems based on implicit neural representations, our method generalizes better and operates more robustly, efficiently, and precisely in large-scale scenes.
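The hierarchical representation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes dense feature grids queried by trilinear interpolation, one small MLP decoder per grid, and an additive combination of the coarse prediction with a fine-level residual. The grid resolutions, feature sizes, random weights, and all function names (`make_grid`, `trilinear`, `make_mlp`, `mlp`, `build_scene`, `query`) are hypothetical and chosen only for illustration.

```python
import math
import random


def make_grid(res, channels, rng):
    """Dense feature grid with (res + 1)^3 vertices, `channels` floats each."""
    n = res + 1
    return [[[[rng.uniform(-0.1, 0.1) for _ in range(channels)]
              for _ in range(n)] for _ in range(n)] for _ in range(n)]


def trilinear(grid, res, p):
    """Trilinearly interpolate the grid feature at point p in [0, 1]^3."""
    idx, frac = [], []
    for c in p:
        g = min(max(c, 0.0), 1.0) * res   # clamp, then scale to grid units
        i = min(int(g), res - 1)          # base cell index, kept in range
        idx.append(i)
        frac.append(g - i)
    out = [0.0] * len(grid[0][0][0])
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                # Weight of this cell corner for the query point.
                w = ((frac[0] if dx else 1.0 - frac[0]) *
                     (frac[1] if dy else 1.0 - frac[1]) *
                     (frac[2] if dz else 1.0 - frac[2]))
                v = grid[idx[0] + dx][idx[1] + dy][idx[2] + dz]
                for k in range(len(out)):
                    out[k] += w * v[k]
    return out


def make_mlp(n_in, n_hidden, n_out, rng):
    """Random weights for a one-hidden-layer MLP (untrained, no biases)."""
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
    return w1, w2


def mlp(params, x):
    """Forward pass: linear -> ReLU -> linear."""
    w1, w2 = params
    h = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w1]
    return [sum(w * hi for w, hi in zip(row, h)) for row in w2]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def build_scene(seed=0):
    """Assumed layout: coarse geometry, fine geometry, and color grids."""
    rng = random.Random(seed)
    return {
        "coarse": (make_grid(2, 4, rng), 2),
        "fine": (make_grid(8, 4, rng), 8),
        "color": (make_grid(8, 4, rng), 8),
        "dec_coarse": make_mlp(4, 8, 1, rng),
        "dec_fine": make_mlp(8, 8, 1, rng),
        "dec_color": make_mlp(4, 8, 3, rng),
    }


def query(scene, p):
    """Return (occupancy, rgb) at point p; the combination rule is an assumption."""
    f_coarse = trilinear(*scene["coarse"], p)
    f_fine = trilinear(*scene["fine"], p)
    f_color = trilinear(*scene["color"], p)
    coarse_logit = mlp(scene["dec_coarse"], f_coarse)[0]
    # The fine decoder sees both feature levels and predicts a residual.
    residual = mlp(scene["dec_fine"], f_fine + f_coarse)[0]
    occupancy = sigmoid(coarse_logit + residual)
    rgb = [sigmoid(c) for c in mlp(scene["dec_color"], f_color)]
    return occupancy, rgb


if __name__ == "__main__":
    occupancy, rgb = query(build_scene(), (0.3, 0.7, 0.1))
    print(occupancy, rgb)
```

In a real system the grid features and decoder weights would be optimized against RGB-D observations during tracking and mapping; here they stay at their random initial values, so the outputs only demonstrate the data flow from grids to decoders.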
Pages: 10