Attention-SLAM: A Visual Monocular SLAM Learning From Human Gaze

Cited by: 34
Authors
Li, Jinquan [1 ]
Pei, Ling [1 ]
Zou, Danping [1 ]
Xia, Songpengcheng [1 ]
Wu, Qi [1 ]
Li, Tao [1 ]
Sun, Zhen [1 ]
Yu, Wenxian [1 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Shanghai Key Lab Nav & Locat Based Serv, Shanghai 200240, Peoples R China
Keywords
Simultaneous localization and mapping; Visualization; Semantics; Feature extraction; Data mining; Predictive models; Adaptation models; Visual saliency; monocular visual semantic SLAM; weighted bundle adjustment; SIMULTANEOUS LOCALIZATION; ODOMETRY;
DOI
10.1109/JSEN.2020.3038432
Chinese Library Classification (CLC)
TM [Electrical technology]; TN [Electronic technology, communication technology];
Discipline Classification Code
0808; 0809;
Abstract
This paper proposes a novel simultaneous localization and mapping (SLAM) approach, namely Attention-SLAM, which simulates the human navigation mode by combining a visual saliency model (SalNavNet) with traditional monocular visual SLAM. First, a visual saliency model named SalNavNet is proposed, in which we introduce a correlation module and an adaptive Exponential Moving Average (EMA) module. These modules mitigate the center bias that most current saliency models exhibit and enable the saliency maps generated by SalNavNet to focus consistently on the same salient objects. An open-source saliency SLAM dataset named Salient-Euroc, consisting of the Euroc dataset and the corresponding saliency maps, is released. Moreover, we propose a new optimization method called Weighted Bundle Adjustment (Weighted BA) for Attention-SLAM. Most SLAM methods treat all features extracted from the images as equally important during optimization; in Weighted BA, feature points extracted from salient regions are given greater weight. Comprehensive test results show that Attention-SLAM outperforms benchmarks such as Direct Sparse Odometry (DSO), ORB-SLAM, and Salient DSO in 7 of 11 test cases. The test cases are all indoor scenes with varying brightness, speed, and image distortion. Compared with ORB-SLAM, our method improves accuracy by 4% and efficiency by 6.5% on average.
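As a reading aid, the Weighted BA idea described in the abstract can be summarized as a saliency-weighted bundle-adjustment objective. The LaTeX below is a minimal sketch under stated assumptions: the symbols S_k (per-frame saliency map) and f (a monotone weighting function) are illustrative placeholders, not the paper's exact formulation.

  % Sketch of a saliency-weighted bundle adjustment objective (illustrative only).
  % T_k: camera poses, X_j: landmarks, u_kj: observed keypoint of landmark j in frame k,
  % pi: projection function, rho: robust kernel,
  % S_k: saliency map of frame k (assumed), f: weighting function (assumed).
  \begin{equation}
    \min_{\{T_k\},\,\{X_j\}} \; \sum_{k,j} w_{kj}\,
      \rho\!\left( \left\| u_{kj} - \pi\!\left(T_k, X_j\right) \right\|^{2} \right),
    \qquad
    w_{kj} = f\!\left( S_k\!\left(u_{kj}\right) \right)
  \end{equation}

Setting w_{kj} = 1 for all observations recovers standard bundle adjustment as used in ORB-SLAM; assigning larger weights to observations in salient regions biases the optimizer toward the features a human observer would fixate on.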
Pages: 6408-6420
Number of pages: 13
Related Papers (54 in total)
  • [21] Jetley S., Murray N., Vig E. End-to-End Saliency Mapping via Probability Distribution Prediction. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016: 5753-5761.
  • [22] Jiang L., 2017, arXiv:1709.06316.
  • [23] Kaneko M., Iwami K., Ogawa T., Yamasaki T., Aizawa K. Mask-SLAM: Robust feature-based monocular SLAM by masking using semantic segmentation. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2018: 371-379.
  • [24] Kerl C., 2013, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), p. 2100, DOI 10.1109/IROS.2013.6696650.
  • [25] King D. B., 2015, ACS Symposium Series, vol. 1214, p. 1.
  • [26] Klein G., 2007, p. 1.
  • [27] Kruthiventi S. S. S., Ayush K., Babu R. V. DeepFix: A Fully Convolutional Neural Network for Predicting Human Eye Fixations. IEEE Transactions on Image Processing, 2017, 26(9): 4446-4456.
  • [28] Li B., 2019, arXiv:1912.05002.
  • [29] Liang H.-J., Sanket N. J., Fermuller C., Aloimonos Y. SalientDSO: Bringing Attention to Direct Sparse Odometry. IEEE Transactions on Automation Science and Engineering, 2019, 16(4): 1619-1626.
  • [30] Lianos K.-N., Schoenberger J. L., Pollefeys M., Sattler T. VSO: Visual Semantic Odometry. Computer Vision - ECCV 2018, Part IV, vol. 11208, 2018: 246-263.