Multi-Modal Place Recognition via Vectorized HD Maps and Images Fusion for Autonomous Driving

Cited by: 1
Authors
Jeong, Hyeonjun [1 ]
Shin, Juyeb [2 ]
Rameau, Francois [3 ]
Kum, Dongsuk [1 ]
Affiliations
[1] Grad Sch Mobil, KAIST, Daejeon 34051, South Korea
[2] Korea Adv Inst Sci & Technol, Robot Program, Daejeon 34141, South Korea
[3] SUNY Korea, Dept Comp Sci, Incheon 21985, South Korea
Keywords
Visualization; Semantics; Feature extraction; Autonomous vehicles; Image recognition; Location awareness; Roads; Autonomous vehicle navigation; localization; deep learning for visual perception
DOI
10.1109/LRA.2024.3374193
Chinese Library Classification (CLC)
TP24 (Robotics)
Discipline classification codes
080202; 1405
Abstract
The deployment of autonomous vehicles and mobile robots requires light, fast, and robust visual place recognition strategies. While visual place recognition has proven effective in favorable conditions, its performance quickly drops when faced with ambiguous visual cues, such as the repeating image patterns commonly found in driving environments. To address this problem, a new representation that incorporates geometric cues with structural semantics can be used to localize an agent while reducing the reliance on visual cues alone. In this letter, we present the first multi-modal place recognition method for autonomous driving that utilizes both images and vectorized HD maps. Vectorized HD maps have the advantage of being lightweight and providing geometric cues with structural semantics, making them particularly well-suited for place recognition. To accomplish this, we employ a hierarchical graph neural network to extract a compact and robust descriptor from a local vectorized map that can be constructed from surrounding images. Although HD maps provide concise geometric cues with structural semantics, they sometimes do not provide sufficient features for place recognition, unlike images. To cope with this limitation, we propose to adaptively fuse the descriptors extracted from maps and images via a transformer-based solution, combining the complementary strengths of each modality. Extensive experiments on the large-scale driving datasets nuScenes and Argoverse 2 demonstrate that our multi-modal approach outperforms visual-only approaches. Specifically, our method improves over the baseline by up to 6.48 percentage points in Recall@1 with less than 10 ms of additional computation.
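The adaptive fusion step described in the abstract can be sketched with a minimal attention mechanism: the image and map descriptors are treated as two tokens, scaled dot-product attention re-weights them against each other, and the attended tokens are pooled into one place descriptor. This is only an illustrative numpy sketch, not the authors' implementation; the function name `adaptive_fuse`, the single-head attention, and the mean-pooling choice are all assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def adaptive_fuse(img_desc, map_desc, w_q, w_k, w_v):
    """Attention-weighted fusion of two modality descriptors (hypothetical sketch).

    Stacks the image and map descriptors as two 'tokens', applies scaled
    dot-product attention over them so each modality can borrow from the
    other, then mean-pools into a single L2-normalized place descriptor.
    """
    tokens = np.stack([img_desc, map_desc])    # (2, d): one token per modality
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    d = q.shape[-1]
    attn = softmax(q @ k.T / np.sqrt(d))       # (2, 2) cross-modality weights
    fused = (attn @ v).mean(axis=0)            # (d,) pooled descriptor
    return fused / np.linalg.norm(fused)       # normalize for cosine retrieval
```

With identity projections and orthogonal inputs, the output is a symmetric blend of both modalities; in a learned model the projections would instead be trained so that the attention down-weights whichever modality is less informative for the current scene.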
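The Recall@1 figure quoted above follows the standard place-recognition protocol: a query succeeds if its nearest database descriptor corresponds to a physically nearby place. A minimal sketch of that metric, assuming L2-normalized descriptors and a distance threshold (the 25 m radius here is a common convention, not taken from this letter):

```python
import numpy as np

def recall_at_k(query_desc, db_desc, query_pos, db_pos, k=1, radius=25.0):
    """Fraction of queries whose top-k retrieved database entries
    include at least one place within `radius` meters of the query.

    Descriptors are assumed L2-normalized, so the dot product is
    cosine similarity; positions are 2-D metric coordinates.
    """
    sims = query_desc @ db_desc.T                  # (Q, N) similarity matrix
    topk = np.argsort(-sims, axis=1)[:, :k]        # indices of k best matches
    hits = 0
    for i, idx in enumerate(topk):
        dists = np.linalg.norm(db_pos[idx] - query_pos[i], axis=1)
        hits += (dists <= radius).any()            # success if any match is close
    return hits / len(query_desc)
```

For example, with two queries where only the first retrieves a database place within 25 m, the function returns 0.5; the 6.48-point improvement reported in the abstract is measured on this kind of ratio.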
Pages: 4710-4717
Number of pages: 8