Semantic-geometric visual place recognition: a new perspective for reconciling opposing views

Times cited: 66

Authors:

Garg, Sourav [1]

Suenderhauf, Niko [1]

Milford, Michael [1]

Affiliation:

[1] Queensland Univ Technol, Australian Ctr Robot Vis, Brisbane, Qld, Australia

Source:

INTERNATIONAL JOURNAL OF ROBOTICS RESEARCH | 2022 / Vol. 41 / No. 06

Funding:

Australian Research Council;

Keywords:

Visual place recognition; visual localization; deep learning; semantic; MENTAL ROTATION; LARGE-SCALE; LOCALIZATION; SLAM;

DOI:

10.1177/0278364919839761

Chinese Library Classification (CLC) number:

TP24 [Robotics technology];

Discipline classification codes:

080202; 1405;

Abstract:

Human drivers are capable of recognizing places from a previous journey even when viewing them from the opposite direction during the return trip under radically different environmental conditions, without needing to look back or employ a 360-degree camera or LIDAR sensor. Such navigation capabilities are attributed in large part to the robust semantic scene understanding capabilities of humans. However, for an autonomous robot or vehicle, achieving such human-like visual place recognition capability presents three major challenges: (1) dealing with the limited amount of commonly observable visual content when viewing the same place from the opposite direction; (2) dealing with significant lateral viewpoint changes caused by opposing directions of travel taking place on opposite sides of the road; and (3) dealing with radically changed scene appearance due to environmental conditions such as time of day, season, and weather. Current state-of-the-art place recognition systems have only addressed these three challenges in isolation or in pairs, typically relying on appearance-based, deep-learnt place representations. In this paper, we present a novel, semantics-based system that for the first time solves all three challenges simultaneously. We propose a hybrid image descriptor that semantically aggregates salient visual information, complemented by appearance-based description, and augment a conventional coarse-to-fine recognition pipeline with keypoint correspondences extracted from within the convolutional feature maps of a pre-trained network. Finally, we introduce descriptor normalization and local score enhancement strategies to improve the robustness of the system.
Using both existing benchmark datasets and extensive new datasets that for the first time combine the three challenges of opposing viewpoints, lateral viewpoint shifts, and extreme appearance change, we show that our system can achieve practical place recognition performance where existing state-of-the-art methods fail.
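The abstract does not specify how the descriptor normalization or local score enhancement steps are computed. The sketch below is therefore only a generic illustration of a coarse-to-fine retrieval pattern of the kind the abstract mentions. It is not the authors' method.

The sketch makes the following assumptions:
- Global descriptors are z-scored per dimension over the reference database, then L2-normalized and compared by cosine similarity.
- Scores are locally contrast-enhanced along the reference route, in the style of SeqSLAM. This assumes references are ordered by position along the route.
- The top-k candidates are re-ranked by a fine score.

All function names are hypothetical. `fine_score` is a placeholder for a fine matching stage, for example keypoint-correspondence verification.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def fit_normalizer(refs: Sequence[Vector]) -> Tuple[Vector, Vector]:
    """Per-dimension mean and standard deviation over the reference database."""
    n, dim = len(refs), len(refs[0])
    mean = [sum(r[d] for r in refs) / n for d in range(dim)]
    # A constant dimension has std 0; fall back to 1.0 to avoid division by zero.
    std = [
        math.sqrt(sum((r[d] - mean[d]) ** 2 for r in refs) / n) or 1.0
        for d in range(dim)
    ]
    return mean, std


def normalize(v: Vector, mean: Vector, std: Vector) -> Vector:
    """Z-score each dimension, then scale the result to unit L2 norm."""
    z = [(x - m) / s for x, m, s in zip(v, mean, std)]
    norm = math.sqrt(sum(x * x for x in z)) or 1.0
    return [x / norm for x in z]


def cosine_scores(query: Vector, refs: Sequence[Vector]) -> Vector:
    """Dot products of unit vectors, i.e. cosine similarities."""
    return [sum(q * r for q, r in zip(query, ref)) for ref in refs]


def local_enhance(scores: Vector, half_window: int = 2) -> Vector:
    """Re-express each score relative to the mean and std of its neighbours
    along the (assumed route-ordered) reference sequence."""
    out = []
    for i, s in enumerate(scores):
        lo, hi = max(0, i - half_window), min(len(scores), i + half_window + 1)
        win = scores[lo:hi]
        mu = sum(win) / len(win)
        sd = math.sqrt(sum((w - mu) ** 2 for w in win) / len(win)) or 1.0
        out.append((s - mu) / sd)
    return out


def coarse_to_fine(
    query: Vector,
    refs: Sequence[Vector],
    fine_score: Callable[[int], float],
    k: int = 3,
) -> int:
    """Shortlist the top-k references by enhanced global score (coarse stage),
    then return the shortlisted index with the highest fine score."""
    mean, std = fit_normalizer(refs)
    q = normalize(query, mean, std)
    normed = [normalize(r, mean, std) for r in refs]
    coarse = local_enhance(cosine_scores(q, normed))
    shortlist = sorted(range(len(refs)), key=lambda i: coarse[i], reverse=True)[:k]
    return max(shortlist, key=fine_score)
```

Here is a small example. Take three one-hot reference descriptors and a query equal to the third. Calling `coarse_to_fine([0.0, 0.0, 1.0], refs, fine_score=lambda i: 0.0, k=1)` returns index 2.

Z-scoring over the database down-weights dimensions that are strong for every place. Such dimensions carry little information about which place is being observed.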


Pages: 573-598

Page count: 26

Number of references: 110
