INGRESS: Interactive visual grounding of referring expressions

Times cited: 61

Authors:

Shridhar, Mohit [1]

Mittal, Dixant [2]

Hsu, David [2]

Affiliations:

[1] Univ Washington, Paul G Allen Sch Comp Sci & Engn, 185 E Stevens Way NE, Seattle, WA 98195 USA

[2] Natl Univ Singapore, Sch Comp, Singapore, Singapore

Source:

INTERNATIONAL JOURNAL OF ROBOTICS RESEARCH | 2020 / Vol. 39 / No. 2-3

Keywords:

Natural language grounding; disambiguation; human-robot interaction; LANGUAGE;

DOI:

10.1177/0278364919897133

Chinese Library Classification (CLC) number:

TP24 [Robotics technology];

Discipline classification code:

080202 ; 1405 ;

Abstract:

This article presents INGRESS, a robot system that follows human natural language instructions to pick and place everyday objects. The key question here is to ground referring expressions: understand expressions about objects and their relationships from image and natural language inputs. INGRESS allows unconstrained object categories and rich language expressions. Further, it asks questions to clarify ambiguous referring expressions interactively. To achieve these, we take the approach of grounding by generation and propose a two-stage neural-network model for grounding. The first stage uses a neural network to generate visual descriptions of objects, compares them with the input language expressions, and identifies a set of candidate objects. The second stage uses another neural network to examine all pairwise relations between the candidates and infers the most likely referred objects. The same neural networks are used for both grounding and question generation for disambiguation. Experiments show that INGRESS outperformed a state-of-the-art method on the RefCOCO dataset and in robot experiments with humans. The INGRESS source code is available at https://github.com/MohitShridhar/ingress.
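The two-stage "grounding by generation" pipeline described in the abstract can be illustrated with a toy sketch. It is not the INGRESS implementation (the linked repository holds that). The sketch makes several assumptions. The learned captioning networks are replaced by a bag-of-words cosine similarity, and the generated object and relation captions are hard-coded strings. The function names `stage1_candidates` and `stage2_ground` are hypothetical, as are the thresholds `margin` and `ask_gap`. The sketch shows the control flow only. Stage 1 keeps every object whose self-description matches the expression well. Stage 2 ranks ordered candidate pairs by their relation captions. When two different targets score nearly the same, the winning caption is reused as a clarifying question.

```python
import math
from collections import Counter
from itertools import permutations


def bow_cosine(a: str, b: str) -> float:
    """Cosine similarity of bag-of-words counts.

    Hypothetical stand-in for comparing generated captions with the input
    expression; INGRESS itself uses neural networks, not word counts.
    """
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def stage1_candidates(expression, captions, margin=0.1):
    """Stage 1: score each object's (assumed, pre-generated) captions against
    the expression and keep objects within `margin` of the top score."""
    scores = {obj: max(bow_cosine(expression, c) for c in caps)
              for obj, caps in captions.items()}
    top = max(scores.values())
    return [obj for obj, s in scores.items() if s >= top - margin]


def stage2_ground(expression, candidates, relation_captions, ask_gap=0.05):
    """Stage 2: score ordered candidate pairs (target, context) by their
    (assumed) relation captions. Returns (target, question); the question is
    None unless a different target scores within `ask_gap` of the best."""
    if len(candidates) == 1:
        return candidates[0], None
    scored = sorted(
        ((bow_cosine(expression, relation_captions[(t, c)]), t, c)
         for t, c in permutations(candidates, 2)
         if (t, c) in relation_captions),
        key=lambda x: x[0], reverse=True)  # stable sort keeps input order on ties
    if not scored:
        return None, "Which object do you mean?"
    best_score, target, context = scored[0]
    rival_scores = [s for s, t, _ in scored if t != target]
    if rival_scores and best_score - rival_scores[0] < ask_gap:
        # Ambiguous: reuse the generated relation caption as a question.
        return target, f"Do you mean the {relation_captions[(target, context)]}?"
    return target, None


# Hypothetical scene: two red cups and a green bottle.
captions = {"obj1": ["a red cup"], "obj2": ["a red cup"],
            "obj3": ["a green bottle"]}
relations = {("obj1", "obj2"): "red cup on the left",
             ("obj2", "obj1"): "red cup on the right"}

for expr in ["the red cup on the left", "the red cup"]:
    print(expr, "->", stage2_ground(expr, stage1_candidates(expr, captions), relations))
# the red cup on the left -> ('obj1', None)
# the red cup -> ('obj1', 'Do you mean the red cup on the left?')
```

In this sketch, stage 1 filters out the bottle in both cases. The relational phrase "on the left" is what separates the two cups. Without it, the sketch falls back to asking a question rather than guessing.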


Pages: 217-232

Page count: 16
