Talk2Car: Taking Control of Your Self-Driving Car

Cited by: 39
Authors
Deruyttere, Thierry [1 ]
Vandenhende, Simon [2 ]
Grujicic, Dusan [2 ]
Van Gool, Luc [2 ]
Moens, Marie-Francine [1 ]
Affiliations
[1] Katholieke Universiteit Leuven, Department of Computer Science (CS), Leuven, Belgium
[2] Katholieke Universiteit Leuven, Department of Electrical Engineering (ESAT), Leuven, Belgium
Source
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP 2019), 2019
DOI
10.18653/v1/d19-1215
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
A long-term goal of artificial intelligence is to have an agent execute commands communicated through natural language. In many cases the commands are grounded in a visual environment shared by the human who gives the command and the agent. Executing the command then requires mapping it into the physical visual space, after which the appropriate action can be taken. In this paper we consider the former problem. More specifically, we consider it in an autonomous driving setting, where a passenger requests an action that can be associated with an object found in a street scene. Our work presents the Talk2Car dataset, the first object referral dataset that contains commands written in natural language for self-driving cars. We provide a detailed comparison with related datasets such as ReferIt, RefCOCO, RefCOCO+, RefCOCOg, Cityscape-Ref and CLEVR-Ref. Additionally, we include a performance analysis using strong state-of-the-art models. The results show that the proposed object referral task is a challenging one: the models show promising results but still require additional research in natural language processing, computer vision, and the intersection of these fields. The dataset can be found on our website: http://macchina-ai.eu/
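Object referral systems of the kind the abstract describes are commonly scored by checking whether the predicted bounding box overlaps the annotated gold box with an intersection-over-union (IoU) above 0.5. The following is a minimal Python sketch of that standard metric; the (x, y, w, h) box convention and the function names here are illustrative assumptions, not the dataset's actual schema or API.

from typing import List, Tuple

Box = Tuple[float, float, float, float]  # assumed (x, y, width, height) format

def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes in (x, y, w, h) form."""
    ax1, ay1, aw, ah = a
    bx1, by1, bw, bh = b
    ax2, ay2 = ax1 + aw, ay1 + ah
    bx2, by2 = bx1 + bw, by1 + bh
    # Width and height of the overlapping region (clamped at zero).
    ix = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    iy = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def referral_accuracy(preds: List[Box], golds: List[Box], thresh: float = 0.5) -> float:
    """Fraction of commands whose predicted box matches the gold box with IoU > thresh."""
    hits = sum(1 for p, g in zip(preds, golds) if iou(p, g) > thresh)
    return hits / len(golds)

# Example: one exact match and one clear miss yield 50% accuracy.
print(referral_accuracy([(10, 10, 50, 30), (0, 0, 5, 5)],
                        [(10, 10, 50, 30), (100, 100, 5, 5)]))  # 0.5

The IoU > 0.5 threshold is the conventional criterion in referring-expression grounding benchmarks; a stricter or looser threshold simply changes how precise the localization must be to count as correct.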
Pages: 2088-2098
Page count: 11