DriveGPT4: Interpretable End-to-End Autonomous Driving Via Large Language Model

Cited by: 35
Authors
Xu, Zhenhua [1 ]
Zhang, Yujia [2 ]
Xie, Enze [3 ]
Zhao, Zhen [4 ]
Guo, Yong [3 ]
Wong, Kwan-Yee K. [1 ]
Li, Zhenguo [3 ]
Zhao, Hengshuang [1 ]
Affiliations
[1] Univ Hong Kong, Hong Kong, Peoples R China
[2] Zhejiang Univ, Hangzhou 310027, Peoples R China
[3] Huawei Noah's Ark Lab, Montreal, PQ H3N 1X9, Canada
[4] Univ Sydney, Camperdown, NSW 2050, Australia
Funding
National Natural Science Foundation of China;
Keywords
Autonomous vehicles; Videos; Chatbots; Visualization; Cognition; Turning; Tuning; Autonomous driving; large language model;
DOI
10.1109/LRA.2024.3440097
Chinese Library Classification
TP24 [Robotics];
Discipline Code
080202 ; 1405 ;
Abstract
Multimodal large language models (MLLMs) have emerged as a prominent area of interest within the research community, given their proficiency in handling and reasoning with non-textual data, including images and videos. This study seeks to extend the application of MLLMs to the realm of autonomous driving by introducing DriveGPT4, a novel interpretable end-to-end autonomous driving system based on LLMs. Capable of processing multi-frame video inputs and textual queries, DriveGPT4 facilitates the interpretation of vehicle actions, offers pertinent reasoning, and effectively addresses a diverse range of questions posed by users. Furthermore, DriveGPT4 predicts low-level vehicle control signals in an end-to-end fashion. These advanced capabilities are achieved through the utilization of a bespoke visual instruction tuning dataset, specifically tailored for autonomous driving applications, in conjunction with a mix-finetuning training strategy. DriveGPT4 represents the pioneering effort to leverage LLMs for the development of an interpretable end-to-end autonomous driving solution. Evaluations conducted on the BDD-X dataset showcase the superior qualitative and quantitative performance of DriveGPT4. Additionally, the fine-tuning of domain-specific data enables DriveGPT4 to yield close or even improved results in terms of autonomous driving grounding when contrasted with GPT4-V.
Pages: 8186-8193
Page count: 8
References
53 entries in total
[1]   Explainable Artificial Intelligence for Autonomous Driving: A Comprehensive Overview and Field Guide for Future Research Directions [J].
Atakishiyev, Shahin ;
Salameh, Mohammad ;
Yao, Hengshuai ;
Goebel, Randy .
IEEE ACCESS, 2024, 12 :101603-101625
[2]   Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval [J].
Bain, Max ;
Nagrani, Arsha ;
Varol, Gul ;
Zisserman, Andrew .
2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, :1708-1718
[3]  
Bojarski M, 2016, arXiv, DOI arXiv:1604.07316
[4]  
Brohan A, 2023, arXiv, DOI arXiv:2307.15818
[5]   nuScenes: A multimodal dataset for autonomous driving [J].
Caesar, Holger ;
Bankiti, Varun ;
Lang, Alex H. ;
Vora, Sourabh ;
Liong, Venice Erin ;
Xu, Qiang ;
Krishnan, Anush ;
Pan, Yu ;
Baldan, Giancarlo ;
Beijbom, Oscar .
2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020), 2020, :11618-11628
[6]  
OpenAI, 2023, ChatGPT, About us
[7]   End-to-End Autonomous Driving: Challenges and Frontiers [J].
Chen, Li ;
Wu, Penghao ;
Chitta, Kashyap ;
Jaeger, Bernhard ;
Geiger, Andreas ;
Li, Hongyang .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (12) :10164-10183
[8]  
Chowdhery A, 2023, J MACH LEARN RES, V24
[9]   Talk2Car: Taking Control of Your Self-Driving Car [J].
Deruyttere, Thierry ;
Vandenhende, Simon ;
Grujicic, Dusan ;
Van Gool, Luc ;
Moens, Marie-Francine .
2019 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND THE 9TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (EMNLP-IJCNLP 2019): PROCEEDINGS OF THE CONFERENCE, 2019, :2088-2098
[10]  
Devlin J, 2019, 2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, P4171