Why did the AI make that decision? Towards an explainable artificial intelligence (XAI) for autonomous driving systems

Cited by: 40
Authors
Dong, Jiqian [1, 2]
Chen, Sikai [3]
Miralinaghi, Mohammad [4]
Chen, Tiantian [5]
Li, Pei [3]
Labi, Samuel [1, 2]
Affiliations
[1] Purdue Univ, Ctr Connected & Automated Transportat CCAT, W Lafayette, IN USA
[2] Purdue Univ, Lyles Sch Civil Engn, W Lafayette, IN USA
[3] Univ Wisconsin Madison, Dept Civil & Environm Engn, Madison, WI 53706 USA
[4] IIT, Dept Civil Architectural & Environm Engn, Chicago, IL USA
[5] Korea Adv Inst Sci & Technol, Cho Chun Shik Grad Sch Mobil, Daejeon, South Korea
Keywords
Explainable AI (XAI); Autonomous driving; User trust; Computer vision; End-to-end transformer; Visual attention; NEURAL-NETWORK; ARCHITECTURE; ATTENTION; VISION
DOI
10.1016/j.trc.2023.104358
Chinese Library Classification number
U [Transportation]
Discipline classification code
08; 0823
Abstract
User trust is pivotal to the success of autonomous vehicle (AV) operations, where artificial intelligence (AI) is widely adopted. For such integrated AI-based driving systems, one promising way of building user trust is explainable artificial intelligence (XAI), which requires the AI system to provide the user with the explanation behind each decision it makes. Motivated by both the need to enhance user trust and the promise of XAI technology in addressing that need, this paper seeks to enhance trustworthiness in autonomous driving systems through the development of explainable Deep Learning (DL) models. First, the paper casts the decision-making process of the AV system not as a classification task (the traditional approach) but as an image-based language generation (image captioning) task. The proposed approach therefore makes driving decisions by first generating textual descriptions of the driving scenarios, which serve as explanations that humans can understand. To this end, a novel multi-modal DL architecture is proposed to jointly model the correlation between an image (the driving scenario) and language (the descriptions). It adopts a fully Transformer-based structure and therefore has the potential to perform global attention and to effectively imitate the learning processes of human drivers. The results suggest that the proposed model generates legal and meaningful sentences to describe a given driving scenario and subsequently generates appropriate driving decisions for the AV. It is also observed that the proposed model significantly outperforms multiple baseline models in generating both explanations and driving actions. From the end user's perspective, the proposed model can enhance user trust because it provides the rationale behind an AV's actions. From the AV developer's perspective, the explanations from this explainable system could serve as a "debugging" tool to detect potential weaknesses in the existing system and identify specific directions for improvement.
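As a rough illustration of the "global attention" the abstract mentions, the sketch below shows generic scaled dot-product attention, in which one text-token query attends over every image-patch embedding at once. It is not the paper's architecture, which this record does not specify; the function names, the two-dimensional embeddings, and the toy values are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def global_attention(query, patch_keys, patch_values):
    """Scaled dot-product attention of one query over all image patches.

    Every patch contributes to the result, which is what makes the
    attention "global". All inputs are plain lists of floats.
    """
    dim = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
        for key in patch_keys
    ]
    weights = softmax(scores)
    context = [
        sum(w * value[i] for w, value in zip(weights, patch_values))
        for i in range(len(patch_values[0]))
    ]
    return weights, context


# Hypothetical toy data: three image patches with 2-d embeddings.
query = [1.0, 0.0]  # assumed embedding of the word being generated
patch_keys = [[4.0, 0.0], [0.0, 4.0], [0.0, 0.0]]
patch_values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

weights, context = global_attention(query, patch_keys, patch_values)
print("attention weights per patch:", weights)
print("context vector:", context)
```

The attention weights show which patches most influenced the generated word, which is the kind of signal a visual-attention explanation would present to a user or developer.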
Pages: 19