Why did the AI make that decision? Towards an explainable artificial intelligence (XAI) for autonomous driving systems

Cited by: 27
Authors
Dong, Jiqian [1 ,2 ]
Chen, Sikai [3 ]
Miralinaghi, Mohammad [4 ]
Chen, Tiantian [5 ]
Li, Pei [3 ]
Labi, Samuel [1 ,2 ]
Affiliations
[1] Purdue Univ, Ctr Connected & Automated Transportat CCAT, W Lafayette, IN USA
[2] Purdue Univ, Lyles Sch Civil Engn, W Lafayette, IN USA
[3] Univ Wisconsin Madison, Dept Civil & Environm Engn, Madison, WI 53706 USA
[4] IIT, Dept Civil Architectural & Environm Engn, Chicago, IL USA
[5] Korea Adv Inst Sci & Technol, Cho Chun Shik Grad Sch Mobil, Daejeon, South Korea
Keywords
Explainable AI (XAI); Autonomous driving; User trust; Computer vision; End-to-end transformer; Visual attention; NEURAL-NETWORK; ARCHITECTURE; ATTENTION; VISION
DOI
10.1016/j.trc.2023.104358
Chinese Library Classification
U [Transportation];
Subject Classification Code
08 ; 0823 ;
Abstract
User trust has been identified as a critical issue that is pivotal to the success of autonomous vehicle (AV) operations where artificial intelligence (AI) is widely adopted. For such integrated AI-based driving systems, one promising way of building user trust is through the concept of explainable artificial intelligence (XAI), which requires the AI system to provide the user with the explanations behind each decision it makes. Motivated by both the need to enhance user trust and the promise of novel XAI technology in addressing this need, this paper seeks to enhance trustworthiness in autonomous driving systems through the development of explainable Deep Learning (DL) models. First, the paper casts the decision-making process of the AV system not as a classification task (the traditional formulation) but rather as an image-based language generation (image captioning) task. As such, the proposed approach makes driving decisions by first generating textual descriptions of the driving scenarios, which serve as explanations that humans can understand. To this end, a novel multi-modal DL architecture is proposed to jointly model the correlation between an image (driving scenario) and language (descriptions). It adopts a fully Transformer-based structure and therefore has the potential to perform global attention and effectively imitate the learning processes of human drivers. The results suggest that the proposed model generates well-formed, meaningful sentences to describe a given driving scenario, and subsequently generates appropriate driving decisions in autonomous vehicles (AVs). The proposed model is also observed to significantly outperform multiple baseline models in generating both explanations and driving actions. From the end user's perspective, the proposed model can be beneficial in enhancing user trust because it provides the rationale behind an AV's actions.
From the AV developer's perspective, the explanations from this explainable system could serve as a "debugging" tool to detect potential weaknesses in the existing system and identify specific directions for improvement.
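The abstract's central idea, casting the driving decision as image captioning with a joint Transformer over image patches and explanation tokens, can be sketched roughly as below. This is an illustrative toy under stated assumptions, not the authors' architecture: the class name `CaptionDriver`, all dimensions, the patch size, and the separate action head are invented for the sketch.

```python
# Hypothetical toy sketch of "driving decision as image captioning":
# an encoder attends over image patches of the driving scene, a decoder
# generates explanation tokens, and a driving action is read off the
# decoder's final state. All names and sizes here are assumptions.
import torch
import torch.nn as nn

class CaptionDriver(nn.Module):
    def __init__(self, vocab_size=32, d_model=64, n_actions=4, n_patches=16):
        super().__init__()
        self.patch_embed = nn.Linear(48, d_model)   # flattened 4x4x3 patches
        self.pos = nn.Parameter(torch.zeros(n_patches, d_model))
        self.tok_embed = nn.Embedding(vocab_size, d_model)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=4, num_encoder_layers=2,
            num_decoder_layers=2, batch_first=True)
        self.word_head = nn.Linear(d_model, vocab_size)   # explanation tokens
        self.action_head = nn.Linear(d_model, n_actions)  # driving decision

    def forward(self, patches, caption_in):
        # patches: (B, n_patches, 48); caption_in: (B, T) token ids
        memory_in = self.patch_embed(patches) + self.pos
        tgt = self.tok_embed(caption_in)
        mask = self.transformer.generate_square_subsequent_mask(
            caption_in.size(1))                     # causal decoding mask
        out = self.transformer(memory_in, tgt, tgt_mask=mask)
        words = self.word_head(out)                 # per-step vocabulary logits
        action = self.action_head(out[:, -1])       # decision from last state
        return words, action

model = CaptionDriver()
patches = torch.randn(2, 16, 48)          # two dummy "driving scenes"
caption = torch.randint(0, 32, (2, 5))    # shifted explanation tokens
words, action = model(patches, caption)
print(words.shape, action.shape)
```

In the paper's framing the explanation is generated first and the action follows from it; the toy mirrors that ordering by conditioning the action logits on the decoder state that produced the caption.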
Pages: 19
References
68 references in total
  • [1] Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
    Anderson, Peter
    He, Xiaodong
    Buehler, Chris
    Teney, Damien
    Johnson, Mark
    Gould, Stephen
    Zhang, Lei
    [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 6077 - 6086
  • [2] Atakishiyev S., 2021, EXPLAINABLE ARTIFICI
  • [3] Bahdanau D, 2016, arXiv, DOI 10.48550/arXiv.1409.0473
  • [4] Driving behavior explanation with multi-level fusion
    Ben-Younes, Hedi
    Zablocki, Eloi
    Perez, Patrick
    Cord, Matthieu
    [J]. PATTERN RECOGNITION, 2022, 123
  • [5] Bojarski Mariusz, 2016, arXiv
  • [6] Graph neural network and reinforcement learning for multi-agent cooperative control of connected autonomous vehicles
    Chen, Sikai
    Dong, Jiqian
    Ha, Paul
    Li, Yujie
    Labi, Samuel
    [J]. COMPUTER-AIDED CIVIL AND INFRASTRUCTURE ENGINEERING, 2021, 36 (07) : 838 - 857
  • [7] A deep learning algorithm for simulating autonomous driving considering prior knowledge and temporal information
    Chen, Sikai
    Leng, Yue
    Labi, Samuel
    [J]. COMPUTER-AIDED CIVIL AND INFRASTRUCTURE ENGINEERING, 2020, 35 (04) : 305 - 321
  • [8] Learning to Evaluate Image Captioning
    Cui, Yin
    Yang, Guandao
    Veit, Andreas
    Huang, Xun
    Belongie, Serge
    [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 5804 - 5812
  • [9] A survey on autonomous vehicle control in the era of mixed-autonomy: From physics-based to AI-guided driving policy learning
    Di, Xuan
    Shi, Rongye
    [J]. TRANSPORTATION RESEARCH PART C-EMERGING TECHNOLOGIES, 2021, 125
  • [10] An effective spatial-temporal attention based neural network for traffic flow prediction
    Do, Loan N. N.
    Vu, Hai L.
    Vo, Bao Q.
    Liu, Zhiyuan
    Dinh Phung
    [J]. TRANSPORTATION RESEARCH PART C-EMERGING TECHNOLOGIES, 2019, 108 : 12 - 28