EXplainable AI (XAI) approach to image captioning

Cited by: 10
Authors
Han, Seung-Ho [1]
Kwon, Min-Su [1]
Choi, Ho-Jin [1]
Affiliation
[1] Korea Adv Inst Sci & Technol, Sch Comp, Daejeon, South Korea
Source
JOURNAL OF ENGINEERING-JOE | 2020 / Vol. 2020 / No. 13
Keywords
learning (artificial intelligence); natural language processing; text analysis; neural nets; image processing; computer vision; XAI; eXplainable AI approach; deep learning techniques; black-box paradigm; explainable image captioning model; absurd caption generation; visual link; MSCOCO dataset; Flickr30K dataset;
DOI
10.1049/joe.2019.1217
Chinese Library Classification (CLC) number
T [Industrial Technology]
Subject classification code
08
Abstract
This article presents an eXplainable AI (XAI) approach to image captioning. Deep learning techniques have recently been applied intensively to this task, with relatively good performance. Because of the 'black-box' nature of deep learning, however, existing approaches cannot offer any clue as to why specific words were selected when generating a caption for a given image, and as a result they occasionally generate absurd captions. To overcome this problem, this article proposes an explainable image captioning model that provides a visual link between the region of an object (or a concept) in the given image and the corresponding word (or phrase) in the generated sentence. The model is evaluated on two datasets, MSCOCO and Flickr30K, and both quantitative and qualitative results are presented to show its effectiveness.
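The abstract states that the model links image regions to generated words but does not say how that link is computed. One common way to obtain such links in attention-based captioners is to take, for each generated word, the region that received the highest attention weight. The sketch below illustrates that generic idea under stated assumptions: a decoder that exposes one row of attention weights over detected region boxes per generated word. The function name `link_words_to_regions`, the `min_weight` threshold, and the `(x_min, y_min, x_max, y_max)` box format are hypothetical choices, not the authors' method.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[int, int, int, int]


def link_words_to_regions(
    words: Sequence[str],
    attention: Sequence[Sequence[float]],
    boxes: Sequence[Box],
    min_weight: float = 0.0,
) -> List[Tuple[str, Optional[Box]]]:
    """Pair each generated word with its most-attended region (assumed attention model)."""
    if len(words) != len(attention):
        raise ValueError("need one attention row per generated word")
    if not boxes:
        raise ValueError("need at least one region box")
    links: List[Tuple[str, Optional[Box]]] = []
    for word, row in zip(words, attention):
        if len(row) != len(boxes):
            raise ValueError("each attention row needs one weight per region")
        # Index of the region the decoder attended to most for this word.
        best = max(range(len(row)), key=lambda i: row[i])
        # Words whose peak attention is weak (often function words) get no link.
        region = boxes[best] if row[best] > min_weight else None
        links.append((word, region))
    return links


if __name__ == "__main__":
    words = ["a", "dog", "on", "grass"]
    attention = [
        [0.3, 0.3, 0.4],
        [0.9, 0.05, 0.05],
        [0.34, 0.33, 0.33],
        [0.1, 0.1, 0.8],
    ]
    boxes = [(10, 20, 120, 200), (0, 0, 50, 50), (0, 150, 300, 300)]
    for word, region in link_words_to_regions(words, attention, boxes, min_weight=0.5):
        print(word, region)
```

With `min_weight=0.5`, "dog" maps to the first box and "grass" to the third, while "a" and "on" stay unlinked because no single region dominates their attention. The threshold is a simple, assumed way to avoid drawing visual links for words that have no clear image counterpart.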
Pages: 589-594
Page count: 6