EXplainable AI (XAI) approach to image captioning

Cited by: 10

Authors:

Han, Seung-Ho [1]

Kwon, Min-Su [1]

Choi, Ho-Jin [1]

Affiliations:

[1] Korea Adv Inst Sci & Technol, Sch Comp, Daejeon, South Korea

Source:

JOURNAL OF ENGINEERING-JOE | Year 2020 / Volume 2020 / Issue 13

Keywords:

learning (artificial intelligence); natural language processing; text analysis; neural nets; image processing; computer vision; XAI; eXplainable AI approach; deep learning techniques; black-box paradigm; explainable image captioning model; absurd caption generation; visual link; MSCOCO dataset; Flickr30K dataset;

DOI:

10.1049/joe.2019.1217

Chinese Library Classification (CLC) number:

T [Industrial Technology];

Discipline classification code:

08;

Abstract:

This article presents an eXplainable AI (XAI) approach to image captioning. Recently, deep learning techniques have been applied intensively to this task with relatively good performance. Owing to the 'black-box' paradigm of deep learning, however, existing approaches cannot provide clues that explain why specific words were selected when generating captions for given images, and as a result they occasionally generate absurd captions. To overcome this problem, this article proposes an explainable image captioning model that provides a visual link between the region of an object (or a concept) in the given image and the particular word (or phrase) in the generated sentence. The model has been evaluated on two datasets, MSCOCO and Flickr30K, and both quantitative and qualitative results are presented to show the effectiveness of the proposed model.

Citation

Pages: 589-594

Page count: 6

20 references in total

[1] [Anonymous], 2014, 14101090 ARXIV
[2] Box George EP, 2011, Bayesian Inference in Statistical Analysis, V40
[3] Chen, Long; Zhang, Hanwang; Xiao, Jun; Nie, Liqiang; Shao, Jian; Liu, Wei; Chua, Tat-Seng. SCA-CNN: Spatial and Channel-wise Attention in Convolutional Networks for Image Captioning [J]. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017: 6298-6306
[4] Chen X, 2015, PROC CVPR IEEE, P2422, DOI 10.1109/CVPR.2015.7298856
[5] Denkowski M., 2014, P 9 WORKSH STAT MACH, P376
[6] Donahue J, 2015, PROC CVPR IEEE, P2625, DOI 10.1109/CVPR.2015.7298878
[7] Girshick, Ross; Donahue, Jeff; Darrell, Trevor; Malik, Jitendra. Rich feature hierarchies for accurate object detection and semantic segmentation [J]. 2014 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2014: 580-587
[8] Han S, 2018, P INT C COMP INT CSC
[9] He KM, 2020, IEEE T PATTERN ANAL, V42, P386, DOI [10.1109/TPAMI.2018.2844175, 10.1109/ICCV.2017.322]
[10] Johnson, Justin; Karpathy, Andrej; Fei-Fei, Li. DenseCap: Fully Convolutional Localization Networks for Dense Captioning [J]. 2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016: 4565-4574
