Towards Explanatory Interactive Image Captioning Using Top-Down and Bottom-Up Features, Beam Search and Re-ranking

Cited by: 0

Authors:

Rajarshi Biswas

Michael Barz

Daniel Sonntag

Affiliations:

[1] German Research Center for Artificial Intelligence (DFKI)

[2] German Research Center for Artificial Intelligence (DFKI)

[3] Saarbrücken Graduate School of Computer Science

Source:

KI - Künstliche Intelligenz | 2020 / Volume 34

Keywords:

Image captioning; Deep learning; Explainable artificial intelligence (XAI); Visual explanations; Interactive machine learning (IML); Beam search; Re-ranking

DOI:

Not available

Abstract:

Image captioning is a challenging multimodal task. Deep learning has brought significant improvements, yet captions generated by humans are still considered better, which makes the task an interesting application for interactive machine learning and explainable artificial intelligence methods. In this work, we aim to improve the performance and explainability of the state-of-the-art method Show, Attend and Tell by augmenting its attention mechanism with additional bottom-up features. We compute visual attention on the joint embedding space formed by the union of high-level features and the low-level features obtained from the object-specific salient regions of the input image. We embed the content of bounding boxes from a pre-trained Mask R-CNN model. This delivers state-of-the-art performance while providing explanatory features. Further, we discuss how interactive model improvement can be realized by re-ranking caption candidates using beam search decoders and explanatory features. We show that interactive re-ranking of beam search candidates has the potential to outperform the state of the art in image captioning.
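
Illustrative note: the abstract mentions re-ranking beam search caption candidates with explanatory features. The following is a minimal sketch of that general idea, not the authors' implementation. The names `beam_search`, `rerank`, and `toy_model` are hypothetical, the toy next-token table stands in for a trained captioning decoder, and the re-ranking score (word overlap with a set of detected object labels) is an assumed stand-in for whatever explanatory signal or user feedback an interactive system would actually use.

```python
import math
from typing import Callable, Dict, List, Sequence, Set, Tuple

Candidate = Tuple[List[str], float]  # (tokens, cumulative log-probability)


def beam_search(
    next_token_probs: Callable[[Sequence[str]], Dict[str, float]],
    beam_width: int = 3,
    max_len: int = 10,
    end_token: str = "<end>",
) -> List[Candidate]:
    """Simplified beam search: keep the best `beam_width` hypotheses per step."""
    beams: List[Candidate] = [([], 0.0)]
    finished: List[Candidate] = []
    for _ in range(max_len):
        expanded: List[Candidate] = []
        for tokens, score in beams:
            for token, prob in next_token_probs(tokens).items():
                if prob > 0.0:
                    expanded.append((tokens + [token], score + math.log(prob)))
        expanded.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in expanded[:beam_width]:
            # Hypotheses that emit the end token leave the active beam.
            if tokens[-1] == end_token:
                finished.append((tokens, score))
            else:
                beams.append((tokens, score))
        if not beams:
            break
    finished.extend(beams)  # keep unfinished hypotheses if max_len was reached
    return sorted(finished, key=lambda c: c[1], reverse=True)


def rerank(
    candidates: List[Candidate],
    detected_objects: Set[str],
    weight: float = 1.0,
) -> List[Candidate]:
    """Re-rank by log-probability plus a bonus per mentioned detected object.

    The overlap bonus is an assumption made for illustration only.
    """
    def combined(candidate: Candidate) -> float:
        tokens, score = candidate
        return score + weight * len(set(tokens) & detected_objects)

    return sorted(candidates, key=combined, reverse=True)


def toy_model(prefix: Sequence[str]) -> Dict[str, float]:
    """Hypothetical next-token distribution standing in for a decoder."""
    table = {
        (): {"a": 1.0},
        ("a",): {"dog": 0.6, "cat": 0.4},
        ("a", "dog"): {"<end>": 1.0},
        ("a", "cat"): {"<end>": 1.0},
    }
    return table[tuple(prefix)]


candidates = beam_search(toy_model, beam_width=2, max_len=5)
reranked = rerank(candidates, detected_objects={"cat"}, weight=1.0)
print(candidates[0][0])
print(reranked[0][0])
```

In this toy setting the decoder alone prefers the "dog" caption, while the re-ranker promotes the "cat" caption because it mentions the assumed detected object. A real system would replace the overlap bonus with scores derived from its own explanatory features or from user interaction.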

Pages: 571-584

Page count: 13
