Gated Hierarchical Attention for Image Captioning

Cited by: 5

Authors:

Wang, Qingzhong [1]

Chan, Antoni B. [1]

Affiliations:

[1] City Univ Hong Kong, Dept Comp Sci, Kowloon, Hong Kong, Peoples R China

Source:

COMPUTER VISION - ACCV 2018, PT IV | 2019 / Vol. 11364

Keywords:

Hierarchical attention; Image captioning; Convolutional decoder;

DOI:

10.1007/978-3-030-20870-7_2

Chinese Library Classification (CLC) number:

TP31 [Computer software];

Subject classification code:

081202 ; 0835 ;

Abstract:

Attention modules connecting encoders and decoders have been widely applied in the fields of object recognition, image captioning, visual question answering and neural machine translation, and significantly improve performance. In this paper, we propose a bottom-up gated hierarchical attention (GHA) mechanism for image captioning. Our proposed model employs a CNN as the decoder, which is able to learn different concepts at different layers, and different concepts correspond to different areas of an image. Therefore, we develop the GHA, in which low-level concepts are merged into high-level concepts and, simultaneously, low-level attended features are passed to the top to make predictions. Our GHA significantly improves the performance of the model that applies only one level of attention, e.g., the CIDEr score increases from 0.923 to 0.999, which is comparable to state-of-the-art models that employ attribute boosting and reinforcement learning (RL). We also conduct extensive experiments to analyze the CNN decoder and our proposed GHA, and we find that deeper decoders do not obtain better performance, and that when the convolutional decoder becomes deeper the model is likely to collapse during training.
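
The abstract gives no equations, so the following is only an illustrative sketch of the general idea of bottom-up gated merging of attended features across decoder layers. It is not the formulation used in the paper. The dot-product attention, the scalar sigmoid gate, and all names (`attend`, `gated_hierarchical_attention`, `layer_queries`, `gate_weights`) are assumptions introduced here for illustration.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    # Subtract the maximum for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def attend(query, regions):
    """Dot-product attention: weighted sum of image-region features."""
    weights = softmax([dot(query, r) for r in regions])
    dim = len(regions[0])
    return [sum(w * r[i] for w, r in zip(weights, regions)) for i in range(dim)]


def gated_hierarchical_attention(layer_queries, regions, gate_weights):
    """Merge attended features bottom-up, lowest decoder layer first.

    layer_queries: one query vector per decoder layer (hypothetical inputs).
    gate_weights:  one gate vector per layer, standing in for learned
                   parameters (an assumption, not taken from the paper).
    """
    carried = [0.0] * len(regions[0])
    for query, w in zip(layer_queries, gate_weights):
        attended = attend(query, regions)
        # Scalar gate: how much lower-level context is passed upward.
        g = sigmoid(dot(w, query))
        carried = [a + g * c for a, c in zip(attended, carried)]
    return carried
```

With zero queries and zero gate weights, each layer attends uniformly and every gate equals 0.5, so the lower layer's attended features reach the top at half strength.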


Pages: 21 - 37

Page count: 17

Related records (50 in total):

[1] Hierarchical Attention Network for Image Captioning
Wang, Weixuan
Chen, Zhihong
Hu, Haifeng
THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019 : 8957 - 8964
[2] GateCap: Gated spatial and semantic attention model for image captioning
Wang, Shiwei
Lan, Long
Zhang, Xiang
Luo, Zhigang
MULTIMEDIA TOOLS AND APPLICATIONS, 2020, 79 (17-18) : 11531 - 11549
[3] GateCap: Gated spatial and semantic attention model for image captioning
Shiwei Wang
Long Lan
Xiang Zhang
Zhigang Luo
Multimedia Tools and Applications, 2020, 79 : 11531 - 11549
[4] Image Captioning with Synergy-Gated Attention and Recurrent Fusion LSTM
Yang, Yo
Chen, Lizhi
Pan, Longyue
Hu, Juntao
KSII TRANSACTIONS ON INTERNET AND INFORMATION SYSTEMS, 2022, 16 (10) : 3390 - 3405
[5] Attention on Attention for Image Captioning
Huang, Lun
Wang, Wenmin
Chen, Jie
Wei, Xiao-Yong
2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019 : 4633 - 4642
[6] Image captioning via hierarchical attention mechanism and policy gradient optimization
Yan, Shiyang
Xie, Yuan
Wu, Fangyu
Smith, Jeremy S.
Lu, Wenjin
Zhang, Bailing
SIGNAL PROCESSING, 2020, 167
[7] A Hierarchical Multimodal Attention-based Neural Network for Image Captioning
Cheng, Yong
Huang, Fei
Zhou, Lian
Jin, Cheng
Zhang, Yuejie
Zhang, Tao
SIGIR'17: PROCEEDINGS OF THE 40TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2017 : 889 - 892
[8] Areas of Attention for Image Captioning
Pedersoli, Marco
Lucas, Thomas
Schmid, Cordelia
Verbeek, Jakob
2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017 : 1251 - 1259
[9] Image Captioning with Semantic Attention
You, Quanzeng
Jin, Hailin
Wang, Zhaowen
Fang, Chen
Luo, Jiebo
2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016 : 4651 - 4659
[10] Sequential Dual Attention: Coarse-to-Fine-Grained Hierarchical Generation for Image Captioning
Guan, Zhibin
Liu, Kang
Ma, Yan
Qian, Xu
Ji, Tongkai
SYMMETRY-BASEL, 2018, 10 (11):
