Gated Hierarchical Attention for Image Captioning

Cited by: 5
Authors
Wang, Qingzhong [1]
Chan, Antoni B. [1]
Affiliations
[1] City Univ Hong Kong, Dept Comp Sci, Kowloon, Hong Kong, Peoples R China
Source
COMPUTER VISION - ACCV 2018, PT IV | 2019 / Vol. 11364
Keywords
Hierarchical attention; Image captioning; Convolutional decoder
DOI
10.1007/978-3-030-20870-7_2
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Discipline classification codes
081202; 0835
Abstract
Attention modules connecting encoders and decoders have been widely applied in object recognition, image captioning, visual question answering, and neural machine translation, and they significantly improve performance. In this paper, we propose a bottom-up gated hierarchical attention (GHA) mechanism for image captioning. Our proposed model employs a CNN as the decoder, which is able to learn different concepts at different layers, and different concepts correspond to different areas of an image. Therefore, we develop the GHA, in which low-level concepts are merged into high-level concepts and, simultaneously, low-level attended features pass to the top to make predictions. Our GHA significantly improves the performance of the model that applies only one level of attention, e.g., the CIDEr score increases from 0.923 to 0.999, which is comparable to state-of-the-art models that employ attribute boosting and reinforcement learning (RL). We also conduct extensive experiments to analyze the CNN decoder and our proposed GHA, and we find that deeper decoders do not obtain better performance, and that when the convolutional decoder becomes deeper, the model is likely to collapse during training.
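The abstract describes GHA only at a high level, so the following is a minimal Python sketch of one way a gated, bottom-up merge of per-layer attended features could look. It is not the paper's formulation: the dot-product attention, the scalar sigmoid gate, and the names `attend`, `gated_hierarchical_attention`, and `gate_vectors` are all assumptions made for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(regions, query):
    """Dot-product attention (an assumption): weights over image regions,
    then the weighted sum of region features."""
    weights = softmax([dot(r, query) for r in regions])
    dim = len(regions[0])
    context = [sum(w * r[d] for w, r in zip(weights, regions)) for d in range(dim)]
    return context, weights


def gated_hierarchical_attention(regions, layer_states, gate_vectors):
    """Merge attended features bottom-up across decoder layers.

    regions:      R image-region feature vectors, each of length D
    layer_states: one query vector of length D per decoder layer, lowest first
    gate_vectors: hypothetical gate parameters, one vector of length 2*D
                  for every layer above the first
    """
    merged, weights = attend(regions, layer_states[0])
    all_weights = [weights]
    for state, gate_w in zip(layer_states[1:], gate_vectors):
        context, weights = attend(regions, state)
        all_weights.append(weights)
        # Scalar gate computed from the lower-level summary concatenated
        # with the current layer's attended features.
        g = sigmoid(dot(gate_w, merged + context))
        # The gate controls how much low-level information passes upward.
        merged = [g * low + (1.0 - g) * high for low, high in zip(merged, context)]
    return merged, all_weights


# Toy usage with random features: 5 regions, 4 dimensions, 3 decoder layers.
random.seed(0)
D, R, L = 4, 5, 3
regions = [[random.uniform(-1, 1) for _ in range(D)] for _ in range(R)]
layer_states = [[random.uniform(-1, 1) for _ in range(D)] for _ in range(L)]
gate_vectors = [[random.uniform(-1, 1) for _ in range(2 * D)] for _ in range(L - 1)]
top_features, attention_maps = gated_hierarchical_attention(
    regions, layer_states, gate_vectors
)
```

In this sketch, `top_features` stands in for the merged representation that would feed the word predictor, and `attention_maps` holds one distribution over image regions per decoder layer, mirroring the abstract's point that different layers attend to different areas of the image.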
Pages: 21-37
Page count: 17