A Hierarchical Multimodal Attention-based Neural Network for Image Captioning

Cited by: 16
Authors
Cheng, Yong [1]
Huang, Fei [1]
Zhou, Lian [1]
Jin, Cheng [1]
Zhang, Yuejie [1]
Zhang, Tao [2]
Affiliations
[1] Fudan Univ, Shanghai Key Lab Intelligent Informat Proc, Sch Comp Sci, Shanghai, Peoples R China
[2] Shanghai Univ Finance & Econ, Sch Informat Management & Engn, Shanghai, Peoples R China
Keywords
Image Captioning; Multimodal Attention; Hierarchical Recurrent Neural Network; Long-Short Term Memory Model
DOI
10.1145/3077136.3080671
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
A novel hierarchical multimodal attention-based model is developed in this paper to generate more accurate and descriptive captions for images. Our model is an "end-to-end" neural network that contains three related sub-networks: a deep convolutional neural network to encode image contents, a recurrent neural network to identify the objects in images sequentially, and a multimodal attention-based recurrent neural network to generate image captions. The main contribution of our work is that the hierarchical structure and the multimodal attention mechanism are both applied, so that each caption word can be generated with multimodal attention over the intermediate semantic objects and the global visual content. Our experiments on two benchmark datasets have obtained very positive results.
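The abstract does not give the attention equations, so the following is only a minimal sketch of how a caption decoder might attend jointly over intermediate object features and a global visual feature. It assumes a standard additive (Bahdanau-style) attention, which may differ from the authors' formulation. The function name `multimodal_attention`, the parameters `W_h`, `W_f`, `w`, and all dimensions are hypothetical and chosen for illustration.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def multimodal_attention(hidden, object_feats, global_feat, W_h, W_f, w):
    """Attend over k object features plus one global visual feature.

    Assumed shapes (illustrative, not from the paper):
      hidden       (m,)    decoder hidden state at the current word
      object_feats (k, d)  features of the sequentially identified objects
      global_feat  (d,)    global image feature from the CNN encoder
      W_h (m, a), W_f (d, a), w (a,)  attention parameters
    """
    # Stack the object features and the global feature: (k + 1, d).
    candidates = np.vstack([object_feats, global_feat[None, :]])
    # Additive attention score per candidate: w^T tanh(W_h^T h + W_f^T f_i).
    scores = np.tanh(hidden @ W_h + candidates @ W_f) @ w
    # Normalize the scores into attention weights that sum to 1.
    weights = softmax(scores)
    # Context vector: attention-weighted sum of all candidates, shape (d,).
    context = weights @ candidates
    return weights, context
```

In a full decoder, the context vector would be combined with the hidden state to predict the next caption word; that step is omitted here because the record does not describe it.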
Pages: 889-892
Page count: 4