Automated Image Captioning with Multi-layer Gated Recurrent Unit

Authors
Moral, Ozge Taylan [1]
Kilic, Volkan [1]
Onan, Aytug [2]
Wang, Wenwu [3]
Affiliations
[1] Izmir Katip Celebi Univ, Elect & Elect Engn Grad Program, Izmir, Turkey
[2] Izmir Katip Celebi Univ, Dept Comp Engn, Izmir, Turkey
[3] Univ Surrey, Ctr Vis Speech & Signal Proc CVSSP, Guildford, Surrey, England
Source
2022 30TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2022) | 2022
Keywords
convolutional neural network; gated recurrent unit; image captioning; recurrent neural network
DOI
Not available
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Describing the semantic content of an image in natural language, known as image captioning, has recently attracted substantial interest in the computer vision and language processing communities. Current image captioning approaches are mainly based on an encoder-decoder framework in which visual information is extracted by an image encoder and captions are generated by a text decoder, using convolutional neural networks (CNN) and recurrent neural networks (RNN), respectively. Although this framework is promising for image captioning, the RNN decoder has limitations in using the encoded visual information to generate grammatically and semantically correct captions. More specifically, the RNN decoder uses the contextual information in the encoded data ineffectively because of its limited ability to capture long-term complex dependencies. Inspired by the advantages of the gated recurrent unit (GRU), in this paper we propose an extension of the conventional RNN that introduces a multi-layer GRU, which modulates the most relevant information inside the unit to enhance the semantic coherence of captions. Experimental results on the MSCOCO dataset show that the proposed approach outperforms state-of-the-art approaches on several performance metrics.
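As an illustration of the multi-layer GRU idea mentioned in the abstract, the following is a minimal sketch in plain Python. It is not the authors' implementation: the class names `GRUCell` and `MultiLayerGRU`, the layer sizes, the random initialization, and the update convention h' = (1 - z) * n + z * h are all assumptions chosen for illustration, and the record does not say how the paper wires the CNN features into the decoder. In a captioning setting the inputs would plausibly be word embeddings and the initial state would be derived from the encoded image, but that is also an assumption.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Matrix-vector product for a list-of-rows matrix.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng):
    # Small random weights; a placeholder for trained parameters.
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class GRUCell:
    """One GRU layer with an update gate z, a reset gate r and a candidate n."""

    def __init__(self, input_size, hidden_size, rng):
        self.params = {
            gate: (rand_matrix(hidden_size, input_size, rng),   # input weights
                   rand_matrix(hidden_size, hidden_size, rng),  # recurrent weights
                   [0.0] * hidden_size)                         # bias
            for gate in ("z", "r", "n")
        }

    def _affine(self, gate, x, h):
        W, U, b = self.params[gate]
        return [a + c + d for a, c, d in zip(matvec(W, x), matvec(U, h), b)]

    def step(self, x, h):
        z = [sigmoid(v) for v in self._affine("z", x, h)]
        r = [sigmoid(v) for v in self._affine("r", x, h)]
        # The reset gate scales the previous state before the candidate is formed.
        reset_h = [ri * hi for ri, hi in zip(r, h)]
        n = [math.tanh(v) for v in self._affine("n", x, reset_h)]
        # The update gate interpolates between the candidate and the old state.
        return [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]


class MultiLayerGRU:
    """Stack of GRU cells; each layer's state feeds the next layer at the same step."""

    def __init__(self, input_size, hidden_size, num_layers, seed=0):
        rng = random.Random(seed)
        sizes = [input_size] + [hidden_size] * (num_layers - 1)
        self.cells = [GRUCell(size, hidden_size, rng) for size in sizes]
        self.hidden_size = hidden_size

    def forward(self, inputs, h0=None):
        states = h0 or [[0.0] * self.hidden_size for _ in self.cells]
        states = [list(s) for s in states]
        outputs = []
        for x in inputs:
            layer_input = x
            for i, cell in enumerate(self.cells):
                states[i] = cell.step(layer_input, states[i])
                layer_input = states[i]
            outputs.append(layer_input)  # top-layer state at this time step
        return outputs, states


if __name__ == "__main__":
    model = MultiLayerGRU(input_size=4, hidden_size=3, num_layers=2)
    sequence = [[0.1, 0.2, 0.3, 0.4], [0.0, -0.1, 0.2, 0.5]]
    outputs, final_states = model.forward(sequence)
    print(len(outputs), len(final_states))
```

The gates are what distinguish this from a plain RNN step: the update gate decides how much of the previous state to carry forward, which is the mechanism usually credited with easing long-term dependency learning. A decoder would additionally project each top-layer output onto the vocabulary to pick the next word, which is omitted here.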
Pages: 1160-1164
Number of pages: 5