Adaptive Path Selection for Dynamic Image Captioning

Cited by: 44
Authors
Xian, Tiantao [1]
Li, Zhixin [1]
Tang, Zhenjun [1]
Ma, Huifang [2]
Affiliations
[1] Guangxi Normal Univ, Guangxi Key Lab Multisource Informat Min & Secur, Guilin 541004, Peoples R China
[2] Northwest Normal Univ, Coll Comp Sci & Engn, Lanzhou 730070, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Visualization; Feature extraction; Transformers; Semantics; Computational modeling; Adaptation models; Computer architecture; Image captioning; dynamic routing mechanism
DOI
10.1109/TCSVT.2022.3155795
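Chinese Library Classification
TM [Electrical Engineering]; TN [Electronic and Communication Technology]
Discipline classification codes
0808; 0809
Abstract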
Image captioning, in which a machine automatically generates a natural-language description that matches the semantic content of a given image, is a challenging task that has attracted much attention in recent years. However, most existing models are designed manually, and their performance depends heavily on the expertise of the designer. In addition, the computational flow of the model is predefined, so hard and easy samples share the same encoding path and can interfere with each other, which hampers the learning of the model. In this paper, we propose a Dynamic Transformer that changes the encoding procedure from sequential to adaptive, i.e., it uses data-dependent computing paths. Specifically, we design three different types of visual feature extraction blocks and deploy them in parallel at each layer to construct a multi-layer routing space in a fully connected manner. Each block contains a calculation unit that performs the corresponding operation and a routing gate that learns to adaptively select the direction in which to pass the signal based on the input image. Thus, our model can achieve a robust visual representation by exploring potential visual feature extraction paths. We evaluate our method quantitatively and qualitatively on the MSCOCO image captioning benchmark and perform extensive ablation studies to investigate the reasons behind its effectiveness. The experimental results show that our method is significantly superior to previous state-of-the-art methods.
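The routing idea in the abstract (parallel blocks at each layer, each with a calculation unit and a gate that decides where its output goes next) can be illustrated with a minimal sketch. This is not the authors' implementation: the three calculation units, the soft (weighted) routing, the untrained random gate weights, and all names such as `Block`, `route`, and `build` are assumptions made purely for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Three hypothetical calculation units (placeholders, not the paper's blocks).
def identity_unit(x):
    return list(x)


def scale_unit(x):
    return [0.5 * v for v in x]


def center_unit(x):
    mean = sum(x) / len(x)
    return [v - mean for v in x]


UNITS = [identity_unit, scale_unit, center_unit]


class Block:
    """One calculation unit plus a routing gate over the next layer's blocks."""

    def __init__(self, unit, dim, n_next, rng):
        self.unit = unit
        # Assumption: a linear gate with random, untrained weights.
        self.gate = [[rng.uniform(-1, 1) for _ in range(dim)]
                     for _ in range(n_next)]

    def forward(self, x):
        y = self.unit(x)
        # Gate scores depend on the block's output, so routing is data-dependent.
        scores = [sum(w * v for w, v in zip(row, y)) for row in self.gate]
        return y, softmax(scores)


def build(dim, depth, seed=0):
    """Build `depth` layers, each holding one block per calculation unit."""
    rng = random.Random(seed)
    return [[Block(u, dim, len(UNITS), rng) for u in UNITS]
            for _ in range(depth)]


def route(x, layers):
    """Pass feature vector x through a fully connected routing space."""
    n = len(layers[0])
    inputs = [list(x) for _ in range(n)]  # every first-layer block sees x
    paths = []
    for layer in layers:
        nxt = [[0.0] * len(x) for _ in range(n)]
        layer_probs = []
        for block, inp in zip(layer, inputs):
            y, probs = block.forward(inp)
            layer_probs.append(probs)
            # Soft routing: send a gate-weighted share of y to each next block.
            for j, p in enumerate(probs):
                for k, v in enumerate(y):
                    nxt[j][k] += p * v
        paths.append(layer_probs)
        inputs = nxt
    # Final representation: average of the last layer's aggregated signals.
    out = [sum(col) / n for col in zip(*inputs)]
    return out, paths
```

Calling `route([1.0, 2.0, 3.0, 4.0], build(dim=4, depth=2))` returns the final feature vector together with the per-layer gate probabilities, which differ from one input to another. A trained model would learn the gate weights, and a hard (discrete) path selection could replace the soft weighting used here.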
Pages: 5762-5775
Page count: 14