Adaptive Path Selection for Dynamic Image Captioning

Cited by: 44
Authors
Xian, Tiantao [1]
Li, Zhixin [1]
Tang, Zhenjun [1]
Ma, Huifang [2]
Affiliations
[1] Guangxi Normal Univ, Guangxi Key Lab Multisource Informat Min & Secur, Guilin 541004, Peoples R China
[2] Northwest Normal Univ, Coll Comp Sci & Engn, Lanzhou 730070, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Visualization; Feature extraction; Transformers; Semantics; Computational modeling; Adaptation models; Computer architecture; Image captioning; transformer; dynamic routing mechanism
DOI
10.1109/TCSVT.2022.3155795
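Chinese Library Classification
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification codes
0808; 0809
Abstract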
Image captioning, in which a machine automatically generates natural language that matches the semantic content of a given image, is a challenging task that has attracted much attention in recent years. However, most existing models are designed manually, and their performance depends heavily on the expertise of the designer. In addition, the computational flow of the model is predefined, so hard and easy samples share the same encoding path and can interfere with each other, which confuses the learning of the model. In this paper, we propose a Dynamic Transformer that changes the encoding procedure from sequential to adaptive, i.e., to data-dependent computing paths. Specifically, we design three different types of visual feature extraction blocks and deploy them in parallel at each layer to construct a multi-layer routing space in a fully connected manner. Each block contains a calculation unit that performs the corresponding operations and a routing gate that learns to adaptively select the direction in which to pass the signal based on the input image. Thus, our model can achieve a robust visual representation by exploring potential visual feature extraction paths. We evaluate our method quantitatively and qualitatively on the MSCOCO image captioning benchmark dataset and perform extensive ablation studies to investigate the reasons behind its effectiveness. The experimental results show that our method is significantly superior to previous state-of-the-art methods.
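The routing idea in the abstract (parallel blocks per layer, each with a calculation unit and an input-dependent routing gate, fully connected across layers) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the three block types or the form of the gate, so the three units below (`identity_unit`, `scale_unit`, `relu_unit`), the linear softmax gate, the single pooled feature vector, and the final averaging step are all placeholder assumptions chosen only to keep the example small and dependency-free.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Placeholder calculation units (assumption: the real block types are not given here).
def identity_unit(x):
    return list(x)


def scale_unit(x):
    return [0.5 * v for v in x]


def relu_unit(x):
    return [max(0.0, v) for v in x]


UNITS = [identity_unit, scale_unit, relu_unit]


class Block:
    """A calculation unit plus a routing gate over the next layer's blocks."""

    def __init__(self, unit, dim, n_next, rng):
        self.unit = unit
        # Hypothetical gate: one linear score per next-layer block, computed from the input.
        self.gate_w = [[rng.uniform(-1.0, 1.0) for _ in range(dim)]
                       for _ in range(n_next)]

    def forward(self, x):
        out = self.unit(x)
        scores = [sum(w * v for w, v in zip(row, x)) for row in self.gate_w]
        return out, softmax(scores)  # routing weights depend on the input


def build_layers(dim, depth, seed=0):
    """Stack `depth` layers, each holding one block per unit type."""
    rng = random.Random(seed)
    return [[Block(u, dim, len(UNITS), rng) for u in UNITS] for _ in range(depth)]


def route(x, layers):
    """Pass a feature vector through the fully connected routing space."""
    n = len(layers[0])
    inputs = [list(x) for _ in range(n)]  # every first-layer block sees the input
    for layer in layers:
        next_inputs = [[0.0] * len(x) for _ in range(n)]
        for block, inp in zip(layer, inputs):
            out, probs = block.forward(inp)
            # Send this block's output to each next-layer block, weighted by its gate.
            for j, p in enumerate(probs):
                for k, v in enumerate(out):
                    next_inputs[j][k] += p * v
        inputs = next_inputs
    # Assumption: merge the last layer's streams by simple averaging.
    return [sum(col) / n for col in zip(*inputs)]
```

Because each gate is a function of its input, two different feature vectors generally receive different routing weights, which is the sense in which the computing path is data-dependent. A trained model would learn the gate parameters rather than sample them at random as this sketch does.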
Pages: 5762-5775
Page count: 14
Related papers
50 in total
  • [1] Image Captioning via Dynamic Path Customization
    Ma, Yiwei
    Ji, Jiayi
    Sun, Xiaoshuai
    Zhou, Yiyi
    Hong, Xiaopeng
    Wu, Yongjian
    Ji, Rongrong
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024,
  • [2] Task-Adaptive Attention for Image Captioning
    Yan, Chenggang
    Hao, Yiming
    Li, Liang
    Yin, Jian
    Liu, Anan
    Mao, Zhendong
    Chen, Zhenyu
    Gao, Xingyu
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (01) : 43 - 51
  • [3] Image Captioning With Controllable and Adaptive Length Levels
    Ding, Ning
    Deng, Chaorui
    Tan, Mingkui
    Du, Qing
    Ge, Zhiwei
    Wu, Qi
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (02) : 764 - 779
  • [4] Adaptive Syncretic Attention for Constrained Image Captioning
    Liang Yang
    Haifeng Hu
    Neural Processing Letters, 2019, 50 : 549 - 564
  • [5] Adaptive Syncretic Attention for Constrained Image Captioning
    Yang, Liang
    Hu, Haifeng
    NEURAL PROCESSING LETTERS, 2019, 50 (01) : 549 - 564
  • [6] Image Captioning Based on Adaptive Balancing Loss
    Li, Linghui
    Tang, Sheng
    Guo, Junbo
    Wang, Rui
    Lyu, Bo
    Tian, Qi
    Zhang, Yongdong
    2018 IEEE FOURTH INTERNATIONAL CONFERENCE ON MULTIMEDIA BIG DATA (BIGMM), 2018,
  • [7] ADAPTIVE HARD EXAMPLE MINING FOR IMAGE CAPTIONING
    Wang, Yongzhuang
    Shen, Yangmei
    Xiong, Hongkai
    Lin, Weiyao
    2019 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2019, : 3342 - 3346
  • [8] Dynamic window sampling strategy for image captioning
    Li, Zhixin
    Wei, Jiahui
    Xian, Tiantao
    Zhang, Canlong
    Ma, Huifang
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2025, 148
  • [9] Saliency based Subject Selection for Diverse Image Captioning
    Quoc-An Luong
    Duc Minh Vo
    Sugimoto, Akihiro
    PROCEEDINGS OF 17TH INTERNATIONAL CONFERENCE ON MACHINE VISION APPLICATIONS (MVA 2021), 2021,
  • [10] DAA: Dual LSTMs with adaptive attention for image captioning
    Xiao, Fen
    Gong, Xue
    Zhang, Yiming
    Shen, Yanqing
    Li, Jun
    Gao, Xieping
    NEUROCOMPUTING, 2019, 364 : 322 - 329