Adaptive Path Selection for Dynamic Image Captioning

Cited by: 44

Authors:

Xian, Tiantao [1]

Li, Zhixin [1]

Tang, Zhenjun [1]

Ma, Huifang [2]

Affiliations:

[1] Guangxi Normal Univ, Guangxi Key Lab Multisource Informat Min & Secur, Guilin 541004, Peoples R China

[2] Northwest Normal Univ, Coll Comp Sci & Engn, Lanzhou 730070, Peoples R China

Source:

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY | 2022, Vol. 32, No. 09

Funding:

National Natural Science Foundation of China;

Keywords:

Visualization; Feature extraction; Transformers; Semantics; Computational modeling; Adaptation models; Computer architecture; Image captioning; transformer; dynamic routing mechanism;

DOI:

10.1109/TCSVT.2022.3155795

Chinese Library Classification:

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

Image captioning, in which a machine is given an image and automatically generates natural language that matches its semantic content, is a challenging task that has attracted much attention in recent years. However, most existing models are designed manually, and their performance depends heavily on the expert experience of the designer. In addition, the computational flow of the model is predefined, so hard and easy samples share the same encoding path and can interfere with each other, which confuses the learning of the model. In this paper, we propose a Dynamic Transformer that changes the encoding procedure from sequential to adaptive, i.e., to data-dependent computing paths. Specifically, we design three different types of visual feature extraction blocks and deploy them in parallel at each layer to construct a multi-layer routing space in a fully connected manner. Each block contains a calculation unit that performs the corresponding operations and a routing gate that learns to adaptively select the direction in which to pass the signal based on the input image. Thus, our model can achieve a robust visual representation by exploring potential visual feature extraction paths. We evaluate our method quantitatively and qualitatively on the benchmark MSCOCO image captioning dataset and perform extensive ablation studies to investigate the reasons behind its effectiveness. The experimental results show that our method is significantly superior to previous state-of-the-art methods.
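The routing idea described in the abstract (parallel blocks per layer, each with a calculation unit and a gate that decides where its output goes next) can be illustrated with a toy sketch. This is not the authors' implementation: the three placeholder units, the randomly initialized gate weights, the softmax gating, the final averaging step, and all names (`Block`, `route`, `UNITS`) are assumptions made purely for illustration, and the real model operates on learned visual features.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical stand-ins for the three block types; the abstract does not
# specify the actual operations.
def identity_unit(x):
    return list(x)


def scale_unit(x):
    return [0.5 * v for v in x]


def relu_unit(x):
    return [max(0.0, v) for v in x]


UNITS = [identity_unit, scale_unit, relu_unit]


class Block:
    """A calculation unit plus a routing gate (assumed linear + softmax)."""

    def __init__(self, unit, dim, n_next, rng):
        self.unit = unit
        # One gate weight row per candidate block in the next layer.
        # Random here; in a trained model these would be learned.
        self.gate_w = [[rng.uniform(-0.1, 0.1) for _ in range(dim)]
                       for _ in range(n_next)]

    def forward(self, x):
        y = self.unit(x)
        logits = [sum(w * v for w, v in zip(row, y)) for row in self.gate_w]
        return y, softmax(logits)  # output and data-dependent routing weights


def route(x, num_layers=2, seed=0):
    """Pass x through a fully connected routing space of parallel blocks."""
    rng = random.Random(seed)
    dim, n = len(x), len(UNITS)
    layers = [[Block(u, dim, n, rng) for u in UNITS] for _ in range(num_layers)]
    inputs = [list(x) for _ in range(n)]  # every first-layer block sees x
    for layer in layers:
        next_inputs = [[0.0] * dim for _ in range(n)]
        for block, inp in zip(layer, inputs):
            y, probs = block.forward(inp)
            # Distribute this block's output to next-layer blocks by gate weight.
            for j, p in enumerate(probs):
                for k in range(dim):
                    next_inputs[j][k] += p * y[k]
        inputs = next_inputs
    # Assumed aggregation: average the signals arriving after the last layer.
    return [sum(inp[k] for inp in inputs) / n for k in range(dim)]


if __name__ == "__main__":
    print(route([1.0, -2.0, 0.5]))
```

Because the gate weights depend on each block's output, two different inputs generally receive different mixing weights, which is the sense in which the computing path is data-dependent in this sketch.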

Pages: 5762-5775

Page count: 14
