Adaptive Path Selection for Dynamic Image Captioning

Cited by: 44

Authors:

Xian, Tiantao [1]

Li, Zhixin [1]

Tang, Zhenjun [1]

Ma, Huifang [2]

Affiliations:

[1] Guangxi Normal Univ, Guangxi Key Lab Multisource Informat Min & Secur, Guilin 541004, Peoples R China

[2] Northwest Normal Univ, Coll Comp Sci & Engn, Lanzhou 730070, Peoples R China

Source:

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY | 2022 / Volume 32 / Issue 09

Funding:

National Natural Science Foundation of China;

Keywords:

Visualization; Feature extraction; Transformers; Semantics; Computational modeling; Adaptation models; Computer architecture; Image captioning; transformer; dynamic routing mechanism; TRANSFORMER;

DOI:

10.1109/TCSVT.2022.3155795

Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Subject classification codes:

0808 ; 0809 ;

Abstract:

Image captioning, in which a machine is given an image and automatically generates natural language that matches its semantic content, is a challenging task that has attracted much attention in recent years. However, most existing models are designed manually, and their performance depends heavily on the expertise of the designer. In addition, the computational flow of the model is predefined, so hard and easy samples share the same encoding path and easily interfere with each other, which confuses the learning of the model. In this paper, we propose a Dynamic Transformer that changes the encoding procedure from sequential to adaptive, i.e., to data-dependent computing paths. Specifically, we design three different types of visual feature extraction blocks and deploy them in parallel at each layer to construct a multi-layer routing space in a fully connected manner. Each block contains a calculation unit that performs the corresponding operations and a routing gate that learns to adaptively select the direction in which to pass the signal based on the input image. Thus, our model can achieve a robust visual representation by exploring potential visual feature extraction paths. We evaluate our method quantitatively and qualitatively on the benchmark MSCOCO image captioning dataset and perform extensive ablation studies to investigate the reasons behind its effectiveness. The experimental results show that our method is significantly superior to previous state-of-the-art methods.
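
The routing idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not name the three block types or the form of the gate, so the units below (identity_unit, global_unit, local_unit), the mean-pooled softmax gate, and the final averaging step are all hypothetical placeholders chosen only to show parallel blocks whose gates decide, per input, how much signal each next-layer block receives.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mean_pool(feats):
    """Average a list of feature vectors into one vector."""
    dim = len(feats[0])
    return [sum(row[d] for row in feats) / len(feats) for d in range(dim)]


# Hypothetical calculation units; the abstract does not specify the real ones.
def identity_unit(feats):
    return [row[:] for row in feats]


def global_unit(feats):
    # Blend every region with the mean over all regions.
    mean = mean_pool(feats)
    return [[0.5 * (v + m) for v, m in zip(row, mean)] for row in feats]


def local_unit(feats):
    # Blend every region with its immediate neighbours in the sequence.
    return [mean_pool(feats[max(0, i - 1): i + 2]) for i in range(len(feats))]


UNITS = (identity_unit, global_unit, local_unit)


class Block:
    """One calculation unit plus a routing gate (assumed form: linear + softmax)."""

    def __init__(self, unit, dim, rng):
        self.unit = unit
        # One weight vector per block in the next layer (fully connected routing).
        self.gate_weights = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in UNITS]

    def forward(self, feats):
        out = self.unit(feats)
        pooled = mean_pool(out)  # the gate sees an image-level summary
        logits = [sum(w * p for w, p in zip(ws, pooled)) for ws in self.gate_weights]
        return out, softmax(logits)


def build_routing_space(dim, n_layers, seed=0):
    rng = random.Random(seed)
    return [[Block(unit, dim, rng) for unit in UNITS] for _ in range(n_layers)]


def route(feats, layers):
    """Pass region features through the routing space; return output and gates."""
    n, dim = len(feats), len(feats[0])
    inputs = [feats for _ in UNITS]  # every first-layer block sees the raw features
    all_gates = []
    for layer in layers:
        results = [block.forward(x) for block, x in zip(layer, inputs)]
        outs = [out for out, _ in results]
        gates = [gate for _, gate in results]
        all_gates.append(gates)
        # Input of next-layer block j: gate-weighted sum of all current outputs.
        inputs = [
            [[sum(g[j] * o[i][d] for o, g in zip(outs, gates)) for d in range(dim)]
             for i in range(n)]
            for j in range(len(UNITS))
        ]
    # Assumed aggregation: average the three final signals per region.
    encoded = [mean_pool([inp[i] for inp in inputs]) for i in range(n)]
    return encoded, all_gates


regions = [[0.2, 0.1, 0.4, 0.3], [0.9, 0.5, 0.1, 0.0], [0.3, 0.3, 0.3, 0.3]]
encoded, gates = route(regions, build_routing_space(dim=4, n_layers=2))
```

Because each gate is computed from the features it receives, two different images generally produce different gate values and therefore follow different weighted paths through the same set of blocks, which is the data-dependent behaviour the abstract contrasts with a fixed sequential encoder.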


Pages: 5762 - 5775

Page count: 14

