Image captioning, in which a machine automatically generates natural language that matches the semantic content of a given image, is a challenging task that has attracted much attention in recent years. However, most existing models are designed manually, and their performance depends heavily on the designer's expertise. In addition, the computational flow of these models is predefined, so hard and easy samples share the same encoding path and can interfere with each other, which confuses the learning of the model. In this paper, we propose a Dynamic Transformer that changes the encoding procedure from a fixed sequential pipeline to adaptive, data-dependent computation paths. Specifically, we design three different types of visual feature extraction blocks and deploy them in parallel at each layer to construct a fully connected multi-layer routing space. Each block contains a computation unit that performs the corresponding operation and a routing gate that learns to adaptively select the direction in which to pass the signal based on the input image. Thus, our model obtains robust visual representations by exploring potential visual feature extraction paths. We evaluate our method quantitatively and qualitatively on the benchmark MSCOCO image captioning dataset and perform extensive ablation studies to investigate the reasons for its effectiveness. The experimental results show that our method significantly outperforms previous state-of-the-art methods.
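
To make the routing idea concrete, the following is a minimal sketch of one dynamic encoder layer of the kind the abstract describes: three parallel visual feature extraction blocks whose outputs are combined by a learned routing gate conditioned on the input features. This is not the authors' implementation; the specific block types, the soft (weighted-sum) routing, and all names here are illustrative assumptions.

```python
import torch
import torch.nn as nn


class DynamicEncoderLayer(nn.Module):
    """Hypothetical layer: three parallel blocks plus an input-dependent routing gate."""

    def __init__(self, dim=512, num_heads=8):
        super().__init__()
        # Three candidate feature-extraction blocks deployed in parallel
        # (assumed choices: self-attention, a feed-forward block, and a skip path).
        self.blocks = nn.ModuleList([
            nn.MultiheadAttention(dim, num_heads, batch_first=True),
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.ReLU(), nn.Linear(4 * dim, dim)),
            nn.Identity(),
        ])
        # Routing gate: predicts per-image weights over the blocks
        # from the mean-pooled visual features.
        self.gate = nn.Linear(dim, len(self.blocks))

    def forward(self, x):  # x: (batch, regions, dim) region-level visual features
        weights = torch.softmax(self.gate(x.mean(dim=1)), dim=-1)  # (batch, 3)
        outputs = []
        for block in self.blocks:
            if isinstance(block, nn.MultiheadAttention):
                out, _ = block(x, x, x)  # self-attention over image regions
            else:
                out = block(x)
            outputs.append(out)
        stacked = torch.stack(outputs, dim=1)            # (batch, 3, regions, dim)
        return (weights[:, :, None, None] * stacked).sum(dim=1)  # routed mixture
```

Stacking several such layers yields a multi-layer routing space in which different images can, in effect, follow different feature extraction paths through the encoder.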