Learning Double-Level Relationship Networks for image captioning

Cited by: 8
Authors
Wang, Changzhi [1]
Gu, Xiaodong [1]
Affiliations
[1] Fudan Univ, Dept Elect Engn, Shanghai 200438, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Image captioning; Local-global relationship; Relationship network; Graph attention network; ATTENTION;
DOI
10.1016/j.ipm.2023.103288
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Image captioning aims to generate sentences that describe the main content of an image. Existing attention-based approaches mainly focus on the salient visual features in the image. However, when the relationship between local and global features is not learned, local features can lose their interaction with global concepts, which leads to inappropriate or inaccurate relationship words and phrases in the generated sentences. To alleviate this issue, we propose Double-Level Relationship Networks (DLRN), which exploit the complementary local and global features in the image and enhance the relationships between them. Technically, DLRN builds two types of networks: a separate relationship network and a unified relationship embedding network. The former learns different hierarchies of visual relationships by performing graph attention for local-level and pixel-level relationship enhancement, respectively. The latter uses the global features as a guide to learn the local-global relationship between local regions and global concepts, and obtains a feature representation rich in relationship information. Further, we devise an attention-based feature fusion module to fully utilize the contribution of different modalities; it fuses the previously obtained relationship features with the original region features. Extensive experiments on three typical datasets verify that DLRN significantly outperforms several state-of-the-art baselines. More remarkably, DLRN achieves competitive performance while maintaining notable model efficiency. The source code is available on GitHub at https://github.com/RunCode90/ImageCaptioning.
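The abstract describes two ideas that a small example may make more concrete: using a global feature as a guide to attend over local region features, and fusing a relationship feature with the original region feature through attention weights. The following is a minimal sketch of those two ideas in plain Python, not the authors' DLRN implementation. The function names (`global_guided_attention`, `fuse`), the scaled dot-product scoring, and the toy vectors are all assumptions made for illustration; the paper's actual layers, learned projections, and graph attention are not reproduced here.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def global_guided_attention(local_feats, global_feat):
    """Weight each local region by its scaled dot-product similarity
    to the global feature (hypothetical scoring, no learned weights)."""
    scale = math.sqrt(len(global_feat))
    weights = softmax([dot(f, global_feat) / scale for f in local_feats])
    dim = len(global_feat)
    # Context vector: attention-weighted sum of the local region features.
    context = [
        sum(w * f[i] for w, f in zip(weights, local_feats))
        for i in range(dim)
    ]
    return weights, context


def fuse(region_feat, relation_feat, query):
    """Attention-style fusion of two feature sources: score each source
    against a query, softmax the two scores, take the convex combination."""
    w_region, w_relation = softmax(
        [dot(region_feat, query), dot(relation_feat, query)]
    )
    return [w_region * r + w_relation * c
            for r, c in zip(region_feat, relation_feat)]


# Toy data (assumed): three 2-D region features and one global feature.
local_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
global_feat = [1.0, 1.0]

weights, context = global_guided_attention(local_feats, global_feat)
fused = fuse(local_feats[0], context, global_feat)
```

In this toy setup the third region is the most similar to the global feature, so it receives the largest attention weight. A trained model would replace the raw dot products with learned projections.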
Pages: 24