Learning Double-Level Relationship Networks for image captioning

Cited: 8
Authors
Wang, Changzhi [1 ]
Gu, Xiaodong [1 ]
Affiliation
[1] Fudan Univ, Dept Elect Engn, Shanghai 200438, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Image captioning; Local-global relationship; Relationship network; Graph attention network; ATTENTION;
DOI
10.1016/j.ipm.2023.103288
Chinese Library Classification
TP [automation technology; computer technology];
Discipline code
0812 ;
Abstract
Image captioning aims to generate descriptive sentences that capture an image's main contents. Existing attention-based approaches mainly focus on the salient visual features in the image. However, ignoring the relationship between local features and global features can cause local features to lose their interaction with global concepts, producing inappropriate or inaccurate relationship words/phrases in the generated sentences. To alleviate this issue, in this work we propose Double-Level Relationship Networks (DLRN), which exploit the complementary local and global features in the image and enhance the relationships between features. Technically, DLRN builds two types of networks: a separate relationship network and a unified relationship embedding network. The former learns different hierarchies of visual relationship by performing graph attention for local-level and pixel-level relationship enhancement, respectively. The latter takes the global features as a guide to learn the local-global relationship between local regions and global concepts, and obtains a feature representation containing rich relationship information. Further, we devise an attention-based feature fusion module to fully exploit the contributions of the different modalities; it effectively fuses the previously obtained relationship features with the original region features. Extensive experiments on three typical datasets verify that DLRN significantly outperforms several state-of-the-art baselines. More remarkably, DLRN achieves competitive performance while maintaining notable model efficiency. The source code is available on GitHub at https://github.com/RunCode90/ImageCaptioning.
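The graph attention used for relationship enhancement in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation; it is a generic single-head graph-attention layer over region features (all names and shapes here are illustrative assumptions), showing how pairwise attention weights over an adjacency structure yield relationship-enhanced features:

```python
import numpy as np

def graph_attention(H, W, a, adj):
    """Single-head graph attention over region features (illustrative sketch).

    H   : (N, d_in)  node (image-region) features
    W   : (d_in, d_out) shared linear projection
    a   : (2 * d_out,)  attention parameter vector
    adj : (N, N) 0/1 adjacency; 1 means the two regions interact
    """
    Z = H @ W                                   # project node features
    N = Z.shape[0]
    e = np.zeros((N, N))
    for i in range(N):
        for j in range(N):
            # logits e_ij = LeakyReLU(a^T [z_i || z_j])
            s = a @ np.concatenate([Z[i], Z[j]])
            e[i, j] = np.maximum(0.2 * s, s)    # LeakyReLU, slope 0.2
    e = np.where(adj > 0, e, -1e9)              # mask non-neighbouring regions
    alpha = np.exp(e - e.max(axis=1, keepdims=True))
    alpha /= alpha.sum(axis=1, keepdims=True)   # softmax over neighbours
    return alpha @ Z                            # relationship-enhanced features
```

Each output row is a neighbour-weighted mixture of projected region features, so a region's representation absorbs information from the regions it is related to.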
Pages: 24
Related papers
50 records
  • [31] A Comprehensive Survey of Deep Learning for Image Captioning
    Hossain, Md Zakir
    Sohel, Ferdous
    Shiratuddin, Mohd Fairuz
    Laga, Hamid
    ACM COMPUTING SURVEYS, 2019, 51 (06)
  • [32] Facilitated Deep Learning Models for Image Captioning
    Azhar, Imtinan
    Afyouni, Imad
    Elnagar, Ashraf
    2021 55TH ANNUAL CONFERENCE ON INFORMATION SCIENCES AND SYSTEMS (CISS), 2021,
  • [33] Neural Symbolic Representation Learning for Image Captioning
    Wang, Xiaomei
    Ma, Lin
    Fu, Yanwei
    Xue, Xiangyang
    PROCEEDINGS OF THE 2021 INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL (ICMR '21), 2021, : 312 - 321
  • [34] Collaborative Learning Method for Natural Image Captioning
    Wang, Rongzhao
    Liu, Libo
    DATA SCIENCE (ICPCSEE 2022), PT I, 2022, 1628 : 249 - 261
  • [35] Image Captioning With Visual-Semantic Double Attention
    He, Chen
    Hu, Haifeng
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2019, 15 (01)
  • [36] Research on Image Captioning Based on Double Attention Model
    Zhuo Y.-Q.
    Wei J.-H.
    Li Z.-X.
    Tien Tzu Hsueh Pao/Acta Electronica Sinica, 2022, 50 (05): : 1123 - 1130
  • [37] High-level Image Classification by Synergizing Image Captioning with BERT
    Yu, Xiaohong
    Ahn, Yoseop
    Jeong, Jaehoon
    12TH INTERNATIONAL CONFERENCE ON ICT CONVERGENCE (ICTC 2021): BEYOND THE PANDEMIC ERA WITH ICT CONVERGENCE INNOVATION, 2021, : 1686 - 1690
  • [38] Visual contextual relationship augmented transformer for image captioning
    Su, Qiang
    Hu, Junbo
    Li, Zhixin
    APPLIED INTELLIGENCE, 2024, 54 (06) : 4794 - 4813
  • [40] Image captioning in Hindi language using transformer networks
    Mishra, Santosh Kumar
    Dhir, Rijul
    Saha, Sriparna
    Bhattacharyya, Pushpak
    Singh, Amit Kumar
    COMPUTERS & ELECTRICAL ENGINEERING, 2021, 92