Learning Double-Level Relationship Networks for image captioning

Cited by: 8
Authors
Wang, Changzhi [1 ]
Gu, Xiaodong [1 ]
Affiliations
[1] Fudan Univ, Dept Elect Engn, Shanghai 200438, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Image captioning; Local-global relationship; Relationship network; Graph attention network; Attention
DOI
10.1016/j.ipm.2023.103288
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
Image captioning aims to generate descriptive sentences that describe the main contents of an image. Existing attention-based approaches mainly focus on salient visual features in the image. However, ignoring the relationship between local and global features may cause local features to lose their interaction with global concepts, producing inappropriate or inaccurate relationship words/phrases in the generated sentences. To alleviate this issue, in this work we propose Double-Level Relationship Networks (DLRN), which exploit the complementary local and global features in the image and enhance the relationships between features. Technically, DLRN builds two types of networks: a separate relationship network and a unified relationship embedding network. The former learns different hierarchies of visual relationships by performing graph attention for local-level and pixel-level relationship enhancement, respectively. The latter takes the global features as a guide to learn the local-global relationship between local regions and global concepts, and obtains a feature representation containing rich relationship information. Further, we devise an attention-based feature fusion module to fully exploit the contribution of each modality; it effectively fuses the previously obtained relationship features with the original region features. Extensive experiments on three typical datasets verify that our DLRN significantly outperforms several state-of-the-art baselines. More remarkably, DLRN achieves this competitive performance while maintaining notable model efficiency. The source code is available on GitHub at https://github.com/RunCode90/ImageCaptioning.
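The abstract's relationship enhancement rests on graph attention over region features: each region attends to all others, and its feature is re-expressed as an attention-weighted mixture of its neighbors. The sketch below is a minimal single-head illustration of that general mechanism, not the paper's actual DLRN layers; the function name, matrix shapes, and LeakyReLU slope are all assumptions for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def graph_attention(H, W, a):
    """Single-head graph attention over a fully connected region graph.

    H: (N, d_in) region features; W: (d_in, d_out) shared projection;
    a: (2 * d_out,) attention vector. Names and shapes are illustrative,
    not the parameterization used in the DLRN paper.
    """
    Z = H @ W                       # project every region feature
    N = Z.shape[0]
    # pairwise attention logits e_ij = a^T [z_i ; z_j]
    logits = np.array([[a @ np.concatenate([Z[i], Z[j]]) for j in range(N)]
                       for i in range(N)])
    # LeakyReLU (slope 0.2, an assumed value) followed by row-wise softmax
    alpha = softmax(np.maximum(logits, 0.2 * logits))
    return alpha @ Z                # relationship-enhanced region features

rng = np.random.default_rng(0)
H = rng.normal(size=(5, 8))        # 5 regions, 8-dim features
W = rng.normal(size=(8, 4))
a = rng.normal(size=(8,))
out = graph_attention(H, W, a)
print(out.shape)                   # (5, 4)
```

Each output row is a convex combination of the projected region features, so regions with strong pairwise affinity dominate each other's enhanced representation, which is the intuition behind relationship enhancement at both the local and pixel level.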
Pages: 24