Joint Scence Network and Attention-Guided for Image Captioning

Cited by: 2
Authors
Zhou, Dongming [1]
Yang, Jing [1]
Zhang, Canlong [1]
Tang, Yanping [2]
Affiliations
[1] Guangxi Normal Univ, Guangxi Key Lab Multisource Informat Min & Secur, Guilin 541000, Peoples R China
[2] Guilin Univ Elect Technol, Sch Comp Sci & Informat Secur, Guilin 541004, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Image captioning; Attention Network; Graph Convolutional Network; Machine Learning;
DOI
10.1109/ICDM51629.2021.00201
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Image captioning is an interesting and challenging task. Previously established image captioning approaches are based mainly on the encoder-decoder architecture, but they suffer from problems such as inaccurate captioning information, and the generated captions are not sufficiently rich. This paper proposes a novel image captioning model that is based on a self-attention network and a scene graph relationship network. First, an improved self-attention network is added to the extraction of visual features to evaluate the effectiveness of image global information for image generation. Then, we design a visual intensity parameter to coordinate the strategies of visual features and language model for word generation. Finally, a graph convolutional network is designed to extract the relationships from the scene information to render the generated caption more exciting and to increase the accuracy of the fine-grained captioning. We demonstrate the satisfactory performance of the model on the MS-COCO and Flickr 30K datasets. The experimental results demonstrate that the proposed model realizes state-of-the-art performance.
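The abstract names two mechanisms without giving formulas: a graph convolutional network over scene relationships and a visual intensity parameter that balances visual features against the language model. The following is a minimal sketch of what such components could look like, not the authors' implementation. It assumes the commonly used symmetric-normalized graph convolution and a scalar sigmoid gate; the names `gcn_layer`, `visual_gate`, and `blend` are hypothetical and do not come from the paper.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gcn_layer(adj, feats, weight):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 X W).

    Assumption: a standard normalized graph convolution; the paper's
    exact layer is not specified in the abstract.
    """
    n = len(adj)
    # Add self-loops so each node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric degree normalization.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    out = matmul(matmul(norm, feats), weight)
    return [[max(v, 0.0) for v in row] for row in out]


def visual_gate(hidden, w_gate):
    """Hypothetical 'visual intensity': a scalar in (0, 1) from the decoder state."""
    score = sum(h * w for h, w in zip(hidden, w_gate))
    return 1.0 / (1.0 + math.exp(-score))


def blend(visual_ctx, language_ctx, beta):
    """Mix visual and language context vectors with weight beta."""
    return [beta * v + (1.0 - beta) * l for v, l in zip(visual_ctx, language_ctx)]


# Two scene-graph nodes joined by one edge, with one-hot features.
adj = [[0.0, 1.0], [1.0, 0.0]]
feats = [[1.0, 0.0], [0.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
node_out = gcn_layer(adj, feats, identity)

beta = visual_gate([0.0, 0.0], [1.0, 1.0])
mixed = blend([1.0, 1.0], [0.0, 0.0], beta)
```

In this toy case each node ends up with the average of its own and its neighbor's features, and a gate value near 1 would make word generation rely mostly on the visual context while a value near 0 would favor the language model.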
Pages: 1535-1540
Number of pages: 6