Graph-based image captioning with semantic and spatial features

Cited by: 0
Authors
Parseh, Mohammad Javad [1 ]
Ghadiri, Saeed [1 ]
Affiliation
[1] Jahrom Univ, Dept Comp Engn & Informat Technol, Jahrom, Iran
Keywords
Image captioning; Semantic graph; Spatial graph; Attention mechanism; Transformer
DOI
10.1016/j.image.2025.117273
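Chinese Library Classification (CLC) numbers
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification codes
0808; 0809
Abstract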
Image captioning is a challenging image processing task that aims to generate accurate, descriptive text for images. In this paper, we propose a novel image captioning framework that exploits the spatial and semantic relationships between objects in an image in addition to traditional visual features. Our approach uses a pre-trained model, RelTR, as a backbone for extracting object bounding boxes and subject-predicate-object relationship triplets. We use these extracted relationships to construct spatial and semantic graphs, which are processed by separate Graph Convolutional Networks (GCNs) to obtain high-level contextualized features. In parallel, a CNN extracts visual features from the input image. To fuse these feature vectors, at each time step of the LSTM-based decoder a multi-modal attention mechanism is applied separately to the image feature maps, the nodes of the semantic graph, and the nodes of the spatial graph. The attended features are concatenated with the word embedding for that time step and fed into the LSTM cell. Our experiments show that, by capturing richer contextual information, the proposed approach generates accurate and semantically meaningful captions and performs close to existing state-of-the-art image captioning techniques. (c) 2025 Elsevier Inc. All rights reserved.
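The per-time-step fusion described in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the RelTR and CNN extraction stages are not reproduced (toy boxes, node features, and grid features stand in for them); predicate labels are ignored when building the semantic graph; the spatial-graph rule (link boxes that overlap) is a hypothetical choice, since the abstract does not say how spatial edges are defined; a single GCN layer with symmetric degree normalization stands in for the GCNs; and dot-product attention assumes every feature set has already been projected to the hidden-state size. All function names (`build_adjacency`, `overlap_triplets`, `gcn_layer`, `attend`, `decoder_input`) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Minimal pure-Python sketch; names and rules below are illustrative assumptions.
Vector = List[float]
Matrix = List[List[float]]
Triplet = Tuple[int, str, int]           # (subject index, predicate label, object index)
Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def build_adjacency(num_nodes: int, triplets: Sequence[Triplet]) -> Matrix:
    """Undirected adjacency with self-loops; each triplet adds one subject-object edge.
    Assumption: predicate labels are ignored."""
    adj = [[1.0 if i == j else 0.0 for j in range(num_nodes)] for i in range(num_nodes)]
    for subj, _pred, obj in triplets:
        adj[subj][obj] = adj[obj][subj] = 1.0
    return adj


def overlap_triplets(boxes: Sequence[Box]) -> List[Triplet]:
    """Hypothetical spatial rule: link every pair of boxes whose areas intersect."""
    edges = []
    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            a, b = boxes[i], boxes[j]
            if min(a[2], b[2]) > max(a[0], b[0]) and min(a[3], b[3]) > max(a[1], b[1]):
                edges.append((i, "overlaps", j))
    return edges


def gcn_layer(adj: Matrix, feats: Matrix, weight: Matrix) -> Matrix:
    """One GCN layer: ReLU(D^-1/2 A D^-1/2 H W) with symmetric degree normalization."""
    n = len(adj)
    deg = [sum(row) for row in adj]  # never zero thanks to self-loops
    norm = [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    return [[max(0.0, x) for x in row] for row in matmul(matmul(norm, feats), weight)]


def attend(query: Vector, keys: Matrix) -> Vector:
    """Dot-product soft attention (assumed form): softmax of query.key, weighted sum of keys."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    top = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [sum(e / total * key[d] for e, key in zip(exps, keys)) for d in range(len(keys[0]))]


def decoder_input(hidden: Vector, word_emb: Vector, visual: Matrix,
                  semantic: Matrix, spatial: Matrix) -> Vector:
    """Attend to each source separately, then concatenate with the current word embedding."""
    return attend(hidden, visual) + attend(hidden, semantic) + attend(hidden, spatial) + word_emb


if __name__ == "__main__":
    # Toy inputs standing in for RelTR boxes/triplets and CNN grid features.
    boxes = [(0, 0, 4, 4), (2, 2, 6, 6), (10, 10, 12, 12)]
    node_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    identity = [[1.0, 0.0], [0.0, 1.0]]  # stands in for a learned weight matrix
    semantic = gcn_layer(build_adjacency(3, [(0, "riding", 1), (1, "on", 2)]), node_feats, identity)
    spatial = gcn_layer(build_adjacency(3, overlap_triplets(boxes)), node_feats, identity)
    visual = [[0.5, 0.5], [1.0, 0.0]]  # e.g. two flattened CNN feature-map cells
    x_t = decoder_input([0.0, 0.0], [0.2, 0.8], visual, semantic, spatial)
    print(len(x_t))  # 8
```

The returned vector is what would be passed to the LSTM cell at that time step; the LSTM update itself is omitted. Keeping three separate attention calls, rather than pooling all features into a single set, mirrors the abstract's point that attention is applied to each source separately.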
Pages: 14
Related papers
  • [1] Sentinel mechanism for visual semantic graph-based image captioning
    Xiao, Fen
    Zhang, Ningru
    Xue, Wenfeng
    Gao, Xieping
    Computers & Electrical Engineering, 2024, 119
  • [2] Pattern graph-based image retrieval system combining semantic and visual features
    Allani, Olfa
    Zghal, Hajer Baazaoui
    Mellouli, Nedra
    Akdag, Herman
    Multimedia Tools and Applications, 2017, 76 (19): 20287-20316
  • [3] Image Captioning with Scene-graph Based Semantic Concepts
    Gao, Lizhao
    Wang, Bo
    Wang, Wenmin
    Proceedings of 2018 10th International Conference on Machine Learning and Computing (ICMLC 2018), 2018: 225-229
  • [4] Spatio-Temporal Graph-based Semantic Compositional Network for Video Captioning
    Li, Shun
    Zhang, Ze-Fan
    Ji, Yi
    Li, Ying
    Liu, Chun-Ping
    2022 International Joint Conference on Neural Networks (IJCNN), 2022
  • [5] Aspect coherence for graph-based semantic image labelling
    Passino, G.
    Patras, I.
    Izquierdo, E.
    IET Computer Vision, 2010, 4 (03): 183-194
  • [6] A Semantic Graph-Based Algorithm for Image Search Reranking
    Zhao, Nan
    Dong, Yuan
    Bai, Hongliang
    Wang, Lezi
    Huang, Chong
    Cen, Shusheng
    Zhao, Jian
    2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2013: 1666-1670
  • [7] Aligned visual semantic scene graph for image captioning
    Zhao, Shanshan
    Li, Lixiang
    Peng, Haipeng
    Displays, 2022, 74
  • [8] Semantic Similarity Measure with Conceptual Graph-Based Image Annotations
    Chinpanthana, Nutchanun
    2012 International Conference on Advanced Computer Science Applications and Technologies (ACSAT), 2012: 203-208
  • [9] Graph-Based Semantic Segmentation
    Balaska, Vasiliki
    Bampis, Loukas
    Gasteratos, Antonios
    Advances in Service and Industrial Robotics, RAAD 2018, 2019, 67: 572-579