Graph-based image captioning with semantic and spatial features

Cited by: 0

Authors:

Parseh, Mohammad Javad [1]

Ghadiri, Saeed [1]

Affiliation:

[1] Jahrom Univ, Dept Comp Engn & Informat Technol, Jahrom, Iran

Source:

SIGNAL PROCESSING-IMAGE COMMUNICATION | 2025, Vol. 133

Keywords:

Image captioning; Semantic graph; Spatial graph; Attention mechanism; Transformer

DOI:

10.1016/j.image.2025.117273

Chinese Library Classification:

TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]

Discipline classification codes:

0808 ; 0809 ;

Abstract:

Image captioning is a challenging task in image processing that aims to generate descriptive and accurate textual descriptions for images. In this paper, we propose a novel image captioning framework that leverages the spatial and semantic relationships between objects in an image, in addition to traditional visual features. Our approach integrates a pre-trained model, RelTR, as a backbone for extracting object bounding boxes and subject-predicate-object relationship triples. We use these extracted relationships to construct spatial and semantic graphs, which are processed through separate Graph Convolutional Networks (GCNs) to obtain high-level contextualized features. At the same time, a CNN model is employed to extract visual features from the input image. To merge the feature vectors, our approach uses a multi-modal attention mechanism that is applied separately to the feature maps of the image, the nodes of the semantic graph, and the nodes of the spatial graph at each time step of the LSTM-based decoder. The model concatenates the attended features with the word embedding at the respective time step and feeds the result into the LSTM cell. Our experiments demonstrate the effectiveness of the proposed approach, which competes closely with existing state-of-the-art image captioning techniques by capturing richer contextual information and generating accurate and semantically meaningful captions. (c) 2025 Elsevier Inc. All rights reserved.
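
The pipeline described in the abstract (graph construction from relationship triples, a GCN over the graph nodes, separate attention over three feature sets, then concatenation with the word embedding) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The abstract does not specify the GCN normalisation or the attention scoring function, so the mean-normalised aggregation and the scaled dot-product attention below are assumptions, as are all function names. The sketch also assumes that every feature vector has the same dimension as the decoder hidden state.

```python
import math

def build_adjacency(num_nodes, edges):
    """Symmetric adjacency matrix with self-loops from (subject, object) index pairs."""
    adj = [[1.0 if i == j else 0.0 for j in range(num_nodes)]
           for i in range(num_nodes)]
    for s, o in edges:
        adj[s][o] = 1.0
        adj[o][s] = 1.0
    return adj

def gcn_layer(adj, feats, weight):
    """One graph-convolution step, ReLU(D^-1 A X W) (assumed normalisation)."""
    out = []
    for row in adj:
        deg = sum(row)
        # Mean of the neighbours' features (the self-loop includes the node itself).
        agg = [sum(row[j] * feats[j][k] for j in range(len(feats))) / deg
               for k in range(len(feats[0]))]
        # Linear projection followed by ReLU.
        out.append([max(0.0, sum(agg[k] * weight[k][m] for k in range(len(agg))))
                    for m in range(len(weight[0]))])
    return out

def attend(query, items):
    """Scaled dot-product soft attention of one query over a list of vectors."""
    scale = math.sqrt(len(query))
    scores = [sum(q * v for q, v in zip(query, item)) / scale for item in items]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    context = [sum(w * item[k] for w, item in zip(weights, items))
               for k in range(len(query))]
    return context, weights

def decoder_input(hidden, word_emb, visual, semantic_nodes, spatial_nodes):
    """Attend to each modality separately, then concatenate with the word embedding."""
    ctx_visual, _ = attend(hidden, visual)
    ctx_semantic, _ = attend(hidden, semantic_nodes)
    ctx_spatial, _ = attend(hidden, spatial_nodes)
    # The concatenated vector is what would be passed to the LSTM cell.
    return ctx_visual + ctx_semantic + ctx_spatial + word_emb
```

In a full model, `hidden` would be the previous LSTM hidden state, `visual` the CNN feature-map locations, and the two node lists the outputs of the semantic and spatial GCNs. The weights would be learned rather than fixed.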

Pages: 14
