SACIC: A Semantics-Aware Convolutional Image Captioner Using Multi-level Pervasive Attention

Cited by: 1
Authors
Parameswaran, Sandeep Narayan [1 ]
Das, Sukhendu [1 ]
Affiliations
[1] Indian Inst Technol Madras, Dept Comp Sci & Engn, Visualizat & Percept Lab, Chennai, Tamil Nadu, India
Source
NEURAL INFORMATION PROCESSING (ICONIP 2019), PT III | 2019 / Vol. 11955
Keywords
Image captioning; Convolutional neural networks; Deep learning; Computer vision;
DOI
10.1007/978-3-030-36718-3_6
Chinese Library Classification (CLC)
TP18 [Theory of artificial intelligence];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Attention mechanisms alongside encoder-decoder architectures have become integral components for solving the image captioning problem. The attention mechanism recombines an encoding of the image, depending on the state of the decoder, to generate the caption sequence; the decoder is predominantly recurrent in nature. In contrast, we propose a novel network with attention-like properties that are pervasive through its layers, utilizing a convolutional neural network (CNN) to refine and combine representations at multiple levels of the architecture for captioning images. We also enable the model to use explicit higher-level semantic information obtained by performing panoptic segmentation on the image. The attention capability of the model is demonstrated visually, and an experimental evaluation is presented on the MS-COCO dataset. We show that the approach is more robust and efficient, and yields better performance than state-of-the-art architectures for image captioning.
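As a rough illustration of the kind of architecture the abstract describes, the sketch below is not the authors' code: PyTorch is assumed, and all module and parameter names (e.g. PervasiveAttentionCaptioner, img_dim, sem_dim) are hypothetical. It builds a 2D grid over (image region, caption word) pairs, fuses projected CNN region features with a panoptic-segmentation-derived semantic vector, applies causal 2D convolutions over the grid, and pools over regions, so that every layer mixes image and language information in an attention-like way.

```python
import torch
import torch.nn as nn


class PervasiveAttentionCaptioner(nn.Module):
    """Toy convolutional captioner with attention-like behaviour at every layer (illustrative only)."""

    def __init__(self, vocab_size, img_dim=2048, sem_dim=128,
                 emb_dim=256, hid_dim=256, n_layers=4):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, emb_dim)
        self.img_proj = nn.Linear(img_dim, emb_dim)   # CNN region/grid features
        self.sem_proj = nn.Linear(sem_dim, emb_dim)   # panoptic-segmentation semantics
        convs, in_ch = [], 2 * emb_dim
        for _ in range(n_layers):
            # padding=(1, 2) pads the word axis by 2 on each side; cropping in forward()
            # keeps only positions that see current and past words (causal decoding).
            convs.append(nn.Conv2d(in_ch, hid_dim, kernel_size=3, padding=(1, 2)))
            in_ch = hid_dim
        self.convs = nn.ModuleList(convs)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, img_feats, sem_feats, captions):
        # img_feats: (B, R, img_dim), sem_feats: (B, R, sem_dim), captions: (B, T) token ids
        B, R, _ = img_feats.shape
        T = captions.size(1)
        v = self.img_proj(img_feats) + self.sem_proj(sem_feats)          # (B, R, E)
        w = self.word_emb(captions)                                      # (B, T, E)
        # 2D source-target grid: every (image region, word) pair gets a joint feature.
        grid = torch.cat([v.unsqueeze(2).expand(B, R, T, v.size(-1)),
                          w.unsqueeze(1).expand(B, R, T, w.size(-1))], dim=-1)
        x = grid.permute(0, 3, 1, 2)                                     # (B, 2E, R, T)
        for conv in self.convs:
            x = torch.relu(conv(x))[:, :, :, :T]                         # crop => causal in T
        pooled, _ = x.max(dim=2)          # pool over regions: implicit attention
        return self.out(pooled.transpose(1, 2))                          # (B, T, vocab)
```

In such a design the region-word interactions are recomputed at every convolutional layer, which is what makes the attention "pervasive" rather than a single module bolted onto a recurrent decoder.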
Pages: 64-76
Number of pages: 13