SACIC: A Semantics-Aware Convolutional Image Captioner Using Multi-level Pervasive Attention

Cited by: 0
Authors
Parameswaran, Sandeep Narayan [1]
Das, Sukhendu [1]
Affiliation
[1] Indian Inst Technol Madras, Dept Comp Sci & Engn, Visualizat & Percept Lab, Chennai, Tamil Nadu, India
Source
NEURAL INFORMATION PROCESSING (ICONIP 2019), PT III | 2019 / Vol. 11955
Keywords
Image captioning; Convolutional neural networks; Deep learning; Computer vision
DOI
10.1007/978-3-030-36718-3_6
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Attention mechanisms, alongside encoder-decoder architectures, have become integral to solving the image captioning problem. To generate the caption sequence, the attention mechanism recombines an encoding of the image depending on the state of the decoder, which is predominantly recurrent in nature. In contrast, we propose a novel network with attention-like properties that are pervasive through its layers: it uses a convolutional neural network (CNN) to refine and combine representations at multiple levels of the architecture for captioning images. We also enable the model to use explicit higher-level semantic information obtained by performing panoptic segmentation on the image. The attention capability of the model is demonstrated visually, and an experimental evaluation is presented on the MS-COCO dataset. We show that the approach is more robust and efficient, and yields better performance, than state-of-the-art architectures for image captioning.
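The conventional attention step that the abstract contrasts against (recombining an image encoding according to the decoder state) can be illustrated with a minimal sketch. This is generic dot-product soft attention, not the SACIC architecture itself; the function name `soft_attention`, the toy region features, and the decoder state below are all hypothetical and chosen only for illustration.

```python
import math


def soft_attention(region_features, decoder_state):
    """Generic dot-product soft attention (illustrative, not the paper's model).

    region_features: list of equal-length feature vectors, one per image region.
    decoder_state: vector of the same length as each feature vector.
    Returns (weights, context).
    """
    # Score each image region by its dot product with the decoder state.
    scores = [sum(f * s for f, s in zip(region, decoder_state))
              for region in region_features]
    # Softmax over the scores; subtracting the max keeps exp() stable.
    peak = max(scores)
    exps = [math.exp(x - peak) for x in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: attention-weighted sum of the region features.
    dim = len(region_features[0])
    context = [sum(w * region[d] for w, region in zip(weights, region_features))
               for d in range(dim)]
    return weights, context


# Toy example (assumed values): three regions with 2-D features.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
state = [2.0, 0.0]
weights, context = soft_attention(regions, state)
```

In this toy case the first and third regions align equally well with the decoder state, so they receive equal weights that are larger than the weight of the second region. A recurrent decoder would recompute these weights at every step as its state changes.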
Pages: 64-76
Page count: 13