SACIC: A Semantics-Aware Convolutional Image Captioner Using Multi-level Pervasive Attention

Cited by: 0
Authors
Parameswaran, Sandeep Narayan [1]
Das, Sukhendu [1]
Affiliations
[1] Indian Inst Technol Madras, Dept Comp Sci & Engn, Visualizat & Percept Lab, Chennai, Tamil Nadu, India
Source
NEURAL INFORMATION PROCESSING (ICONIP 2019), PT III | 2019 / Vol. 11955
Keywords
Image captioning; Convolutional neural networks; Deep learning; Computer vision
DOI
10.1007/978-3-030-36718-3_6
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Attention mechanisms alongside encoder-decoder architectures have become integral components for solving the image captioning problem. The attention mechanism recombines an encoding of the image depending on the state of the decoder, to generate the caption sequence. The decoder is predominantly recurrent in nature. In contrast, we propose a novel network possessing attention-like properties that are pervasive through its layers, by utilizing a convolutional neural network (CNN) to refine and combine representations at multiple levels of the architecture for captioning images. We also enable the model to use explicit higher-level semantic information obtained by performing panoptic segmentation on the image. The attention capability of the model is visually demonstrated, and an experimental evaluation is shown on the MS-COCO dataset. We show that the approach is more robust and efficient, and yields better performance than state-of-the-art architectures for image captioning.
Pages: 64-76
Page count: 13