A Context Semantic Auxiliary Network for Image Captioning

Times Cited: 0
Authors
Li, Jianying [1 ,2 ]
Shao, Xiangjun [1 ,3 ]
Affiliations
[1] Hunan Univ Arts & Sci, Sch Comp & Elect Engn, Changde 415000, Peoples R China
[2] Key Lab Hunan Prov Control Technol Distributed Ele, Changde 415000, Peoples R China
[3] Wuhan Univ, Sch Comp Sci, Wuhan 430072, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
deep learning; attention mechanism; image captioning; ATTENTION; TRANSFORMER;
DOI
10.3390/info14070419
CLC Classification Number
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
Image captioning is a challenging task that generates a descriptive sentence for a given image. Earlier captioning methods mainly decode visual features to generate the caption sentence. However, visual features lack the context semantic information that is vital for generating an accurate caption. To address this problem, this paper first proposes an Attention-Aware (AA) mechanism that can filter out erroneous or irrelevant context semantic information. The AA mechanism is then used to build a Context Semantic Auxiliary Network (CSAN), which captures effective context semantic information to regenerate or polish the image caption. Moreover, AA can capture the visual feature information needed to generate a caption. Experimental results show that the proposed CSAN outperforms the compared image captioning methods on the MS COCO "Karpathy" offline test split and on the official online test server.
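The abstract describes the AA mechanism only at a high level: attention that filters out erroneous or irrelevant context semantics before they reach the caption decoder. The PyTorch snippet below is a minimal, illustrative sketch of that idea under the assumption that AA acts as a gated attention filter; the class name GatedContextAttention, the gating formulation, and all dimensions are hypothetical and are not taken from the paper.

```python
# Hypothetical sketch of an "attention-aware" filtering step, not the authors'
# exact AA mechanism: scaled dot-product attention over context features whose
# output is suppressed by a learned sigmoid gate before it reaches the decoder.
import torch
import torch.nn as nn

class GatedContextAttention(nn.Module):
    def __init__(self, d_model: int):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        # Gate conditioned on the query and the attended context decides
        # how much of the attended context is kept.
        self.gate = nn.Linear(2 * d_model, d_model)

    def forward(self, query: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        # query:   (batch, 1, d_model)  current decoder state
        # context: (batch, n, d_model)  candidate context semantic features
        q, k, v = self.q_proj(query), self.k_proj(context), self.v_proj(context)
        scores = torch.matmul(q, k.transpose(-2, -1)) / (q.size(-1) ** 0.5)
        attended = torch.matmul(torch.softmax(scores, dim=-1), v)  # (batch, 1, d_model)
        # Gate values near 0 filter out irrelevant or erroneous context.
        g = torch.sigmoid(self.gate(torch.cat([query, attended], dim=-1)))
        return g * attended

# Usage: filter context features against the current decoder state.
aa = GatedContextAttention(d_model=512)
dec_state = torch.randn(2, 1, 512)    # toy decoder state
ctx_feats = torch.randn(2, 36, 512)   # toy context semantic features
filtered = aa(dec_state, ctx_feats)   # (2, 1, 512)
```

In this sketch the sigmoid gate plays the filtering role the abstract attributes to AA: context directions whose gate values approach zero contribute little to the representation fused with the visual features for caption generation.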
Pages: 16