A Context Semantic Auxiliary Network for Image Captioning

Cited by: 0
Authors
Li, Jianying [1 ,2 ]
Shao, Xiangjun [1 ,3 ]
Affiliations
[1] Hunan Univ Arts & Sci, Sch Comp & Elect Engn, Changde 415000, Peoples R China
[2] Key Lab Hunan Prov Control Technol Distributed Ele, Changde 415000, Peoples R China
[3] Wuhan Univ, Sch Comp Sci, Wuhan 430072, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
deep learning; attention mechanism; image captioning; ATTENTION; TRANSFORMER;
DOI
10.3390/info14070419
Chinese Library Classification (CLC)
TP [automation technology, computer technology]
Subject Classification Code
0812
Abstract
Image captioning is a challenging task that generates a descriptive sentence for a given image. Earlier captioning methods mainly decode visual features to generate the caption sentence; however, visual features alone lack the context semantic information that is vital for producing an accurate caption. To address this problem, this paper first proposes an Attention-Aware (AA) mechanism that filters out erroneous or irrelevant context semantic information. AA is then used to build a Context Semantic Auxiliary Network (CSAN), which captures effective context semantic information to regenerate or polish the image caption. Moreover, AA can also capture the visual feature information needed to generate a caption. Experimental results show that the proposed CSAN outperforms the compared image captioning methods on the MS COCO "Karpathy" offline test split and on the official online test server.
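The abstract describes the AA mechanism only at a high level: it attends over context semantic features and suppresses those that are erroneous or irrelevant to the caption being generated. As a rough illustrative sketch only, the module below implements a generic gated attention filter in PyTorch; the class name, dimensions, and sigmoid gating form are assumptions for illustration and are not the paper's actual AA/CSAN architecture.

```python
# Minimal sketch of an "attention-aware" filtering step: standard scaled
# dot-product attention over context features, followed by a learned gate
# that damps low-relevance context. Illustrative only, not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionAwareFilter(nn.Module):
    def __init__(self, d_model: int = 512):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model)    # query from decoder state
        self.k_proj = nn.Linear(d_model, d_model)    # keys from context features
        self.v_proj = nn.Linear(d_model, d_model)    # values from context features
        self.gate = nn.Linear(2 * d_model, d_model)  # relevance gate

    def forward(self, decoder_state, context_feats):
        # decoder_state: (B, d_model); context_feats: (B, N, d_model)
        q = self.q_proj(decoder_state).unsqueeze(1)               # (B, 1, d)
        k = self.k_proj(context_feats)                            # (B, N, d)
        v = self.v_proj(context_feats)                            # (B, N, d)
        scores = torch.matmul(q, k.transpose(1, 2)) / k.size(-1) ** 0.5
        attn = F.softmax(scores, dim=-1)                          # (B, 1, N)
        attended = torch.matmul(attn, v).squeeze(1)               # (B, d)
        # Gate suppresses erroneous / irrelevant context information.
        g = torch.sigmoid(self.gate(torch.cat([decoder_state, attended], dim=-1)))
        return g * attended                                       # filtered context vector

# Example usage (batch of 2, 36 context regions):
# filt = AttentionAwareFilter(512)
# out = filt(torch.randn(2, 512), torch.randn(2, 36, 512))  # shape (2, 512)
```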
Pages: 16