MIXED KNOWLEDGE RELATION TRANSFORMER FOR IMAGE CAPTIONING

Citations: 0
Authors
Chen, Tianyu [1 ]
Li, Zhixin [1 ]
Wei, Jiahui [1 ]
Xian, Tiantao [1 ]
Affiliations
[1] Guangxi Normal Univ, Guangxi Key Lab Multisource Informat Min & Secur, Guilin 541004, Peoples R China
Source
2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2022
Funding
National Natural Science Foundation of China;
Keywords
image captioning; external knowledge; object relation; LANGUAGE;
DOI
10.1109/ICASSP43922.2022.9747541
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
The internal relationships among image objects have contributed significantly to the development of image captioning, especially when combined with the Transformer architecture. Most of these methods compute only the relationships between entities and ignore the information between entities and the background. Moreover, the ways of exploring relational information inside the image can also be extended. In this paper, we continue to explore the relationships between objects from both internal and external perspectives, and embed the vital global image information into the internal relationship module. To validate the effectiveness of our model, we conduct extensive experiments on the most popular MSCOCO dataset and achieve state-of-the-art performance on both online and offline test sets.
Pages: 4403-4407
Number of pages: 5
Related papers
20 in total
[1]   Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering [J].
Anderson, Peter ;
He, Xiaodong ;
Buehler, Chris ;
Teney, Damien ;
Johnson, Mark ;
Gould, Stephen ;
Zhang, Lei .
2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, :6077-6086
[2]   Meshed-Memory Transformer for Image Captioning [J].
Cornia, Marcella ;
Stefanini, Matteo ;
Baraldi, Lorenzo ;
Cucchiara, Rita .
2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020), 2020, :10575-10584
[3]   Deep Residual Learning for Image Recognition [J].
He, Kaiming ;
Zhang, Xiangyu ;
Ren, Shaoqing ;
Sun, Jian .
2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, :770-778
[4]  
Herdade S, 2019, ADV NEUR IN, V32
[5]   Boost image captioning with knowledge reasoning [J].
Huang, Feicheng ;
Li, Zhixin ;
Wei, Haiyang ;
Zhang, Canlong ;
Ma, Huifang .
MACHINE LEARNING, 2020, 109 (12) :2313-2332
[6]   Attention on Attention for Image Captioning [J].
Huang, Lun ;
Wang, Wenmin ;
Chen, Jie ;
Wei, Xiao-Yong .
2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, :4633-4642
[7]  
Ji JY, 2021, AAAI CONF ARTIF INTE, V35, P1655
[8]   Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations [J].
Krishna, Ranjay ;
Zhu, Yuke ;
Groth, Oliver ;
Johnson, Justin ;
Hata, Kenji ;
Kravitz, Joshua ;
Chen, Stephanie ;
Kalantidis, Yannis ;
Li, Li-Jia ;
Shamma, David A. ;
Bernstein, Michael S. ;
Li Fei-Fei .
INTERNATIONAL JOURNAL OF COMPUTER VISION, 2017, 123 (01) :32-73
[9]   Entangled Transformer for Image Captioning [J].
Li, Guang ;
Zhu, Linchao ;
Liu, Ping ;
Yang, Yi .
2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, :8927-8936
[10]  
Luo YP, 2021, AAAI CONF ARTIF INTE, V35, P2286