Image Caption Generation with Local Semantic and Global Information

Cited by: 0
Authors
Liu, Xing [1]
Liu, Weibin [1]
Xing, Weiwei [2]
Affiliations
[1] Beijing Jiaotong Univ, Inst Informat Sci, Beijing, Peoples R China
[2] Beijing Jiaotong Univ, Sch Software Engineering, Beijing, Peoples R China
Source
2019 IEEE SMARTWORLD, UBIQUITOUS INTELLIGENCE & COMPUTING, ADVANCED & TRUSTED COMPUTING, SCALABLE COMPUTING & COMMUNICATIONS, CLOUD & BIG DATA COMPUTING, INTERNET OF PEOPLE AND SMART CITY INNOVATION (SMARTWORLD/SCALCOM/UIC/ATC/CBDCOM/IOP/SCI 2019) | 2019
Funding
National Natural Science Foundation of China;
Keywords
component; image caption; computer vision; feature extraction; LSTM; image representation;
DOI
10.1109/SmartWorld-UIC-ATC-SCALCOM-IOP-SCI.2019.00152
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Different regions of an image play different roles in image description: some key information is confined to a small region, while some important features must be extracted from the whole image. Typically, a CNN alone is used to extract the features of an image, and those features are then used to generate its description. However, this approach tends to overlook some important information in the image. In this paper, we propose an image description method that combines the local information and global features of an image. The local information is extracted by an object detection model (SSD), and the global feature is extracted by a multi-instance learning (MIL) method. Our model, which combines these two methods, performs well on the public MS-COCO dataset.
Pages: 680-685
Page count: 6