Local Alignment with Global Semantic Consistence Network for Image-Text Matching

Cited by: 0
Authors
Li, Pengwei [1]
Wu, Shihua [1]
Lian, Zhichao [2]
Affiliations
[1] China Elect Technol Grp Corp, Res Inst 28, Nanjing, Peoples R China
[2] Nanjing Univ Sci & Technol, Nanjing, Peoples R China
Source
2022 IEEE INTL CONF ON DEPENDABLE, AUTONOMIC AND SECURE COMPUTING, INTL CONF ON PERVASIVE INTELLIGENCE AND COMPUTING, INTL CONF ON CLOUD AND BIG DATA COMPUTING, INTL CONF ON CYBER SCIENCE AND TECHNOLOGY CONGRESS (DASC/PICOM/CBDCOM/CYBERSCITECH) | 2022
Keywords
image-text matching; local alignment; label information; global semantic
DOI
10.1109/DASC/PiCom/CBDCom/Cy55231.2022.9927900
CLC number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Image-text matching is a major task in cross-modal information processing: it measures the similarity between an image and a sentence. Existing methods fall mainly into two groups, global embedding and local alignment. Recently, local alignment methods that use fine-grained features to explore the correspondence between image regions and text words have achieved impressive results. However, these methods focus only on matching salient objects and ignore the importance of global semantics. To solve this problem, we propose a novel Local Alignment with Global Semantic Consistence Network (LAGSC), which performs cross-modal matching at both the global and local levels. Our method provides supervisory information for image-text pairs through label vectors of image regions, so as to maintain global semantic consistency. Experimental results on two benchmark datasets, Flickr30K and MS-COCO, demonstrate the effectiveness of our method.
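The abstract gives no formulas, so the following is only an illustrative sketch of the general idea of scoring an image-sentence pair at both a global and a local level. It is not the LAGSC model. The function names, the mean pooling used for the global vectors, the max-over-regions aggregation for local alignment, and the mixing weight `alpha` are all assumptions made for this example, and the label-vector supervision described in the abstract is not modeled here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def local_similarity(regions, words):
    """Local alignment (assumed form): match each word to its most similar
    image region, then average those best-match scores over all words."""
    best_per_word = [max(cosine(w, r) for r in regions) for w in words]
    return sum(best_per_word) / len(best_per_word)


def global_similarity(regions, words):
    """Global embedding (assumed form): compare mean-pooled region features
    with mean-pooled word features."""
    return cosine(mean_vector(regions), mean_vector(words))


def matching_score(regions, words, alpha=0.5):
    """Hypothetical combination: weighted sum of global and local similarity."""
    g = global_similarity(regions, words)
    l = local_similarity(regions, words)
    return alpha * g + (1.0 - alpha) * l


# Toy features: two image regions and one word, in a shared 2-D space.
regions = [[1.0, 0.0], [0.0, 1.0]]
words = [[1.0, 0.0]]
score = matching_score(regions, words)
```

In this toy case the single word matches the first region exactly, so the local score is higher than the global score, which is diluted by pooling over the unrelated second region. That gap is one way to see why the two levels carry different information.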
Pages: 652-657
Page count: 6
Related papers
26 in total
[1]   VQA: Visual Question Answering [J].
Agrawal, Aishwarya ;
Lu, Jiasen ;
Antol, Stanislaw ;
Mitchell, Margaret ;
Zitnick, C. Lawrence ;
Parikh, Devi ;
Batra, Dhruv .
INTERNATIONAL JOURNAL OF COMPUTER VISION, 2017, 123 (01) :4-31
[2]   Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering [J].
Anderson, Peter ;
He, Xiaodong ;
Buehler, Chris ;
Teney, Damien ;
Johnson, Mark ;
Gould, Stephen ;
Zhang, Lei .
2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, :6077-6086
[3]  
Andrew G, 2013, INT C MACH LEARN, V28
[4]  
Cho K., 2014, COMPUT SCI
[5]  
Deng J, 2009, PROC CVPR IEEE, P248, DOI 10.1109/CVPRW.2009.5206848
[6]  
Faghri F., 2018, P BRIT MACH VIS C BM
[7]   Canonical correlation analysis: An overview with application to learning methods [J].
Hardoon, DR ;
Szedmak, S ;
Shawe-Taylor, J .
NEURAL COMPUTATION, 2004, 16 (12) :2639-2664
[8]   Deep Residual Learning for Image Recognition [J].
He, Kaiming ;
Zhang, Xiangyu ;
Ren, Shaoqing ;
Sun, Jian .
2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, :770-778
[9]   Bi-Directional Spatial-Semantic Attention Networks for Image-Text Matching [J].
Huang, Feiran ;
Zhang, Xiaoming ;
Zhao, Zhonghua ;
Li, Zhoujun .
IEEE TRANSACTIONS ON IMAGE PROCESSING, 2019, 28 (04) :2008-2020
[10]   Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations [J].
Krishna, Ranjay ;
Zhu, Yuke ;
Groth, Oliver ;
Johnson, Justin ;
Hata, Kenji ;
Kravitz, Joshua ;
Chen, Stephanie ;
Kalantidis, Yannis ;
Li, Li-Jia ;
Shamma, David A. ;
Bernstein, Michael S. ;
Li Fei-Fei .
INTERNATIONAL JOURNAL OF COMPUTER VISION, 2017, 123 (01) :32-73