Combining Global and Local Similarity for Cross-Media Retrieval

Cited by: 20
Authors
Li, Zhixin [1]
Ling, Feng [1]
Zhang, Canlong [1]
Ma, Huifang [2]
Affiliations
[1] Guangxi Normal Univ, Guangxi Key Lab Multisource Informat Min & Secur, Guilin 541004, Peoples R China
[2] Northwest Normal Univ, Coll Comp Sci & Engn, Lanzhou 730070, Peoples R China
Source
IEEE ACCESS | 2020 / Volume 8 / Issue 08
Funding
National Natural Science Foundation of China;
Keywords
Convolutional neural network; self-attention network; attention mechanism; two-level network; cross-media retrieval;
DOI
10.1109/ACCESS.2020.2969808
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
This paper studies the problem of image-text matching, with the aim of matching images and text more accurately. Existing cross-media retrieval methods use only part of the information in images and text: they either match the whole image with the whole sentence, or match some image regions with some words. To better reveal the latent connection between image and text semantics, this paper proposes a cross-media image-text retrieval method that fuses two levels of similarity. It constructs a cross-media two-level network to explore better matching between images and text; the network contains two subnetworks, one handling global features and one handling local features. Specifically, the image is divided into the whole image and a set of image regions, and the text is divided into whole sentences and words. The two levels are studied separately to explore the full latent alignment between images and text, and a two-level alignment framework is then used so that the levels reinforce each other. Fusing the two kinds of similarity yields a complete representation for cross-media retrieval. Experimental evaluation on the Flickr30K and MS-COCO datasets shows that the proposed method matches image and text semantics more accurately and outperforms popular international cross-media retrieval methods on various evaluation metrics.
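The abstract does not give the scoring formulas, so the following is only a minimal sketch of the general idea of combining a global (whole image vs. whole sentence) similarity with a local (image regions vs. words) similarity. Everything in it is an assumption made for illustration: the function names, the use of cosine similarity, the "best region per word, then average" local score, and the weighted-sum fusion with a hypothetical weight `alpha`. It is not the authors' network or loss.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def global_similarity(image_vec, sentence_vec):
    """Global level: whole-image embedding against whole-sentence embedding."""
    return cosine(image_vec, sentence_vec)


def local_similarity(region_vecs, word_vecs):
    """Local level (assumed scheme): match each word to its most similar
    image region, then average those best-match scores over the words."""
    if not region_vecs or not word_vecs:
        return 0.0
    best = [max(cosine(w, r) for r in region_vecs) for w in word_vecs]
    return sum(best) / len(best)


def fused_similarity(image_vec, sentence_vec, region_vecs, word_vecs, alpha=0.5):
    """Assumed fusion: weighted sum of the two levels; alpha is hypothetical."""
    g = global_similarity(image_vec, sentence_vec)
    l = local_similarity(region_vecs, word_vecs)
    return alpha * g + (1.0 - alpha) * l
```

In a retrieval setting, a score of this kind would be computed between a query and every candidate, and the candidates ranked by it. In a learned system the embeddings would come from the two subnetworks rather than being supplied by hand.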
Pages: 21847 - 21856
Page count: 10
Related papers
50 in total
  • [21] Learning a Limited Text Space for Cross-Media Retrieval
    Yu, Zheng
    Wang, Wenmin
    Fan, Mengdi
    COMPUTER ANALYSIS OF IMAGES AND PATTERNS, 2017, 10424 : 292 - 303
  • [22] Cross-media retrieval with collective deep semantic learning
    Zhang, Bin
    Zhu, Lei
    Sun, Jiande
    Zhang, Huaxiang
    MULTIMEDIA TOOLS AND APPLICATIONS, 2018, 77 (17) : 22247 - 22266
  • [23] Online latent semantic hashing for cross-media retrieval
    Yao, Tao
    Wang, Gang
    Yan, Lianshan
    Kong, Xiangwei
    Su, Qingtang
    Zhang, Caiming
    Tian, Qi
    PATTERN RECOGNITION, 2019, 89 : 1 - 11
  • [24] Semantic convex matrix factorisation for cross-media retrieval
    Fang, Yixian
    Ren, Yuwei
    Zhang, Huaxiang
    IET IMAGE PROCESSING, 2019, 13 (01) : 196 - 205
  • [25] Discrete Semantic Alignment Hashing for Cross-Media Retrieval
    Yao, Tao
    Kong, Xiangwei
    Fu, Haiyan
    Tian, Qi
    IEEE TRANSACTIONS ON CYBERNETICS, 2020, 50 (12) : 4896 - 4907
  • [26] Finding the best picture: Cross-media retrieval of content
    Deschacht, Koen
    Moens, Marie-Francine
    ADVANCES IN INFORMATION RETRIEVAL, 2008, 4956 : 539 - 546
  • [27] ENHANCED ISOMORPHIC SEMANTIC REPRESENTATION FOR CROSS-MEDIA RETRIEVAL
    Liu, Ting
    Zhao, Yao
    Wei, Shikui
    Wei, Yunchao
    Liao, Lixin
    2017 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2017, : 967 - 972
  • [28] Cross-media retrieval based on linear discriminant analysis
    Qi, Yudan
    Zhang, Huaxiang
    Zhang, Bin
    Wang, Li
    Zheng, Shunxin
    MULTIMEDIA TOOLS AND APPLICATIONS, 2019, 78 : 24249 - 24268
  • [29] An Approach for Mining Heterogeneous Data for Cross-Media Retrieval
    Pavan, K. Madhu
    Ananthanarayana, V. S.
    2013 FOURTH INTERNATIONAL CONFERENCE ON COMPUTING, COMMUNICATIONS AND NETWORKING TECHNOLOGIES (ICCCNT), 2013,
  • [30] Understanding multimedia document semantics for cross-media retrieval
    Wu, F
    Yang, Y
    Zhuang, YT
    Pan, YH
    ADVANCES IN MULTIMEDIA INFORMATION PROCESSING - PCM 2005, PT 1, 2005, 3767 : 993 - 1004