Combining Global and Local Similarity for Cross-Media Retrieval

Cited by: 20
Authors
Li, Zhixin [1 ]
Ling, Feng [1 ]
Zhang, Canlong [1 ]
Ma, Huifang [2 ]
Affiliations
[1] Guangxi Normal Univ, Guangxi Key Lab Multisource Informat Min & Secur, Guilin 541004, Peoples R China
[2] Northwest Normal Univ, Coll Comp Sci & Engn, Lanzhou 730070, Peoples R China
Source
IEEE ACCESS | 2020, Vol. 8, Issue 08
Funding
National Natural Science Foundation of China;
Keywords
Convolutional neural network; self-attention network; attention mechanism; two-level network; cross-media retrieval;
DOI
10.1109/ACCESS.2020.2969808
CLC Classification Number
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
This paper studies the problem of image-text matching, with the goal of aligning images and text more accurately. Existing cross-media retrieval methods exploit only part of the available information: they either match the whole image against the whole sentence, or match individual image regions against individual words. To better reveal the latent connections between image and text semantics, this paper proposes a cross-media image-text retrieval method that fuses two levels of similarity. A two-level cross-media network is constructed to explore better matching between images and texts; it contains two subnetworks, one handling global features and the other local features. Specifically, the image is decomposed into the whole picture and a set of image regions, and the text into the whole sentence and its words, which are studied separately to explore the full latent alignment between images and text. A two-level alignment framework then lets the two granularities reinforce each other, and fusing the two kinds of similarity yields a more complete representation for cross-media retrieval. Experimental evaluations on the Flickr30K and MS-COCO datasets show that the proposed method matches image and text semantics more accurately and outperforms popular cross-media retrieval methods on all evaluation metrics.
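To make the fused-similarity idea in the abstract concrete, below is a minimal, hypothetical PyTorch-style sketch of combining a global score (whole image vs. whole sentence) with a local score (regions vs. words). The function names, the max-over-regions alignment, and the fusion weight alpha are illustrative assumptions, not the paper's actual implementation.

import torch
import torch.nn.functional as F

def global_similarity(img_emb, txt_emb):
    # Cosine similarity between a whole-image embedding and a
    # whole-sentence embedding, both of shape (d,).
    return F.cosine_similarity(img_emb, txt_emb, dim=-1)

def local_similarity(region_embs, word_embs):
    # region_embs: (n_regions, d); word_embs: (n_words, d).
    # Cosine similarity between every region and every word, then
    # align each word with its best-matching region and average
    # over words (a simple stacked-attention-style score).
    sim = F.normalize(region_embs, dim=-1) @ F.normalize(word_embs, dim=-1).T
    return sim.max(dim=0).values.mean()

def fused_similarity(img_emb, txt_emb, region_embs, word_embs, alpha=0.5):
    # alpha is a hypothetical fusion weight; the paper fuses the two
    # levels but this particular weighting is an assumption here.
    return alpha * global_similarity(img_emb, txt_emb) \
        + (1.0 - alpha) * local_similarity(region_embs, word_embs)

# Usage with random embeddings (d = 256, 36 regions, 12 words):
d = 256
score = fused_similarity(torch.randn(d), torch.randn(d),
                         torch.randn(36, d), torch.randn(12, d))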
Pages: 21847-21856
Number of pages: 10
Related Papers
50 records in total
  • [31] Cross-media retrieval method based on content correlation
    Zhang, Hong
    Wu, Fei
    Zhuang, Yue-Ting
    Chen, Jian-Xun
    Jisuanji Xuebao/Chinese Journal of Computers, 2008, 31 (05): : 820 - 826
  • [32] Bagging-based cross-media retrieval algorithm
    Xu, Gongwen
    Zhang, Yu
    Yin, Mingshan
    Hong, Wenzhong
    Zou, Ran
    Wang, Shanshan
    Soft Computing, 2023, 27 : 2615 - 2623
  • [33] Cross-media Hash Retrieval Using Multi-head Attention Network
    Li, Zhixin
    Ling, Feng
    Xu, Chuansheng
    Zhang, Canlong
    Ma, Huifang
    2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 1290 - 1297
  • [34] Cross-media similarity metric learning with unified deep networks
    Qi, Jinwei
    Huang, Xin
    Peng, Yuxin
    MULTIMEDIA TOOLS AND APPLICATIONS, 2017, 76 (23) : 25109 - 25127
  • [36] Two-stage semantic matching for cross-media retrieval
    Xu G.
    Xu L.
    Zhang M.
    Li X.
    International Journal of Performability Engineering, 2018, 14 (04) : 795 - 804
  • [37] Supervised Robust Discrete Multimodal Hashing for Cross-Media Retrieval
    Li, Chuan-Xiang
    Yan, Ting-Kun
    Luo, Xin
    Nie, Liqiang
    Xu, Xin-Shun
    IEEE TRANSACTIONS ON MULTIMEDIA, 2019, 21 (11) : 2863 - 2877
  • [38] CROSS-MEDIA RETRIEVAL BY CLUSTER-BASED CORRELATION ANALYSIS
    Ma, Ding
    Zhai, Xiaohua
    Peng, Yuxin
    2013 20TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP 2013), 2013, : 3986 - 3990
  • [39] COUPLED FEATURE MAPPING AND CORRELATION MINING FOR CROSS-MEDIA RETRIEVAL
    Fan, Mengdi
    Wang, Wenmin
    Wang, Ronggang
    2016 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA & EXPO WORKSHOPS (ICMEW), 2016,
  • [40] Cross-media Retrieval by Learning Rich Semantic Embeddings of Multimedia
    Fan, Mengdi
    Wang, Wenmin
    Dong, Peilei
    Han, Liang
    Wang, Ronggang
    Li, Ge
    PROCEEDINGS OF THE 2017 ACM MULTIMEDIA CONFERENCE (MM'17), 2017, : 1698 - 1706