Rich Features Embedding for Cross-Modal Retrieval: A Simple Baseline

Cited by: 12
Authors
Fu, Xin [1 ,2 ]
Zhao, Yao [1 ,2 ]
Wei, Yunchao [3 ]
Zhao, Yufeng [4 ]
Wei, Shikui [1 ,2 ]
Affiliations
[1] Beijing Jiaotong Univ, Inst Informat Sci, Beijing 100044, Peoples R China
[2] Beijing Key Lab Adv Informat Sci & Network Techno, Beijing 100044, Peoples R China
[3] Univ Illinois, Beckman Inst, Champaign, IL 61801 USA
[4] China Acad Chinese Med Sci, Inst Basic Res Clin Med, Beijing 100700, Peoples R China
Funding
US National Science Foundation;
Keywords
Semantics; Image representation; Visualization; Task analysis; Correlation; Training; Data models; Rich features embedding; image-text matching; deep representation learning; cross-modal retrieval;
DOI
10.1109/TMM.2019.2957948
CLC Classification Number
TP [automation technology, computer technology];
Discipline Code
0812;
Abstract
During the past few years, significant progress has been made on cross-modal retrieval, benefiting from the development of deep neural networks. Meanwhile, the overall frameworks are becoming more and more complex, making both training and analysis more difficult. In this paper, we provide a Rich Features Embedding (RFE) approach that tackles cross-modal retrieval tasks in a simple yet effective way. RFE constructs rich representations for both images and texts, which are further leveraged to learn a rich features embedding in the common space according to a simple hard triplet loss. Without any bells and whistles in constructing complex components, the proposed RFE is concise and easy to implement. More importantly, our RFE obtains state-of-the-art results on several popular benchmarks such as MS COCO and Flickr30K. In particular, image-to-text and text-to-image retrieval achieve 76.1% and 61.1% (R@1) on MS COCO, outperforming other methods by more than 3.4% and 2.3%, respectively. We hope our RFE will serve as a solid baseline and help ease future research in cross-modal retrieval.
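The hard triplet loss mentioned in the abstract can be illustrated with a minimal sketch. This is a generic bidirectional triplet loss with hardest in-batch negatives, a common formulation in image-text matching; the paper does not specify its exact loss, mining strategy, or margin here, so the function name and the `margin=0.2` default are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def hard_triplet_loss(img_emb, txt_emb, margin=0.2):
    """Bidirectional triplet loss with hardest in-batch negatives (a sketch;
    the paper's exact formulation may differ). Row i of each array is assumed
    to be a matched image-text pair."""
    # L2-normalize so the dot product is cosine similarity.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    sim = img @ txt.T                  # (B, B); diagonal holds matched pairs
    pos = np.diag(sim)                 # similarity of each matched pair

    # Mask the diagonal, then take the hardest (most similar) negative
    # per image (over texts) and per text (over images).
    mask = np.eye(sim.shape[0], dtype=bool)
    neg_sim = np.where(mask, -np.inf, sim)
    hard_txt = neg_sim.max(axis=1)     # hardest negative text for each image
    hard_img = neg_sim.max(axis=0)     # hardest negative image for each text

    # Hinge on both retrieval directions, averaged over the batch.
    loss_i2t = np.maximum(0.0, margin + hard_txt - pos)
    loss_t2i = np.maximum(0.0, margin + hard_img - pos)
    return (loss_i2t + loss_t2i).mean()
```

With perfectly aligned embeddings the loss is zero; swapping the texts so every pair is mismatched drives it up, which is what pushes the two modalities toward a shared space during training.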
Pages: 2354-2365
Page count: 12
Related Papers
50 in total
  • [1] Cross-Modal Retrieval With CNN Visual Features: A New Baseline
    Wei, Yunchao
    Zhao, Yao
    Lu, Canyi
    Wei, Shikui
    Liu, Luoqi
    Zhu, Zhenfeng
    Yan, Shuicheng
    IEEE TRANSACTIONS ON CYBERNETICS, 2017, 47 (02) : 449 - 460
  • [2] Deep Relation Embedding for Cross-Modal Retrieval
    Zhang, Yifan
    Zhou, Wengang
    Wang, Min
    Tian, Qi
    Li, Houqiang
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2021, 30 : 617 - 627
  • [3] Cross-Modal Retrieval with Heterogeneous Graph Embedding
    Chen, Dapeng
    Wang, Min
    Chen, Haobin
    Wu, Lin
    Qin, Jing
    Peng, Wei
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 3291 - 3300
  • [4] Binary Set Embedding for Cross-Modal Retrieval
    Yu, Mengyang
    Liu, Li
    Shao, Ling
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2017, 28 (12) : 2899 - 2910
  • [5] Double-scale similarity with rich features for cross-modal retrieval
    Zhao, Kaiqiang
    Wang, Hufei
    Zhao, Dexin
    MULTIMEDIA SYSTEMS, 2022, 28 (05) : 1767 - 1777
  • [7] Graph Embedding Learning for Cross-Modal Information Retrieval
    Zhang, Youcai
    Gu, Xiaodong
    NEURAL INFORMATION PROCESSING (ICONIP 2017), PT III, 2017, 10636 : 594 - 601
  • [8] Label Embedding Online Hashing for Cross-Modal Retrieval
    Wang, Yongxin
    Luo, Xin
    Xu, Xin-Shun
    MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020, : 871 - 879
  • [9] Cross-modal Recipe Retrieval with Rich Food Attributes
    Chen, Jing-Jing
    Ngo, Chong-Wah
    Chua, Tat-Seng
    PROCEEDINGS OF THE 2017 ACM MULTIMEDIA CONFERENCE (MM'17), 2017, : 1771 - 1779
  • [10] Discrete semantic embedding hashing for scalable cross-modal retrieval
    Liu, Junjie
    Fei, Lunke
    Jia, Wei
    Zhao, Shuping
    Wen, Jie
    Teng, Shaohua
    Zhang, Wei
    2021 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2021, : 1461 - 1467