Rich Features Embedding for Cross-Modal Retrieval: A Simple Baseline

Cited by: 12
Authors
Fu, Xin [1 ,2 ]
Zhao, Yao [1 ,2 ]
Wei, Yunchao [3 ]
Zhao, Yufeng [4 ]
Wei, Shikui [1 ,2 ]
Affiliations
[1] Beijing Jiaotong Univ, Inst Informat Sci, Beijing 100044, Peoples R China
[2] Beijing Key Lab Adv Informat Sci & Network Techno, Beijing 100044, Peoples R China
[3] Univ Illinois, Beckman Inst, Champaign, IL 61801 USA
[4] China Acad Chinese Med Sci, Inst Basic Res Clin Med, Beijing 100700, Peoples R China
Funding
US National Science Foundation;
Keywords
Semantics; Image representation; Visualization; Task analysis; Correlation; Training; Data models; Rich features embedding; image-text matching; deep representation learning; cross-modal retrieval;
DOI
10.1109/TMM.2019.2957948
CLC Classification Number
TP [automation technology, computer technology];
Discipline Code
0812;
Abstract
During the past few years, significant progress has been made on cross-modal retrieval, benefiting from the development of deep neural networks. Meanwhile, the overall frameworks are becoming more and more complex, making both training and analysis more difficult. In this paper, we provide a Rich Features Embedding (RFE) approach that tackles cross-modal retrieval tasks in a simple yet effective way. RFE constructs rich representations for both images and texts, which are further leveraged to learn a rich features embedding in the common space according to a simple hard triplet loss. Without any bells and whistles in constructing complex components, the proposed RFE is concise and easy to implement. More importantly, our RFE obtains state-of-the-art results on several popular benchmarks such as MS COCO and Flickr30K. In particular, image-to-text and text-to-image retrieval achieve 76.1% and 61.1% (R@1) on MS COCO, outperforming other methods by more than 3.4% and 2.3%, respectively. We hope our RFE will serve as a solid baseline and help ease future research in cross-modal retrieval.
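The hard triplet loss mentioned in the abstract can be illustrated with a minimal sketch. This is a generic bidirectional triplet loss with hardest in-batch negatives, a common formulation in image-text matching; the paper does not specify its exact loss, mining strategy, or margin here, so the function name and the `margin=0.2` default are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def hard_triplet_loss(img_emb, txt_emb, margin=0.2):
    """Bidirectional triplet loss with hardest in-batch negatives (a sketch;
    the paper's exact formulation may differ). Row i of each array is assumed
    to be a matched image-text pair."""
    # L2-normalize so the dot product is cosine similarity.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    sim = img @ txt.T                  # (B, B); diagonal holds matched pairs
    pos = np.diag(sim)                 # similarity of each matched pair

    # Mask the diagonal, then take the hardest (most similar) negative
    # per image (over texts) and per text (over images).
    mask = np.eye(sim.shape[0], dtype=bool)
    neg_sim = np.where(mask, -np.inf, sim)
    hard_txt = neg_sim.max(axis=1)     # hardest negative text for each image
    hard_img = neg_sim.max(axis=0)     # hardest negative image for each text

    # Hinge on both retrieval directions, averaged over the batch.
    loss_i2t = np.maximum(0.0, margin + hard_txt - pos)
    loss_t2i = np.maximum(0.0, margin + hard_img - pos)
    return (loss_i2t + loss_t2i).mean()
```

With perfectly aligned embeddings the loss is zero; swapping the texts so every pair is mismatched drives it up, which is what pushes the two modalities toward a shared space during training.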
Pages: 2354-2365
Page count: 12
Related Papers
50 in total
  • [1] Cross-Modal Retrieval With CNN Visual Features: A New Baseline
    Wei, Yunchao
    Zhao, Yao
    Lu, Canyi
    Wei, Shikui
    Liu, Luoqi
    Zhu, Zhenfeng
    Yan, Shuicheng
    IEEE TRANSACTIONS ON CYBERNETICS, 2017, 47 (02) : 449 - 460
  • [2] Deep Relation Embedding for Cross-Modal Retrieval
    Zhang, Yifan
    Zhou, Wengang
    Wang, Min
    Tian, Qi
    Li, Houqiang
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2021, 30 : 617 - 627
  • [3] Cross-Modal Retrieval with Heterogeneous Graph Embedding
    Chen, Dapeng
    Wang, Min
    Chen, Haobin
    Wu, Lin
    Qin, Jing
    Peng, Wei
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 3291 - 3300
  • [4] Binary Set Embedding for Cross-Modal Retrieval
    Yu, Mengyang
    Liu, Li
    Shao, Ling
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2017, 28 (12) : 2899 - 2910
  • [5] Double-scale similarity with rich features for cross-modal retrieval
    Zhao, Kaiqiang
    Wang, Hufei
    Zhao, Dexin
    MULTIMEDIA SYSTEMS, 2022, 28 (05) : 1767 - 1777
  • [7] Graph Embedding Learning for Cross-Modal Information Retrieval
    Zhang, Youcai
    Gu, Xiaodong
    NEURAL INFORMATION PROCESSING (ICONIP 2017), PT III, 2017, 10636 : 594 - 601
  • [8] Label Embedding Online Hashing for Cross-Modal Retrieval
    Wang, Yongxin
    Luo, Xin
    Xu, Xin-Shun
    MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020, : 871 - 879
  • [9] Cross-modal Recipe Retrieval with Rich Food Attributes
    Chen, Jing-Jing
    Ngo, Chong-Wah
    Chua, Tat-Seng
    PROCEEDINGS OF THE 2017 ACM MULTIMEDIA CONFERENCE (MM'17), 2017, : 1771 - 1779
  • [10] Discrete semantic embedding hashing for scalable cross-modal retrieval
    Liu, Junjie
    Fei, Lunke
    Jia, Wei
    Zhao, Shuping
    Wen, Jie
    Teng, Shaohua
    Zhang, Wei
    2021 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2021, : 1461 - 1467