Composed Image Retrieval via Cross Relation Network With Hierarchical Aggregation Transformer

Cited by: 9

Authors:

Yang, Qu [1]

Ye, Mang [1]

Cai, Zhaohui [1]

Su, Kehua [1]

Du, Bo [1]

Affiliations:

[1] Wuhan University, National Engineering Research Center for Multimedia Software, School of Computer Science, Hubei Luojia Laboratory, Wuhan 430072, People's Republic of China

Source:

IEEE TRANSACTIONS ON IMAGE PROCESSING | 2023 / Vol. 32

Funding:

National Natural Science Foundation of China;

Keywords:

Cross Relation; Image Retrieval; Transformer;

DOI:

10.1109/TIP.2023.3299791

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Composing Text and Image to Image Retrieval (CTI-IR) aims to find the target image that matches the query image visually and the query text semantically. However, existing works ignore the fact that the reference text usually serves multiple functions, e.g., modification and auxiliary. To address this issue, we put forth a unified solution, namely a Hierarchical Aggregation Transformer incorporated with a Cross Relation Network (CRN). CRN unifies the modification and relevance manners in a single framework. This configuration shows broader applicability, enabling us to model modification text, auxiliary text, or their combination in triplet relationships simultaneously. Specifically, CRN includes: 1) a Cross Relation Network that comprehensively captures the relationships in the various composed retrieval scenarios caused by the two different query text types, allowing a unified retrieval model to designate adaptive combination strategies for flexible applicability; and 2) a Hierarchical Aggregation Transformer that aggregates top-down features with a Multi-layer Perceptron (MLP) to overcome the edge information loss of a window-based multi-stage Transformer. Extensive experiments demonstrate the superiority of the proposed CRN on all three fashion-domain datasets. Code is available at github.com/yan9qu/crn.


Pages: 4543-4554

Page count: 12

