Composed Image Retrieval via Cross Relation Network With Hierarchical Aggregation Transformer

Cited by: 9
Authors
Yang, Qu [1]
Ye, Mang [1]
Cai, Zhaohui [1]
Su, Kehua [1]
Du, Bo [1]
Affiliations
[1] Wuhan Univ, Natl Engn Res Ctr Multimedia Software, Sch Comp Sci, Hubei Luojia Lab, Wuhan 430072, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Cross Relation; Image Retrieval; Transformer;
DOI
10.1109/TIP.2023.3299791
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Composing Text and Image to Image Retrieval (CTI-IR) aims to find the target image that matches the query image visually and the query text semantically. However, existing works ignore the fact that the reference text usually serves multiple functions, e.g., as modification text or as auxiliary text. To address this issue, we put forth a unified solution, namely Hierarchical Aggregation Transformer incorporated with Cross Relation Network (CRN). CRN unifies the modification and relevance manners in a single framework. This configuration shows broader applicability, enabling us to model modification text, auxiliary text, or their combination in triplet relationships simultaneously. Specifically, CRN includes: 1) a Cross Relation Network that comprehensively captures the relationships in the various composed retrieval scenarios arising from the two query text types, allowing a unified retrieval model to designate adaptive combination strategies for flexible applicability; 2) a Hierarchical Aggregation Transformer that aggregates top-down features with a Multi-layer Perceptron (MLP) to overcome the edge information loss in a window-based multi-stage Transformer. Extensive experiments demonstrate the superiority of the proposed CRN on three fashion-domain datasets. Code is available at github.com/yan9qu/crn.
Pages: 4543-4554
Page count: 12