MULTI-ORDER ADVERSARIAL REPRESENTATION LEARNING FOR COMPOSED QUERY IMAGE RETRIEVAL

Citations: 4
Authors
Fu, Zhixiao [1]
Chen, Xinyuan [2]
Dong, Jianfeng [3, 4]
Ji, Shouling [1, 4]
Affiliations
[1] Zhejiang Univ, Hangzhou, Peoples R China
[2] East China Normal Univ, Shanghai, Peoples R China
[3] Zhejiang Gongshang Univ, Hangzhou, Peoples R China
[4] Alibaba Zhejiang Univ Joint Res Inst Frontier Tec, Hangzhou, Peoples R China
Source
2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021) | 2021
Funding
China Postdoctoral Science Foundation;
Keywords
Image retrieval; Adversarial learning; Multi-order representation; PERSON REIDENTIFICATION;
DOI
10.1109/ICASSP39728.2021.9414436
Chinese Library Classification number
O42 [Acoustics];
Subject classification codes
070206 ; 082403 ;
Abstract
This paper targets the task of composed query image retrieval. Given a composed query consisting of a reference image and modification text, the task aims to retrieve images that are generally similar to the reference image but differ according to the modification text. The task is challenging because of the complexity of the composed query and the cross-modality characteristics of the query and candidate images. The common paradigm is to first obtain a fused feature of the reference image and the text, and then project it into a common embedding space with the candidate images. However, most works aim only for a high-level representation, ignoring the low-level representation that may be complementary to it. This paper therefore proposes a new Multi-order Adversarial Network (MAN), which uses multi-level representations and simultaneously explores their low-order and high-order interactions, obtaining low-order and high-order features. The low-order features reflect the pattern of each feature itself, while the high-order features capture the interactions between features. An adversarial module is further introduced to constrain the fusion of the reference image and the text. Extensive experiments on three datasets verify the effectiveness of MAN and demonstrate its state-of-the-art performance.
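The "common paradigm" mentioned in the abstract (fuse the reference image with the text, then compare against candidates in a shared embedding space) can be illustrated with a toy sketch. This is not the MAN model: the element-wise-sum fusion, the function names, and the 3-dimensional feature vectors are all assumptions made purely for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def compose_query(image_feat, text_feat):
    # Hypothetical fusion: element-wise sum of image and text features.
    # Real systems learn this fusion; the sum is only a stand-in.
    return [a + b for a, b in zip(image_feat, text_feat)]


def rank_candidates(query, candidates):
    """Return (name, score) pairs sorted by decreasing similarity to the query."""
    scored = [(name, cosine(query, feat)) for name, feat in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy features (assumed): axis 0 = reference content, axis 1 = requested change.
reference = [1.0, 0.0, 0.0]
modification = [0.0, 1.0, 0.0]
candidates = {
    "a": [1.0, 1.0, 0.0],  # reference content plus the requested change
    "b": [1.0, 0.0, 0.0],  # reference content only
    "c": [0.0, 0.0, 1.0],  # unrelated image
}

ranking = rank_candidates(compose_query(reference, modification), candidates)
for name, score in ranking:
    print(name, round(score, 3))
```

In this toy setting the candidate that matches both the reference content and the requested change ranks first, the unmodified reference second, and the unrelated image last.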
Pages: 1685 - 1689
Page count: 5
Related papers
29 in total
  • [1] Gated-GAN: Adversarial Gated Networks for Multi-Collection Style Transfer
    Chen, Xinyuan
    Xu, Chang
    Yang, Xiaokang
    Song, Li
    Tao, Dacheng
    [J]. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2019, 28 (02) : 546 - 560
  • [2] Chen Y.-C., 2020, COMPUTER VISION ECCV
  • [3] Image Search with Text Feedback by Visiolinguistic Attention Learning
    Chen, Yanbei
    Gong, Shaogang
    Bazzani, Loris
    [J]. 2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020, : 2998 - 3008
  • [4] An Arching Theory for Arch Tunnels Based on the Interaction Between the Lateral and Vertical Pressure in Good Ground
    Cheng, Xiaohu
    [J]. PROCEEDINGS OF GEOSHANGHAI 2018 INTERNATIONAL CONFERENCE: TUNNELLING AND UNDERGROUND CONSTRUCTION, 2018, : 164 - 180
  • [5] Dual Encoding for Video Retrieval by Text
    Dong, Jianfeng
    Li, Xirong
    Xu, Chaoxi
    Yang, Xun
    Yang, Gang
    Wang, Xun
    Wang, Meng
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2022, 44 (08) : 4065 - 4080
  • [6] Dual Encoding for Zero-Example Video Retrieval
    Dong, Jianfeng
    Li, Xirong
    Xu, Chaoxi
    Ji, Shouling
    He, Yuan
    Yang, Gang
    Wang, Xun
    [J]. 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 9338 - 9347
  • [7] Predicting Visual Features From Text for Image and Video Caption Retrieval
    Dong, Jianfeng
    Li, Xirong
    Snoek, Cees G. M.
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2018, 20 (12) : 3377 - 3388
  • [8] Cross-Media Similarity Evaluation for Web Image Retrieval in the Wild
    Dong, Jianfeng
    Li, Xirong
    Xu, Duanqing
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2018, 20 (09) : 2371 - 2384
  • [9] Dong Jianfeng, 2019, IEEE T KNOWLEDGE DAT
  • [10] Tell, Draw, and Repeat: Generating and Modifying Images Based on Continual Linguistic Instruction
    El-Nouby, Alaaeldin
    Sharma, Shikhar
    Schulz, Hannes
    Hjelm, Devon
    El Asri, Layla
    Kahou, Samira Ebrahimi
    Bengio, Yoshua
    Taylor, Graham W.
    [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 10303 - 10311