Correlated Features Synthesis and Alignment for Zero-shot Cross-modal Retrieval

Cited by: 17
Authors
Xu, Xing [1 ,2 ]
Lin, Kaiyi [1 ,2 ,3 ]
Lu, Huimin [4 ]
Gao, Lianli [1 ,2 ]
Shen, Heng Tao [1 ,2 ]
Affiliations
[1] Univ Elect Sci & Technol China, Ctr Future Multimedia, Chengdu, Peoples R China
[2] Univ Elect Sci & Technol China, Sch Comp Sci & Engn, Chengdu, Peoples R China
[3] Peking Univ, Sch Software & Microelect, Beijing, Peoples R China
[4] Kyushu Inst Technol, Dept Mech & Control Engn, Kitakyushu, Fukuoka, Japan
Funding
National Natural Science Foundation of China;
Keywords
Cross-modal Retrieval; Zero-shot Learning; Feature Synthesis;
DOI
10.1145/3397271.3401149
CLC Number
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
The goal of cross-modal retrieval is to search for semantically similar instances in one modality using a query from another modality. Existing approaches mainly consider the standard scenario, which requires that the source set used for training and the target set used for testing share the same scope of classes. However, they may not generalize well to the zero-shot cross-modal retrieval (ZS-CMR) task, where the target set contains unseen classes that are disjoint from the seen classes in the source set. This task is more challenging due to 1) the absence of the unseen classes during training, 2) inconsistent semantics across seen and unseen classes, and 3) the heterogeneous multimodal distributions between the source and target sets. To address these issues, we propose a novel Correlated Feature Synthesis and Alignment (CFSA) approach that integrates multimodal feature synthesis, common space learning and knowledge transfer for ZS-CMR. Our CFSA first utilizes class-level word embeddings to guide two coupled Wasserstein generative adversarial networks (WGANs) to synthesize sufficient multimodal features with semantic correlation for stable training. Then the synthetic and real multimodal features are jointly mapped to a common semantic space via an effective distribution alignment scheme, where the cross-modal correlations of different semantic features are captured and the knowledge can be transferred to the unseen classes under a cycle-consistency constraint. Experiments on four benchmark datasets for image-text retrieval and two large-scale datasets for image-sketch retrieval show the remarkable improvements achieved by our CFSA method compared with a range of state-of-the-art approaches.
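The abstract outlines two stages: coupled conditional WGANs that synthesize correlated multimodal features from class-level word embeddings, followed by distribution alignment in a common semantic space. The sketch below illustrates only the synthesis stage, under assumed dimensions (100-d noise, 300-d word embeddings, 4096-d image and 300-d text features); all class and module names are hypothetical, the critics' gradient penalty and the alignment/cycle-consistency losses are omitted, and this is not the authors' released implementation.

```python
# Minimal sketch of coupled conditional WGAN feature synthesis, as described
# in the abstract. Names, dimensions and losses are illustrative assumptions.
import torch
import torch.nn as nn

class FeatureGenerator(nn.Module):
    """Maps (noise, class word embedding) to a modality-specific feature."""
    def __init__(self, noise_dim: int, embed_dim: int, feat_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(noise_dim + embed_dim, 1024),
            nn.LeakyReLU(0.2),
            nn.Linear(1024, feat_dim),
        )

    def forward(self, noise, class_embed):
        return self.net(torch.cat([noise, class_embed], dim=1))

class Critic(nn.Module):
    """WGAN critic: scores a feature conditioned on the class embedding."""
    def __init__(self, feat_dim: int, embed_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim + embed_dim, 1024),
            nn.LeakyReLU(0.2),
            nn.Linear(1024, 1),
        )

    def forward(self, feat, class_embed):
        return self.net(torch.cat([feat, class_embed], dim=1))

# One generator/critic pair per modality; both generators condition on the
# same class-level word embedding, keeping their outputs semantically correlated.
g_img, g_txt = FeatureGenerator(100, 300, 4096), FeatureGenerator(100, 300, 300)
d_img, d_txt = Critic(4096, 300), Critic(300, 300)

z = torch.randn(32, 100)   # shared noise batch
e = torch.randn(32, 300)   # stand-in for class-level word embeddings (e.g. GloVe)
fake_img, fake_txt = g_img(z, e), g_txt(z, e)

# WGAN generator objective: maximize critic scores of the synthetic features
# (the gradient penalty on the critics is omitted here for brevity).
gen_loss = -(d_img(fake_img, e).mean() + d_txt(fake_txt, e).mean())
```

In the full method, the real and synthetic features of both modalities would then be projected into the common semantic space, where the distribution alignment and cycle-consistency constraints enable knowledge transfer to unseen classes.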
Pages: 1419 - 1428
Number of pages: 10
Related Papers
50 records in total
  • [21] Cross-modal Representation Learning for Zero-shot Action Recognition
    Lin, Chung-Ching
    Lin, Kevin
    Wang, Lijuan
    Liu, Zicheng
    Li, Linjie
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 19946 - 19956
  • [22] Manifold regularized cross-modal embedding for zero-shot learning
    Ji, Zhong
    Yu, Yunlong
    Pang, Yanwei
    Guo, Jichang
    Zhang, Zhongfei
    INFORMATION SCIENCES, 2017, 378 : 48 - 58
  • [23] Cross-modal propagation network for generalized zero-shot learning
    Guo, Ting
    Liang, Jianqing
    Liang, Jiye
    Xie, Guo-Sen
    PATTERN RECOGNITION LETTERS, 2022, 159 : 125 - 131
  • [24] Two-stage zero-shot sparse hashing with missing labels for cross-modal retrieval
    Yong, Kailing
    Shu, Zhenqiu
    Wang, Hongbin
    Yu, Zhengtao
    PATTERN RECOGNITION, 2024, 155
  • [25] Cross-modal Self-distillation for Zero-shot Sketch-based Image Retrieval
    Tian, J.-L.
    Xu, X.
    Shen, F.-M.
    Shen, H.-T.
    RUAN JIAN XUE BAO/JOURNAL OF SOFTWARE, 2022, 33 (09):
  • [26] Progressive Cross-Modal Semantic Network for Zero-Shot Sketch-Based Image Retrieval
    Deng, Cheng
    Xu, Xinxun
    Wang, Hao
    Yang, Muli
    Tao, Dacheng
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2020, 29 : 8892 - 8902
  • [27] Deep cross-modal discriminant adversarial learning for zero-shot sketch-based image retrieval
    Jiao, Shichao
    Han, Xie
    Xiong, Fengguang
    Yang, Xiaowen
    Han, Huiyan
    He, Ligang
    Kuang, Liqun
    NEURAL COMPUTING & APPLICATIONS, 2022, 34 (16): : 13469 - 13483
  • [29] Learning Aligned Cross-Modal Representation for Generalized Zero-Shot Classification
    Fang, Zhiyu
    Zhu, Xiaobin
    Yang, Chun
    Han, Zheng
    Qin, Jingyan
    Yin, Xu-Cheng
    THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 6605 - 6613
  • [30] DUET: Cross-Modal Semantic Grounding for Contrastive Zero-Shot Learning
    Chen, Zhuo
    Huang, Yufeng
    Chen, Jiaoyan
    Geng, Yuxia
    Zhang, Wen
    Fang, Yin
    Pan, Jeff Z.
    Chen, Huajun
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 1, 2023, : 405 - 413