Semantics Disentangling for Cross-Modal Retrieval

Cited by: 5
Authors
Wang, Zheng [1,2,3]
Xu, Xing [4,5]
Wei, Jiwei [4,5]
Xie, Ning [4,5]
Yang, Yang [1,2,3]
Shen, Heng Tao [4,5,6]
Affiliations
[1] Univ Elect Sci & Technol China UESTC, Ctr Future Multimedia, Chengdu 611731, Peoples R China
[2] Univ Elect Sci & Technol China UESTC, Sch Comp Sci & Engn, Chengdu 611731, Peoples R China
[3] UESTC Guangdong, Inst Elect & Informat Engn, Dongguan 523808, Peoples R China
[4] Univ Elect Sci & Technol China UESTC, Ctr Future Multimedia, Chengdu 611731, Peoples R China
[5] Univ Elect Sci & Technol China, Sch Comp Sci & Engn, Chengdu 611731, Peoples R China
[6] Peng Cheng Lab, Shenzhen 518066, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Cross-modal retrieval; semantics disentangling; dual adversarial mechanism; subspace learning; REPRESENTATION;
DOI
10.1109/TIP.2024.3374111
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Cross-modal retrieval (e.g., querying with an image to obtain a semantically similar sentence, and vice versa) is an important but challenging task, owing to the heterogeneity gap and the inconsistent distributions between modalities. Dominant approaches attempt to bridge this heterogeneity by capturing common representations of the heterogeneous data in a constructed subspace that reflects semantic closeness. However, they give insufficient consideration to the fact that the learned latent representations remain heavily entangled with semantic-unrelated features, which further compounds the challenge of cross-modal retrieval. To alleviate this difficulty, this work assumes that the data are jointly characterized by two independent features: semantic-shared and semantic-unrelated representations. The former captures the consistent semantics shared by different modalities, while the latter reflects modality-specific characteristics unrelated to semantics, such as background, illumination, and other low-level information. This paper therefore aims to disentangle the shared semantics from the entangled features, so that the purer semantic representation can promote the closeness of paired data. Specifically, this paper designs a novel Semantics Disentangling approach for Cross-Modal Retrieval (termed SDCMR) that explicitly decouples the two features based on a variational auto-encoder. Reconstruction is then performed by exchanging the shared semantics across modalities to enforce the learning of semantic consistency. Moreover, a dual adversarial mechanism is designed to disentangle the two independent features via a pushing-and-pulling strategy. Comprehensive experiments on four widely used datasets demonstrate the effectiveness and superiority of the proposed SDCMR method, which sets a new performance bar when compared against 15 state-of-the-art methods.
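To make the pipeline described in the abstract concrete, the following is a minimal illustrative sketch, in PyTorch, of VAE-based semantics disentangling with swapped-semantics reconstruction. It is not the authors' released code: all class names, layer sizes, and dimensions (Encoder, SDCMRSketch, z_dim, etc.) are hypothetical, and the KL regularization and dual adversarial discriminators are omitted for brevity.

    # Minimal sketch (not the authors' code) of VAE-based semantics
    # disentangling with swapped-semantics reconstruction.
    # All names and layer sizes below are hypothetical; KL terms and the
    # dual adversarial discriminators described in the paper are omitted.
    import torch
    import torch.nn as nn

    class Encoder(nn.Module):
        """Maps a modality feature to a Gaussian latent (mu, logvar)."""
        def __init__(self, in_dim, z_dim):
            super().__init__()
            self.backbone = nn.Sequential(nn.Linear(in_dim, 512), nn.ReLU())
            self.mu = nn.Linear(512, z_dim)
            self.logvar = nn.Linear(512, z_dim)

        def forward(self, x):
            h = self.backbone(x)
            return self.mu(h), self.logvar(h)

    def reparameterize(mu, logvar):
        # Standard VAE reparameterization trick.
        return mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)

    class SDCMRSketch(nn.Module):
        def __init__(self, img_dim, txt_dim, z_dim=128):
            super().__init__()
            # Per modality: one semantic-shared and one semantic-unrelated encoder.
            self.img_sem = Encoder(img_dim, z_dim)
            self.img_unr = Encoder(img_dim, z_dim)
            self.txt_sem = Encoder(txt_dim, z_dim)
            self.txt_unr = Encoder(txt_dim, z_dim)
            # Decoders rebuild a modality from [shared ; unrelated] codes.
            self.img_dec = nn.Sequential(
                nn.Linear(2 * z_dim, 512), nn.ReLU(), nn.Linear(512, img_dim))
            self.txt_dec = nn.Sequential(
                nn.Linear(2 * z_dim, 512), nn.ReLU(), nn.Linear(512, txt_dim))

        def forward(self, img, txt):
            s_i = reparameterize(*self.img_sem(img))  # shared semantics (image)
            u_i = reparameterize(*self.img_unr(img))  # unrelated factors (image)
            s_t = reparameterize(*self.txt_sem(txt))  # shared semantics (text)
            u_t = reparameterize(*self.txt_unr(txt))  # unrelated factors (text)
            # Swapped reconstruction: each modality is rebuilt from the OTHER
            # modality's shared code plus its own unrelated code, so the shared
            # codes are forced to carry modality-invariant semantics.
            img_rec = self.img_dec(torch.cat([s_t, u_i], dim=1))
            txt_rec = self.txt_dec(torch.cat([s_i, u_t], dim=1))
            return (s_i, s_t), (img_rec, txt_rec)

    # Usage: (s_i, s_t) are the representations a retrieval system would index.
    model = SDCMRSketch(img_dim=4096, txt_dim=300)
    img, txt = torch.randn(32, 4096), torch.randn(32, 300)
    (s_i, s_t), (img_rec, txt_rec) = model(img, txt)
    rec_loss = (nn.functional.mse_loss(img_rec, img)
                + nn.functional.mse_loss(txt_rec, txt))

The key design choice is the swapped reconstruction: because each modality must be rebuilt from the other modality's shared code, any modality-specific or semantic-unrelated information is pushed into the unrelated code, leaving the shared code to carry the cross-modal semantics used for retrieval.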
Pages: 2226 - 2237
Page count: 12
Related Papers
50 records in total
  • [1] Semantics-Reconstructing Hashing for Cross-Modal Retrieval
    Zhang, Peng-Fei
    Huang, Zi
    Zhang, Zheng
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2020, PT II, 2020, 12085 : 315 - 327
  • [2] Exploring Graph-Structured Semantics for Cross-Modal Retrieval
    Zhang, Lei
    Chen, Leiting
    Zhou, Chuan
    Yang, Fan
    Li, Xin
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 4277 - 4286
  • [3] Cross-specificity: modelling data semantics for cross-modal matching and retrieval
    Verma, Yashaswi
    Jha, Abhishek
    Jawahar, C. V.
    INTERNATIONAL JOURNAL OF MULTIMEDIA INFORMATION RETRIEVAL, 2018, 7 (02) : 139 - 146
  • [4] Adversarial Cross-Modal Retrieval
    Wang, Bokun
    Yang, Yang
    Xu, Xing
    Hanjalic, Alan
    Shen, Heng Tao
    PROCEEDINGS OF THE 2017 ACM MULTIMEDIA CONFERENCE (MM'17), 2017, : 154 - 162
  • [5] HCMSL: Hybrid Cross-modal Similarity Learning for Cross-modal Retrieval
    Zhang, Chengyuan
    Song, Jiayu
    Zhu, Xiaofeng
    Zhu, Lei
    Zhang, Shichao
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2021, 17 (01)
  • [6] Weighted Graph-Structured Semantics Constraint Network for Cross-Modal Retrieval
    Zhang, Lei
    Chen, Leiting
    Zhou, Chuan
    Li, Xin
    Yang, Fan
    Yi, Zhang
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 1551 - 1564
  • [7] A semi-supervised cross-modal memory bank for cross-modal retrieval
    Huang, Yingying
    Hu, Bingliang
    Zhang, Yipeng
    Gao, Chi
    Wang, Quan
    NEUROCOMPUTING, 2024, 579
  • [8] Cross-Modal Center Loss for 3D Cross-Modal Retrieval
    Jing, Longlong
    Vahdani, Elahe
    Tan, Jiaxing
    Tian, Yingli
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 3141 - 3150
  • [9] Improving cross-modal and multi-modal retrieval combining content and semantics similarities with probabilistic model
    Wang, Shixun
    Pan, Peng
    Lu, Yansheng
    Xie, Liang
    MULTIMEDIA TOOLS AND APPLICATIONS, 2015, 74 : 2009 - 2032