Semantics Disentangling for Cross-Modal Retrieval

Cited by: 5
Authors
Wang, Zheng [1 ,2 ,3 ]
Xu, Xing [4 ,5 ]
Wei, Jiwei [4 ,5 ]
Xie, Ning [4 ,5 ]
Yang, Yang [1 ,2 ,3 ]
Shen, Heng Tao [4 ,5 ,6 ]
Affiliations
[1] Univ Elect Sci & Technol China UESTC, Ctr Future Multimedia, Chengdu 611731, Peoples R China
[2] Univ Elect Sci & Technol China UESTC, Sch Comp Sci & Engn, Chengdu 611731, Peoples R China
[3] UESTC Guangdong, Inst Elect & Informat Engn, Dongguan 523808, Peoples R China
[4] Univ Elect Sci & Technol China UESTC, Ctr Future Multimedia, Chengdu 611731, Peoples R China
[5] Univ Elect Sci & Technol China, Sch Comp Sci & Engn, Chengdu 611731, Peoples R China
[6] Peng Cheng Lab, Shenzhen 518066, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Cross-modal retrieval; semantics disentangling; dual adversarial mechanism; subspace learning; REPRESENTATION;
DOI
10.1109/TIP.2024.3374111
CLC Number
TP18 [Theory of Artificial Intelligence];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Cross-modal retrieval (e.g., querying with an image to obtain a semantically similar sentence, and vice versa) is an important but challenging task, as a heterogeneous gap and inconsistent distributions exist between different modalities. The dominant approaches attempt to bridge this heterogeneity by capturing common representations of the heterogeneous data in a constructed subspace that reflects semantic closeness. However, insufficient consideration is given to the fact that the learned latent representations are actually heavily entangled with semantic-unrelated features, which further compounds the challenges of cross-modal retrieval. To alleviate this difficulty, this work assumes that the data are jointly characterized by two independent features: semantic-shared and semantic-unrelated representations. The former captures the consistent semantics shared across modalities, while the latter reflects modality-specific characteristics unrelated to semantics, such as background, illumination, and other low-level information. This paper therefore aims to disentangle the shared semantics from the entangled features, so that the purer semantic representation can promote the closeness of paired data. Specifically, this paper designs a novel Semantics Disentangling approach for Cross-Modal Retrieval (termed SDCMR) to explicitly decouple the two kinds of features based on a variational auto-encoder. Next, reconstruction is performed by exchanging the shared semantics to enforce the learning of semantic consistency. Moreover, a dual adversarial mechanism is designed to disentangle the two independent features via a pushing-and-pulling strategy. Comprehensive experiments on four widely used datasets demonstrate the effectiveness and superiority of the proposed SDCMR method, which sets a new performance bar when compared against 15 state-of-the-art methods.
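The swapped-semantics reconstruction described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (SDCMR uses variational auto-encoders and a dual adversarial mechanism); all weights, dimensions, and function names here are hypothetical toy stand-ins showing only the core idea: each modality's feature is split into a semantic-shared code and a semantic-unrelated code, and each modality is reconstructed using the *other* modality's shared code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear "encoders": project each modality's feature into two parts,
# a semantic-shared code s (first S dims) and a semantic-unrelated code u.
D, S, U = 16, 4, 4                       # input dim, shared dim, unrelated dim
W_img = rng.standard_normal((D, S + U)) * 0.1
W_txt = rng.standard_normal((D, S + U)) * 0.1
# Toy linear "decoders": rebuild a modality from a (shared, unrelated) pair.
G_img = rng.standard_normal((S + U, D)) * 0.1
G_txt = rng.standard_normal((S + U, D)) * 0.1

def encode(x, W):
    z = x @ W
    return z[:, :S], z[:, S:]            # (shared code, unrelated code)

def decode(s, u, G):
    return np.concatenate([s, u], axis=1) @ G

def swapped_reconstruction_loss(x_img, x_txt):
    """Reconstruct each modality with the OTHER modality's shared code.

    If the shared codes of a paired image/sentence truly carry the common
    semantics, exchanging them should still permit reconstruction; training
    against this loss pushes the shared codes toward semantic consistency."""
    s_i, u_i = encode(x_img, W_img)
    s_t, u_t = encode(x_txt, W_txt)
    img_hat = decode(s_t, u_i, G_img)    # image from the text's semantics
    txt_hat = decode(s_i, u_t, G_txt)    # text from the image's semantics
    return float(np.mean((img_hat - x_img) ** 2) +
                 np.mean((txt_hat - x_txt) ** 2))

x_img = rng.standard_normal((8, D))      # a batch of paired toy features
x_txt = rng.standard_normal((8, D))
loss = swapped_reconstruction_loss(x_img, x_txt)
print(loss)
```

In the actual method this cross-reconstruction objective would be combined with the VAE terms and the adversarial pushing-and-pulling losses that keep the shared and unrelated codes independent.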
Pages: 2226 - 2237
Page count: 12
Related Papers
50 records in total
  • [21] CROSS-MODAL RETRIEVAL WITH NOISY LABELS
    Mandal, Devraj
    Biswas, Soma
    2020 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2020, : 2326 - 2330
  • [22] Cross-Modal Retrieval for CPSS Data
    Zhong, Fangming
    Wang, Guangze
    Chen, Zhikui
    Xia, Feng
    Min, Geyong
    IEEE ACCESS, 2020, 8 : 16689 - 16701
  • [23] Hashing for Cross-Modal Similarity Retrieval
    Liu, Yao
    Yuan, Yanhong
    Huang, Qiaoli
    Huang, Zhixing
    2015 11TH INTERNATIONAL CONFERENCE ON SEMANTICS, KNOWLEDGE AND GRIDS (SKG), 2015, : 1 - 8
  • [24] A Graph Model for Cross-modal Retrieval
    Wang, Shixun
    Pan, Peng
    Lu, Yansheng
    PROCEEDINGS OF 3RD INTERNATIONAL CONFERENCE ON MULTIMEDIA TECHNOLOGY (ICMT-13), 2013, 84 : 1090 - 1097
  • [25] Continual learning in cross-modal retrieval
    Wang, Kai
    Herranz, Luis
    van de Weijer, Joost
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2021, 2021, : 3623 - 3633
  • [26] Deep Supervised Cross-modal Retrieval
    Zhen, Liangli
    Hu, Peng
    Wang, Xu
    Peng, Dezhong
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 10386 - 10395
  • [27] Cross-modal retrieval with dual optimization
    Qingzhen Xu
    Shuang Liu
    Han Qiao
    Miao Li
    Multimedia Tools and Applications, 2023, 82 : 7141 - 7157
  • [28] Learning DALTS for cross-modal retrieval
    Yu, Zheng
    Wang, Wenmin
    CAAI TRANSACTIONS ON INTELLIGENCE TECHNOLOGY, 2019, 4 (01) : 9 - 16
  • [29] Sequential Learning for Cross-modal Retrieval
    Song, Ge
    Tan, Xiaoyang
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW), 2019, : 4531 - 4539
  • [30] Correspondence Autoencoders for Cross-Modal Retrieval
    Feng, Fangxiang
    Wang, Xiaojie
    Li, Ruifan
    Ahmad, Ibrar
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2015, 12 (01)