Cross-modal retrieval with dual optimization

Cited by: 0
Authors
Qingzhen Xu
Shuang Liu
Han Qiao
Miao Li
Affiliations
[1] South China Normal University, School of Computer Science
Source
Multimedia Tools and Applications | 2023, Vol. 82
Keywords
Cross-modal retrieval; Modality gap; Inter-modal optimization; Intra-modal optimization
DOI
Not available
Abstract
For flexible retrieval across data of different modalities, cross-modal retrieval has gradually attracted the attention of researchers. However, there is a heterogeneity gap between data of different modalities, so their similarity cannot be measured directly. To address this problem, researchers project data of different modalities into a common representation space to compensate for this heterogeneity. However, existing methods built on pairwise or triplet constraints ignore the rich information among samples, which degrades retrieval performance. To fully mine the information carried by the samples, this paper proposes a cross-modal retrieval method with dual optimization (CMRDO). First, the method optimizes the common representation space from both inter-modal and intra-modal perspectives. Second, we introduce an efficient sample construction strategy that avoids sample pairs carrying little information. Finally, the bi-directional retrieval strategy we introduce can effectively capture the latent structure of the query modality. On three public datasets, the proposed CMRDO effectively improves cross-modal retrieval accuracy and shows strong generalization ability.
Pages: 7141-7157
Page count: 16
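
The abstract above describes the method only at a high level; CMRDO's actual loss functions, sample construction strategy, and bi-directional retrieval procedure are defined in the full paper and are not reproduced here. The following is a minimal PyTorch sketch of the general dual-optimization idea the abstract describes: two encoders project modality-specific features into a common space, an inter-modal hinge loss aligns matched image-text pairs, and an intra-modal loss keeps same-class samples within each modality close. All module names, dimensions, margins, and loss forms below are illustrative assumptions, not CMRDO's actual formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CommonSpaceEncoder(nn.Module):
    """Projects one modality's features into the shared representation space."""
    def __init__(self, in_dim, common_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 512),
            nn.ReLU(),
            nn.Linear(512, common_dim),
        )

    def forward(self, x):
        # L2-normalise so cosine similarity reduces to a dot product
        return F.normalize(self.net(x), dim=-1)

def inter_modal_loss(img_emb, txt_emb, margin=0.2):
    """Hinge loss: matched image-text pairs should score above mismatched ones."""
    sim = img_emb @ txt_emb.t()                  # pairwise cosine similarities
    pos = sim.diag().unsqueeze(1)                # matched pairs sit on the diagonal
    cost = (margin + sim - pos).clamp(min=0)     # margin violations by negatives
    mask = torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)
    return cost.masked_fill(mask, 0.0).mean()

def intra_modal_loss(emb, labels, margin=0.2):
    """Keeps same-class samples within one modality closer than different-class ones."""
    sim = emb @ emb.t()
    same = labels.unsqueeze(0) == labels.unsqueeze(1)
    pos, neg = sim[same].mean(), sim[~same].mean()   # crude averages, fine for a sketch
    return (margin + neg - pos).clamp(min=0)

# Toy usage: random features stand in for image CNN / text encoder outputs.
img_enc, txt_enc = CommonSpaceEncoder(4096), CommonSpaceEncoder(300)
img_feat, txt_feat = torch.randn(8, 4096), torch.randn(8, 300)
labels = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])      # shared semantic labels
img_emb, txt_emb = img_enc(img_feat), txt_enc(txt_feat)
loss = (inter_modal_loss(img_emb, txt_emb)
        + intra_modal_loss(img_emb, labels)
        + intra_modal_loss(txt_emb, labels))
loss.backward()
```

At retrieval time, queries from either modality would presumably be embedded the same way and ranked by cosine similarity in the common space, which is the setting in which the bi-directional retrieval strategy mentioned in the abstract operates.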