2D-Convolution Based Feature Fusion for Cross-Modal Correlation Learning

Citations: 0
Authors
Guo, Jingjing [1 ,2 ]
Yu, Jing [1 ,2 ]
Lu, Yuhang [1 ,2 ]
Hu, Yue [1 ]
Liu, Yanbing [1 ]
Affiliations
[1] Chinese Acad Sci, Inst Informat Engn, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Sch Cyber Secur, Beijing, Peoples R China
Source
COMPUTATIONAL SCIENCE - ICCS 2019, PT II | 2019 / Vol. 11537
Keywords
2D-convolutional network; Inner-group relationship; Feature fusion; Cross-modal correlation; Cross-modal information retrieval;
DOI
10.1007/978-3-030-22741-8_10
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Cross-modal information retrieval (CMIR) enables users to search for semantically relevant data in various modalities from a query in one modality. The predominant challenge is to alleviate the "heterogeneous gap" between different modalities. For text-image retrieval, the typical solution is to project text features and image features into a common semantic space and measure cross-modal similarity there. However, semantically relevant data from different modalities usually carry imbalanced information, and aligning all modalities in the same space weakens modality-specific semantics and introduces unexpected noise. In this paper, we propose a novel CMIR framework based on multi-modal feature fusion. In this framework, cross-modal similarity is measured by directly analyzing the fine-grained correlations between text features and image features, without learning a common semantic space. Specifically, we first construct a cross-modal feature matrix that fuses the original visual and textual features. Then 2D-convolutional networks are applied to reason about inner-group relationships among features across modalities, yielding fine-grained text-image representations. The cross-modal similarity is measured by a multi-layer perceptron over the fused feature representations. We conduct extensive experiments on two representative CMIR datasets, English Wikipedia and TVGraz. Experimental results indicate that our model significantly outperforms state-of-the-art methods. Moreover, the proposed cross-modal feature fusion approach is more effective on CMIR tasks than other feature fusion approaches.
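The pipeline described in the abstract (fused feature matrix → 2D convolution → MLP similarity score) can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the outer product as the cross-modal feature matrix, the single-channel valid convolution, and the two-layer perceptron with made-up dimensions are all assumptions made here for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

def fuse_features(img_feat, txt_feat):
    # Cross-modal feature matrix: one plausible construction is the outer
    # product of the image and text feature vectors, so entry (i, j) pairs
    # image dimension i with text dimension j. (Hypothetical; the paper's
    # exact fusion may differ.)
    return np.outer(img_feat, txt_feat)

def conv2d_valid(x, kernel):
    # Naive single-channel 2D convolution (valid padding) over the fused
    # matrix, standing in for the paper's 2D-convolutional network.
    kh, kw = kernel.shape
    H, W = x.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * kernel)
    return out

def mlp_score(features, w1, b1, w2, b2):
    # Two-layer perceptron mapping the flattened fused representation
    # to a scalar cross-modal similarity score.
    hidden = np.maximum(features @ w1 + b1, 0.0)  # ReLU
    return float(hidden @ w2 + b2)

# Toy 16-dim image and text features (random stand-ins for real encoders).
img = rng.standard_normal(16)
txt = rng.standard_normal(16)

fused = fuse_features(img, txt)                              # shape (16, 16)
conv_out = conv2d_valid(fused, rng.standard_normal((3, 3)))  # shape (14, 14)

flat = conv_out.ravel()
w1 = rng.standard_normal((flat.size, 8))
b1 = np.zeros(8)
w2 = rng.standard_normal(8)
b2 = 0.0
score = mlp_score(flat, w1, b1, w2, b2)
print(fused.shape, conv_out.shape, score)
```

In a real system the convolution would have multiple learned kernels and the whole stack would be trained end-to-end with a retrieval loss; the sketch only shows how a similarity score can be computed directly from fused features without projecting both modalities into a shared space.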
Pages: 131-144
Page count: 14
Related papers
50 records
  • [1] Multimedia Feature Mapping and Correlation Learning for Cross-Modal Retrieval
    Yuan, Xu
    Zhong, Hua
    Chen, Zhikui
    Zhong, Fangming
    Hu, Yueming
    INTERNATIONAL JOURNAL OF GRID AND HIGH PERFORMANCE COMPUTING, 2018, 10 (03) : 29 - 45
  • [2] A cross-modal fusion network based on graph feature learning for multimodal emotion recognition
    Cao Xiaopeng
    Zhang Linying
    Chen Qiuxian
    Ning Hailong
    Dong Yizhuo
    The Journal of China Universities of Posts and Telecommunications, 2024, 31 (06) : 16 - 25
  • [3] Semi-discriminant cross-modal correlation feature fusion with structure elasticity
    Zhu, Yanmin
    Peng, Tianhao
    Su, Shuzhi
    OPTIK, 2022, 254
  • [4] Cross-Modal Retrieval with Correlation Feature Propagation
    Zhang L.
    Cao F.
    Liang X.
    Qian Y.
    Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2022, 59 (09): : 1993 - 2002
  • [5] CCL: Cross-modal Correlation Learning With Multigrained Fusion by Hierarchical Network
    Peng, Yuxin
    Qi, Jinwei
    Huang, Xin
    Yuan, Yuxin
    IEEE TRANSACTIONS ON MULTIMEDIA, 2018, 20 (02) : 405 - 420
  • [6] Automatic curtain wall frame detection based on deep learning and cross-modal feature fusion
    Wu, Decheng
    Li, Yu
    Li, Rui
    Cheng, Longqi
    Zhao, Jingyuan
    Zhao, Mingfu
    Lee, Chul Hee
    AUTOMATION IN CONSTRUCTION, 2024, 160
  • [7] Estimation of Pig Weight Based on Cross-modal Feature Fusion Model
    He W.
    Mi Y.
    Liu G.
    Ding X.
    Li T.
Nongye Jixie Xuebao/Transactions of the Chinese Society for Agricultural Machinery, 2023, 54 : 275 - 282, 329
  • [8] A Cross-Modal Feature Fusion Model Based on ConvNeXt for RGB-D Semantic Segmentation
    Tang, Xiaojiang
    Li, Baoxia
    Guo, Junwei
    Chen, Wenzhuo
    Zhang, Dan
    Huang, Feng
    MATHEMATICS, 2023, 11 (08)
  • [9] Fusion-Based Correlation Learning Model for Cross-Modal Remote Sensing Image Retrieval
    Lv, Yafei
    Xiong, Wei
    Zhang, Xiaohan
    Cui, Yaqi
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2022, 19
  • [10] Joint feature fusion hashing for cross-modal retrieval
    Cao, Yuxia
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2024, 15 (12) : 6149 - 6162