Cross-modal information retrieval (CMIR) enables users to search for semantically relevant data of various modalities given a query of one modality. The predominant challenge is to alleviate the "heterogeneity gap" between different modalities. For text-image retrieval, the typical solution is to project text features and image features into a common semantic space and measure the cross-modal similarity there. However, semantically relevant data from different modalities usually carries imbalanced information. Aligning all the modalities in the same space weakens modality-specific semantics and introduces unexpected noise. In this paper, we propose a novel CMIR framework based on multi-modal feature fusion. In this framework, the cross-modal similarity is measured by directly analyzing the fine-grained correlations between the text features and image features, without learning a common semantic space. Specifically, we first construct a cross-modal feature matrix to fuse the original visual and textual features. Then, 2D convolutional networks are employed to reason about inner-group relationships among features across modalities, resulting in fine-grained text-image representations. The cross-modal similarity is measured by a multi-layer perceptron applied to the fused feature representations. We conduct extensive experiments on two representative CMIR datasets, i.e., English Wikipedia and TVGraz. Experimental results indicate that our model significantly outperforms state-of-the-art methods. Moreover, the proposed cross-modal feature fusion approach is more effective on CMIR tasks than other feature fusion approaches.
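
The pipeline described above (cross-modal feature matrix, 2D convolutions, MLP-based similarity) can be illustrated with a minimal sketch. The sketch below assumes PyTorch, hypothetical feature dimensions, and an outer-product-style fusion operator; it is an illustrative stand-in for the components named in the abstract, not the authors' exact implementation.

```python
# Minimal sketch: fuse text and image features into a 2D cross-modal matrix,
# reason over it with 2D convolutions, and score the pair with an MLP.
# Dimensions and the exact fusion operator are assumptions for illustration.
import torch
import torch.nn as nn

class CrossModalFusionScorer(nn.Module):
    def __init__(self, txt_dim=300, img_dim=2048, hidden=128):
        super().__init__()
        # Project both modalities to a common length so their outer product
        # forms a square cross-modal feature matrix (one illustrative choice).
        self.txt_proj = nn.Linear(txt_dim, hidden)
        self.img_proj = nn.Linear(img_dim, hidden)
        # 2D convolutions over the fused matrix capture fine-grained
        # correlations between groups of text and image feature dimensions.
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(8),
        )
        # MLP on the fused representation yields the cross-modal similarity.
        self.mlp = nn.Sequential(
            nn.Linear(32 * 8 * 8, 256), nn.ReLU(), nn.Linear(256, 1)
        )

    def forward(self, txt_feat, img_feat):
        t = self.txt_proj(txt_feat)  # (B, hidden)
        v = self.img_proj(img_feat)  # (B, hidden)
        # Cross-modal feature matrix via outer product, one channel per pair.
        matrix = torch.einsum("bi,bj->bij", t, v).unsqueeze(1)  # (B, 1, hidden, hidden)
        fused = self.conv(matrix).flatten(1)
        return self.mlp(fused).squeeze(-1)  # similarity score per text-image pair

# Usage: similarity scores for a batch of 4 hypothetical text-image pairs.
scorer = CrossModalFusionScorer()
scores = scorer(torch.randn(4, 300), torch.randn(4, 2048))
print(scores.shape)  # torch.Size([4])
```

In this sketch the similarity is produced directly from the fused representation, rather than by measuring distances in a learned common embedding space, which mirrors the design choice argued for in the abstract.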