MULTI-MODAL LEARNING WITH GENERALIZABLE NONLINEAR DIMENSIONALITY REDUCTION

Cited by: 0
Authors
Kaya, Semih [1]
Vural, Elif [1]
Affiliations
[1] METU, Dept Elect & Elect Engn, Ankara, Turkey
Source
2019 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP) | 2019
Keywords
Cross-modal learning; multi-view learning; cross-modal retrieval; nonlinear embeddings; RBF interpolators
DOI
10.1109/icip.2019.8803196
Chinese Library Classification number
TB8 [Photographic technology]
Discipline classification code
0804
Abstract
In practical machine learning settings, there often exist relations or links between data from different modalities. The goal of multi-modal learning algorithms is to efficiently use the information available in different modalities to solve multi-modal classification or retrieval problems. In this study, we propose a multi-modal supervised representation learning algorithm based on nonlinear dimensionality reduction. Nonlinear embeddings often yield more flexible representations than their linear counterparts, especially when the data geometries in different modalities are highly dissimilar. Based on recent performance bounds on nonlinear dimensionality reduction, we propose an optimization objective that aims to improve the intra- and inter-modal within-class compactness and between-class separation, as well as the Lipschitz regularity of the interpolator that generalizes the embedding to the whole data space. Experiments in multi-view face recognition and image-text retrieval applications show that the proposed method yields promising performance in comparison with state-of-the-art multi-modal learning methods.
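The abstract mentions an interpolator that generalizes a learned embedding to the whole data space, and its Lipschitz regularity. The following is only a minimal sketch of that general idea, not the authors' algorithm: it assumes a Gaussian RBF kernel of the form exp(-r^2 / sigma^2), a small ridge term for numerical stability, and synthetic data standing in for training samples and their embedding coordinates. All function names (gaussian_kernel, fit_rbf, apply_rbf, lipschitz_upper_bound) and parameter values are hypothetical. The Lipschitz estimate is a generic crude upper bound obtained from the maximum gradient norm of a single Gaussian kernel, sqrt(2) * exp(-1/2) / sigma, summed over the coefficient norms.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    """Matrix of exp(-||a - b||^2 / sigma^2) for all rows a of A and b of B."""
    d2 = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(d2, 0.0) / sigma**2)  # clamp tiny negative round-off

def fit_rbf(X, Y, sigma, ridge=1e-8):
    """Solve (K + ridge * I) C = Y so the interpolator maps each X[i] close to Y[i]."""
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(K + ridge * np.eye(len(X)), Y)

def apply_rbf(X_new, X, C, sigma):
    """Evaluate the interpolator at new points (out-of-sample extension)."""
    return gaussian_kernel(X_new, X, sigma) @ C

def lipschitz_upper_bound(C, sigma):
    """Crude bound: (max gradient norm of one kernel) * (sum of coefficient norms)."""
    return np.sqrt(2.0) * np.exp(-0.5) / sigma * np.linalg.norm(C, axis=1).sum()

# Synthetic stand-ins (assumption): 20 training points in 5-D, 2-D embedding targets.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
Y = rng.normal(size=(20, 2))
sigma = 2.0  # hypothetical kernel scale

C = fit_rbf(X, Y, sigma)
Y_fit = apply_rbf(X, X, C, sigma)        # should reproduce the training embedding
bound = lipschitz_upper_bound(C, sigma)  # smaller values mean a smoother interpolator
```

In this sketch a larger sigma or smaller coefficients lower the bound, which illustrates why an objective might penalize such a quantity alongside class separation terms; how the paper actually combines them is not stated in the abstract.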
Pages: 2139-2143
Page count: 5