MULTI-MODAL LEARNING WITH GENERALIZABLE NONLINEAR DIMENSIONALITY REDUCTION

Cited by: 0
Authors
Kaya, Semih [1]
Vural, Elif [1]
Affiliations
[1] METU, Dept Elect & Elect Engn, Ankara, Turkey
Source
2019 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP) | 2019
Keywords
Cross-modal learning; multi-view learning; cross-modal retrieval; nonlinear embeddings; RBF interpolators;
DOI
10.1109/icip.2019.8803196
Chinese Library Classification (CLC) number
TB8 [Photographic technology];
Discipline classification code
0804;
Abstract
In practical machine learning settings, there often exist relations or links between data from different modalities. The goal of multi-modal learning algorithms is to efficiently use the information available in different modalities to solve multi-modal classification or retrieval problems. In this study, we propose a multi-modal supervised representation learning algorithm based on nonlinear dimensionality reduction. Nonlinear embeddings often yield more flexible representations than their linear counterparts, especially when the data geometries in different modalities are highly dissimilar. Based on recent performance bounds on nonlinear dimensionality reduction, we propose an optimization objective that aims to improve the intra- and inter-modal within-class compactness and between-class separation, as well as the Lipschitz regularity of the interpolator that generalizes the embedding to the whole data space. Experiments in multi-view face recognition and image-text retrieval applications show that the proposed method yields promising performance in comparison with state-of-the-art multi-modal learning methods.
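The abstract does not spell out the interpolator or the objective, so the following is only a minimal sketch of the ingredients it names, under stated assumptions: a Gaussian RBF interpolator that extends a learned embedding to new points, a simple upper bound on its Lipschitz constant, and average within-class and between-class distances in the embedding. The function names, the kernel choice, the scale `sigma`, the small `ridge` term, and the toy data are all hypothetical and are not taken from the paper.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    # Pairwise Gaussian RBF values exp(-||a - b||^2 / sigma^2).
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / sigma**2)

def fit_rbf_interpolator(X, Y, sigma, ridge=1e-8):
    # Solve (K + ridge * I) C = Y so that f(x_i) is approximately y_i on training data.
    # The ridge term is an assumption added for numerical stability.
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(K + ridge * np.eye(len(X)), Y)

def apply_rbf_interpolator(X_new, X, C, sigma):
    # f(x) = sum_i c_i * exp(-||x - x_i||^2 / sigma^2), evaluated row by row.
    return gaussian_kernel(X_new, X, sigma) @ C

def lipschitz_upper_bound(C, sigma):
    # phi(r) = exp(-r^2 / sigma^2) satisfies max |phi'(r)| = sqrt(2) * exp(-1/2) / sigma,
    # so by the triangle inequality f is Lipschitz with constant at most
    # (sqrt(2) * exp(-1/2) / sigma) * sum_i ||c_i||_2.
    return np.sqrt(2.0) * np.exp(-0.5) / sigma * np.sum(np.linalg.norm(C, axis=1))

def class_distances(Y, labels):
    # Mean pairwise embedding distance within classes and between classes.
    D = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=2)
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(Y), dtype=bool)
    return D[same & off_diag].mean(), D[~same].mean()

# Toy data (hypothetical): 20 points in 5-D, two classes, a 2-D target embedding.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
labels = np.repeat(np.array([0, 1]), 10)
centers = np.array([[-1.0, 0.0], [1.0, 0.0]])
Y = centers[labels] + 0.1 * rng.normal(size=(20, 2))

sigma = 2.0
C = fit_rbf_interpolator(X, Y, sigma)
Y_fit = apply_rbf_interpolator(X, X, C, sigma)
L_bound = lipschitz_upper_bound(C, sigma)
within, between = class_distances(Y, labels)

# Empirical check of the bound on random pairs of unseen points.
P = rng.normal(size=(50, 5))
Q = rng.normal(size=(50, 5))
num = np.linalg.norm(apply_rbf_interpolator(P, X, C, sigma)
                     - apply_rbf_interpolator(Q, X, C, sigma), axis=1)
den = np.linalg.norm(P - Q, axis=1)
```

In this sketch, smaller coefficient norms or a larger `sigma` lower the bound, which is one plausible way to read the abstract's emphasis on the Lipschitz regularity of the interpolator alongside class compactness and separation.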
Pages: 2139-2143
Page count: 5