A Theoretical Analysis of Multi-Modal Representation Learning with Regular Functions

Citations: 0

Authors:

Vural, Elif [1]

Affiliations:

[1] Orta Dogu Tekn Univ, Elekt & Elekt Muhendisligi Bolumu, Ankara, Turkey

Source:

2020 28TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE (SIU) | 2020

Keywords:

Multi-modal learning; cross-modal retrieval; theoretical analysis; Lipschitz-continuous functions

DOI:

Not available

Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronics and communication technology]

Discipline classification code:

0808; 0809

Abstract:

Multi-modal data analysis methods often learn representations that align different modalities in a new common domain, while preserving the within-class compactness and within-modality geometry and enhancing the between-class separation. In this study, we present a theoretical performance analysis for multi-modal representation learning methods. We consider a quite general family of algorithms learning a nonlinear embedding of the data space into a new space via regular functions. We derive sufficient conditions on the properties of the embedding so that high multi-modal classification or cross-modal retrieval performance is attained. Our results show that if the Lipschitz constant of the embedding function is kept sufficiently small while increasing the between-class separation, then the probability of correct classification or retrieval approaches 1 at an exponential rate with the number of training samples.
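The two quantities the abstract relates, the Lipschitz constant of the embedding and the between-class separation, can be estimated on a finite sample. The following is a minimal sketch only, not the paper's analysis or its sufficient conditions: the function names `empirical_lipschitz` and `between_class_separation`, the toy data, and the use of a single embedding for all samples (rather than one per modality) are assumptions made for illustration. Note that the sample-based ratio is only a lower bound on the true Lipschitz constant.

```python
import math
from itertools import combinations


def empirical_lipschitz(points, embed):
    """Largest ratio ||f(x) - f(y)|| / ||x - y|| over all distinct sample pairs.

    This is a lower bound on the true Lipschitz constant of `embed`.
    """
    best = 0.0
    for x, y in combinations(points, 2):
        d_in = math.dist(x, y)
        if d_in > 0:  # skip duplicate points to avoid dividing by zero
            best = max(best, math.dist(embed(x), embed(y)) / d_in)
    return best


def between_class_separation(points, labels, embed):
    """Smallest embedded distance between two samples of different classes.

    Raises ValueError if all samples share one label.
    """
    return min(
        math.dist(embed(x), embed(y))
        for (x, lx), (y, ly) in combinations(zip(points, labels), 2)
        if lx != ly
    )


# Hypothetical toy data: two classes on a line, embedded by scaling with 0.5.
points = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0), (5.0, 0.0)]
labels = [0, 0, 1, 1]


def embed(p):
    return (0.5 * p[0], 0.5 * p[1])


lipschitz = empirical_lipschitz(points, embed)
separation = between_class_separation(points, labels, embed)
print(lipschitz, separation)
```

For this toy embedding, every pairwise ratio equals the scaling factor, and the closest pair of samples from different classes is (1, 0) and (4, 0).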


Pages: 4
