A Theoretical Analysis of Multi-Modal Representation Learning with Regular Functions

Cited by: 0
Authors
Vural, Elif [1]
Affiliation
[1] Orta Dogu Tekn Univ, Elekt & Elekt Muhendisligi Bolumu, Ankara, Turkey
Source
2020 28TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE (SIU) | 2020
Keywords
Multi-modal learning; cross-modal retrieval; theoretical analysis; Lipschitz-continuous functions
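DOI
Not available
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Discipline Classification Codes
0808; 0809
Abstract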
Multi-modal data analysis methods often learn representations that align different modalities in a new common domain, while preserving the within-class compactness and within-modality geometry and enhancing the between-class separation. In this study, we present a theoretical performance analysis for multi-modal representation learning methods. We consider a quite general family of algorithms learning a nonlinear embedding of the data space into a new space via regular functions. We derive sufficient conditions on the properties of the embedding so that high multi-modal classification or cross-modal retrieval performance is attained. Our results show that if the Lipschitz constant of the embedding function is kept sufficiently small while increasing the between-class separation, then the probability of correct classification or retrieval approaches 1 at an exponential rate with the number of training samples.
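The two quantities the abstract relates, the Lipschitz constant of the embedding and the between-class separation, can be estimated on a finite sample. The following is a minimal sketch, not the paper's method or its exact conditions: the toy data, the `embed` function, and both helper names are hypothetical, and the pairwise ratio only gives a lower bound on the true Lipschitz constant.

```python
import math
from itertools import combinations
from typing import Callable, Sequence

Vector = Sequence[float]


def empirical_lipschitz(f: Callable[[Vector], Vector], xs: Sequence[Vector]) -> float:
    """Largest ratio ||f(x) - f(y)|| / ||x - y|| over distinct sample pairs.

    This is only a lower bound on the true Lipschitz constant of f.
    """
    ratios = [
        math.dist(f(x), f(y)) / math.dist(x, y)
        for x, y in combinations(xs, 2)
        if math.dist(x, y) > 0
    ]
    return max(ratios, default=0.0)


def between_class_separation(
    f: Callable[[Vector], Vector], xs: Sequence[Vector], labels: Sequence[int]
) -> float:
    """Smallest embedded distance between two samples with different labels."""
    gaps = [
        math.dist(f(x), f(y))
        for (x, lx), (y, ly) in combinations(zip(xs, labels), 2)
        if lx != ly
    ]
    return min(gaps, default=math.inf)


# Hypothetical toy data: two classes separated along the second coordinate.
xs = [(0.0, 0.0), (1.0, 0.0), (0.0, 3.0), (1.0, 3.0)]
labels = [0, 0, 1, 1]


def embed(x: Vector) -> Vector:
    # Hypothetical embedding: shrink the within-class direction,
    # keep the class-separating direction unchanged.
    return (0.5 * x[0], x[1])


lipschitz_estimate = empirical_lipschitz(embed, xs)
separation = between_class_separation(embed, xs, labels)
print(lipschitz_estimate, separation)
```

In this toy case the embedding compresses each class without stretching any pair of points, so the estimated Lipschitz constant stays at 1 while the classes remain 3 units apart, which is the qualitative regime the abstract describes as favorable.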
Pages: 4