MULTI-MODAL LEARNING WITH GENERALIZABLE NONLINEAR DIMENSIONALITY REDUCTION

Cited: 0
Authors
Kaya, Semih [1 ]
Vural, Elif [1 ]
Affiliations
[1] METU, Dept Elect & Elect Engn, Ankara, Turkey
Source
2019 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP) | 2019
Keywords
Cross-modal learning; multi-view learning; cross-modal retrieval; nonlinear embeddings; RBF interpolators
DOI
10.1109/icip.2019.8803196
CLC Classification
TB8 [Photographic Technology]
Subject Classification
0804
Abstract
In practical machine learning settings, there often exist relations or links between data from different modalities. The goal of multi-modal learning algorithms is to use the information available in the different modalities efficiently to solve multi-modal classification or retrieval problems. In this study, we propose a multi-modal supervised representation learning algorithm based on nonlinear dimensionality reduction. Nonlinear embeddings often yield more flexible representations than their linear counterparts, especially when the data geometries in the different modalities are highly dissimilar. Building on recent performance bounds for nonlinear dimensionality reduction, we propose an optimization objective that improves the intra- and inter-modal within-class compactness and between-class separation, as well as the Lipschitz regularity of the interpolator that generalizes the embedding to the whole data space. Experiments on multi-view face recognition and image-text retrieval applications show that the proposed method yields promising performance in comparison with state-of-the-art multi-modal learning methods.
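The generalization mechanism named in the abstract, an RBF interpolator that extends a learned embedding from the training samples to the whole data space, can be sketched as follows. This is a minimal illustration only: the Gaussian kernel, the scale `sigma`, the ridge term, and the toy data are all assumed choices, not the paper's actual objective or kernel.

```python
import numpy as np

def fit_rbf_interpolator(X, Y, sigma=1.0):
    """Fit RBF coefficients C so that K @ C matches the embedding Y.

    X: (n, d) training samples; Y: (n, m) their learned embedding
    coordinates (here m < d). Uses a Gaussian kernel with a small
    ridge term for numerical stability (illustrative choices).
    """
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2 * sigma ** 2))
    C = np.linalg.solve(K + 1e-8 * np.eye(len(X)), Y)
    return C

def rbf_embed(X_new, X_train, C, sigma=1.0):
    """Map new (out-of-sample) points into the embedding space."""
    d2 = ((X_new[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2 * sigma ** 2))
    return K @ C

# Toy usage: 5 samples in R^3 with a stand-in 2-D embedding.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
Y = rng.normal(size=(5, 2))   # placeholder for learned embedding coords
C = fit_rbf_interpolator(X, Y)
Y_hat = rbf_embed(X, X, C)    # interpolator reproduces the training embedding
print(np.allclose(Y_hat, Y, atol=1e-4))  # True
```

A smoother (smaller-Lipschitz-constant) interpolator of this kind is what the paper's objective regularizes, trading exact interpolation for better out-of-sample behavior.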
Pages: 2139-2143
Page count: 5
Related Papers
50 entries in total
  • [1] Learning Multi-Modal Nonlinear Embeddings: Performance Bounds and an Algorithm
    Kaya, Semih
    Vural, Elif
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2021, 30: 4384-4394
  • [2] Cross-modal Representation Learning with Nonlinear Dimensionality Reduction
    Kaya, Semih
    Vural, Elif
    2019 27TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE (SIU), 2019
  • [3] Fast Multi-Modal Unified Sparse Representation Learning
    Verma, Mridula
    Shukla, Kaushal Kumar
    PROCEEDINGS OF THE 2017 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL (ICMR'17), 2017: 448-452
  • [4] LEARNING UNIFIED SPARSE REPRESENTATIONS FOR MULTI-MODAL DATA
    Wang, Kaiye
    Wang, Wei
    Wang, Liang
    2015 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2015: 3545-3549
  • [5] Learning Shared and Specific Factors for Multi-modal Data
    Yin, Qiyue
    Huang, Yan
    Wu, Shu
    Wang, Liang
    COMPUTER VISION, PT II, 2017, 772: 89-98
  • [6] A Theoretical Analysis of Multi-Modal Representation Learning with Regular Functions
    Vural, Elif
    2020 28TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE (SIU), 2020
  • [7] Multi-modal Subspace Learning with Joint Graph Regularization for Cross-modal Retrieval
    Wang, Kaiye
    Wang, Wei
    He, Ran
    Wang, Liang
    Tan, Tieniu
    2013 SECOND IAPR ASIAN CONFERENCE ON PATTERN RECOGNITION (ACPR 2013), 2013: 236-240
  • [8] Multi-modal Subspace Learning with Dropout regularization for Cross-modal Recognition and Retrieval
    Cao, Guanqun
    Waris, Muhammad Adeel
    Iosifidis, Alexandros
    Gabbouj, Moncef
    2016 SIXTH INTERNATIONAL CONFERENCE ON IMAGE PROCESSING THEORY, TOOLS AND APPLICATIONS (IPTA), 2016
  • [9] Multi-view dimensionality reduction based on Universum learning
    Chen, Xiaohong
    Yin, Hujun
    Jiang, Fan
    Wang, Liping
    NEUROCOMPUTING, 2018, 275: 2279-2286
  • [10] Multi-Modal Multi-Correlation Learning for Audio-Visual Speech Separation
    Wang, Xiaoyu
    Kong, Xiangyu
    Peng, Xiulian
    Lu, Yan
    INTERSPEECH 2022, 2022: 886-890