Cross-modal Representation Learning with Nonlinear Dimensionality Reduction

Cited by: 0
Authors
Kaya, Semih [1 ]
Vural, Elif [1 ]
Affiliations
[1] Orta Dogu Tekn Univ, Elektr & Elekt Muhendisligi Bolumu, Ankara, Turkey
Source
2019 27TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE (SIU) | 2019
Keywords
Cross-modal learning; multi-view learning; nonlinear projections;
DOI
Not available
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronic & Communication Technology];
Subject Classification Code
0808 ; 0809 ;
Abstract
In many machine learning problems, relations exist between data collections from different modalities. The purpose of multi-modal learning algorithms is to use the information present in the different modalities efficiently when solving multi-modal retrieval problems. In this work, a multi-modal representation learning algorithm based on nonlinear dimensionality reduction is proposed. Compared to linear dimensionality reduction methods, nonlinear methods provide more flexible representations, especially when there is high discrepancy between the structures of the different modalities. We propose to align different modalities by mapping same-class training data from different modalities to nearby coordinates, while also learning a Lipschitz-continuous interpolation function that generalizes the learnt representation to the whole data space. Experiments on image-text retrieval applications show that the proposed method yields high performance compared to multi-modal learning methods in the literature.
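The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' exact algorithm: same-class samples from two synthetic modalities are mapped to a shared per-class anchor coordinate, and a Gaussian RBF interpolator (a smooth stand-in for the Lipschitz-continuous interpolation function the abstract mentions) extends each modality's mapping to unseen data. All variable names and parameter choices here are hypothetical.

```python
# Sketch: cross-modal alignment via shared class anchors + RBF out-of-sample
# extension. Not the paper's method; a hedged illustration of the idea only.
import numpy as np

rng = np.random.default_rng(0)

def rbf_fit(X, Y, gamma=1.0, reg=1e-6):
    # Fit RBF interpolation weights W so that K(X, X) @ W ~= Y.
    K = np.exp(-gamma * ((X[:, None] - X[None]) ** 2).sum(-1))
    return np.linalg.solve(K + reg * np.eye(len(X)), Y)

def rbf_map(Xq, X, W, gamma=1.0):
    # Map query points into the shared space via the learnt interpolator.
    K = np.exp(-gamma * ((Xq[:, None] - X[None]) ** 2).sum(-1))
    return K @ W

n_classes, n_per = 3, 20
labels = np.repeat(np.arange(n_classes), n_per)
anchors = np.eye(n_classes)          # one shared coordinate per class
targets = anchors[labels]            # same-class samples -> nearby coordinates

# Two synthetic modalities with different dimensions and structures.
means_a = rng.normal(0.0, 2.0, (n_classes, 5))
means_b = rng.normal(0.0, 2.0, (n_classes, 8))
Xa = means_a[labels] + rng.normal(0.0, 0.3, (len(labels), 5))
Xb = means_b[labels] + rng.normal(0.0, 0.3, (len(labels), 8))

Wa = rbf_fit(Xa, targets)
Wb = rbf_fit(Xb, targets)

# Cross-modal retrieval: embed an unseen modality-A query and retrieve the
# nearest modality-B sample in the shared space; it should share its class.
query = means_a[0] + rng.normal(0.0, 0.3, 5)
za = rbf_map(query[None], Xa, Wa)
zb = rbf_map(Xb, Xb, Wb)
retrieved_class = labels[np.argmin(((zb - za) ** 2).sum(axis=1))]
print(retrieved_class)
```

Exact RBF interpolation sends each training point to its anchor, so training data from both modalities land on the same coordinates per class, while the kernel map gives a smooth extension to the rest of the feature space.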
Pages: 4
Related Papers
50 records in total
  • [41] Cross-modal learning using privileged information for long-tailed image classification
    Li, Xiangxian
    Zheng, Yuze
    Ma, Haokai
    Qi, Zhuang
    Meng, Xiangxu
    Meng, Lei
    COMPUTATIONAL VISUAL MEDIA, 2024, 10 (05) : 981 - 992
  • [42] Through-Wall Human Pose Reconstruction Based on Cross-Modal Learning and Self-Supervised Learning
    Zheng, Zhijie
    Zhang, Diankun
    Liang, Xiao
    Liu, Xiaojun
    Fang, Guangyou
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2022, 19
  • [43] Cross-Modal Information-Guided Network Using Contrastive Learning for Point Cloud Registration
    Xie, Yifan
    Zhu, Jihua
    Li, Shiqi
    Shi, Pengcheng
    IEEE ROBOTICS AND AUTOMATION LETTERS, 2024, 9 (01): : 103 - 110
  • [44] Deep Understanding of Cooking Procedure for Cross-modal Recipe Retrieval
    Chen, Jing-Jing
    Ngo, Chong-Wah
    Feng, Fu-Li
    Chua, Tat-Seng
    PROCEEDINGS OF THE 2018 ACM MULTIMEDIA CONFERENCE (MM'18), 2018, : 1020 - 1028
  • [45] MULTI-VIEW FUSION THROUGH CROSS-MODAL RETRIEVAL
    Cui, Limeng
    Chen, Zhensong
    Zhang, Jiawei
    He, Lifang
    Shi, Yong
    Yu, Philip S.
    2018 25TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2018, : 1977 - 1981
  • [46] Silicon-based inorganic-organic hybrid optoelectronic synaptic devices simulating cross-modal learning
    Li, Yayao
    Wang, Yue
    Yin, Lei
    Huang, Wen
    Peng, Wenbing
    Zhu, Yiyue
    Wang, Kun
    Yang, Deren
    Pi, Xiaodong
    SCIENCE CHINA-INFORMATION SCIENCES, 2021, 64 (06)
  • [49] Img2Acoustic: A Cross-Modal Gesture Recognition Method Based on Few-Shot Learning
    Zou, Yongpan
    Weng, Jianhao
    Kuang, Wenting
    Jiao, Yang
    Leung, Victor C. M.
    Wu, Kaishun
    IEEE TRANSACTIONS ON MOBILE COMPUTING, 2025, 24 (03) : 1496 - 1512
  • [50] DISENTANGLED SPEECH EMBEDDINGS USING CROSS-MODAL SELF-SUPERVISION
    Nagrani, Arsha
    Chung, Joon Son
    Albanie, Samuel
    Zisserman, Andrew
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6829 - 6833