Improved Multimodal Deep Learning with Variation of Information

Cited by: 0
Authors
Sohn, Kihyuk [1]
Shang, Wenling [1]
Lee, Honglak [1]
Affiliations
[1] Univ Michigan, Ann Arbor, MI 48109 USA
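Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract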
Deep learning has been successfully applied to multimodal representation learning problems, with a common strategy of learning joint representations that are shared across multiple modalities on top of layers of modality-specific networks. Nonetheless, there still remains the question of how to learn a good association between data modalities; in particular, a good generative model of multimodal data should be able to reason about a missing data modality given the rest of the data modalities. In this paper, we propose a novel multimodal representation learning framework that explicitly aims at this goal. Rather than learning with maximum likelihood, we train the model to minimize the variation of information. We provide theoretical insight into why the proposed learning objective is sufficient to estimate the data-generating joint distribution of multimodal data. We apply our method to restricted Boltzmann machines and introduce learning methods based on contrastive divergence and multi-prediction training. In addition, we extend to deep networks with a recurrent encoding structure to finetune the whole network. In experiments, we demonstrate state-of-the-art visual recognition performance on the MIR-Flickr database and the PASCAL VOC 2007 database with and without text features.
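As a rough illustration of the quantity named in the abstract, the variation of information between two discrete variables can be written as VI(X, Y) = H(X|Y) + H(Y|X). The sketch below computes it for a small joint probability table. It is not the authors' training procedure for restricted Boltzmann machines; the function names and the toy tables are assumptions made only for this example.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of an iterable of probabilities."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def variation_of_information(joint):
    """VI(X, Y) = H(X|Y) + H(Y|X) for a table joint[i][j] = P(X=i, Y=j)."""
    px = [sum(row) for row in joint]          # marginal of X (row sums)
    py = [sum(col) for col in zip(*joint)]    # marginal of Y (column sums)
    h_xy = entropy(p for row in joint for p in row)
    # H(X|Y) = H(X,Y) - H(Y) and H(Y|X) = H(X,Y) - H(X)
    return (h_xy - entropy(py)) + (h_xy - entropy(px))


# Hypothetical toy tables: one modality fully determines the other,
# versus two statistically independent modalities.
dependent = [[0.5, 0.0], [0.0, 0.5]]
independent = [[0.25, 0.25], [0.25, 0.25]]

vi_dependent = variation_of_information(dependent)
vi_independent = variation_of_information(independent)
```

In this toy setting the value is zero when each variable can be predicted perfectly from the other and grows as the two become less predictable from each other, which matches the abstract's emphasis on reasoning about a missing modality given the rest.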
Pages: 9
Related papers
50 in total
  • [1] Deep Multimodal Learning for Information Retrieval
    Ji, Wei
    Wei, Yinwei
    Zheng, Zhedong
    Fei, Hao
    Chua, Tat-Seng
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 9739 - 9741
  • [2] An Improved Deep Learning Framework for Multimodal Medical Data Analysis
    Kumar, Sachin
    Sharma, Shivani
    BIG DATA AND COGNITIVE COMPUTING, 2024, 8 (10)
  • [3] Multimodal information bottleneck for deep reinforcement learning with multiple sensors
    You, Bang
    Liu, Huaping
    NEURAL NETWORKS, 2024, 176
  • [4] Enhancing multimodal deep learning for improved precision and efficiency in medical diagnostics
    Jin, Keyan
    JOURNAL OF THE EUROPEAN ACADEMY OF DERMATOLOGY AND VENEREOLOGY, 2024,
  • [5] Improved Diagnostic Imaging of Brain Tumors by Multimodal Microscopy and Deep Learning
    Gesperger, Johanna
    Lichtenegger, Antonia
    Roetzer, Thomas
    Salas, Matthias
    Eugui, Pablo
    Harper, Danielle J.
    Merkle, Conrad W.
    Augustin, Marco
    Kiesel, Barbara
    Mercea, Petra A.
    Widhalm, Georg
    Baumann, Bernhard
    Woehrer, Adelheid
    CANCERS, 2020, 12 (07) : 1 - 16
  • [6] Unimodal and Multimodal Integrated Representation Learning via Improved Information Bottleneck for Multimodal Sentiment Analysis
    Zhang, Tonghui
    Dong, Changfei
    Su, Jinsong
    Zhang, Haiying
    Li, Yuzheng
    NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, NLPCC 2022, PT I, 2022, 13551 : 564 - 576
  • [7] Multimodal Deep Learning using Images and Text for Information Graphic Classification
    Kim, Edward
    McCoy, Kathleen F.
    ASSETS'18: PROCEEDINGS OF THE 20TH INTERNATIONAL ACM SIGACCESS CONFERENCE ON COMPUTERS AND ACCESSIBILITY, 2018, : 143 - 148
  • [8] EFFECT OF TERRAIN INFORMATION ON MULTIMODAL DEEP LEARNING FOR FLOOD DISASTER DETECTION
    Miyamoto, Takashi
    Stricker, Marco
    Ogishima, Jun
    Iselborn, Kevin
    Nuske, Marlon
    Dengel, Andreas
    IGARSS 2023 - 2023 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM, 2023, : 448 - 451
  • [9] Research on Online Review Information Classification Based on Multimodal Deep Learning
    Liu, Jingnan
    Sun, Yefang
    Zhang, Yueyi
    Lu, Chenyuan
    APPLIED SCIENCES-BASEL, 2024, 14 (09):
  • [10] Course video recommendation with multimodal information in online learning platforms: A deep learning framework
    Xu, Wei
    Zhou, Yuhan
    BRITISH JOURNAL OF EDUCATIONAL TECHNOLOGY, 2020, 51 (05) : 1734 - 1747