Learning Joint Multimodal Representation with Adversarial Attention Networks

Cited by: 17
Authors
Huang, Feiran [1]
Zhang, Xiaoming [2]
Li, Zhoujun [1]
Affiliations
[1] Beihang Univ, State Key Lab Software Dev Environm, Beijing, Peoples R China
[2] Beihang Univ, Sch Cyber Sci & Technol, Beijing, Peoples R China
Source
PROCEEDINGS OF THE 2018 ACM MULTIMEDIA CONFERENCE (MM'18) | 2018
Funding
Beijing Natural Science Foundation; National Natural Science Foundation of China
Keywords
multimodal; representation learning; adversarial networks; attention model; siamese learning;
DOI
10.1145/3240508.3240614
Chinese Library Classification (CLC) number
TP301 [Theory and methods]
Discipline classification code
081202
Abstract
Recently, learning a joint representation for multimodal data (e.g., data containing both visual content and a text description) has attracted extensive research interest. The features of different modalities are usually correlated and compositional, so a joint representation that captures the correlation is more effective than a subset of the features. Most existing multimodal representation learning methods lack additional constraints that would enhance the robustness of the learned representations. In this paper, novel Adversarial Attention Networks (AAN) are proposed, which incorporate both the attention mechanism and adversarial networks for effective and robust multimodal representation learning. Specifically, a visual-semantic attention model with a siamese learning strategy is proposed to encode the fine-grained correlation between the visual and textual modalities. Meanwhile, an adversarial learning model is employed to regularize the generated representation by matching the posterior distribution of the representation to the given priors. The two modules are then incorporated into an integrated learning framework to learn the joint multimodal representation. Experimental results on two tasks, i.e., multi-label classification and tag recommendation, show that the proposed model outperforms state-of-the-art representation learning methods.
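The two ingredients named in the abstract can be illustrated with a toy sketch. The code below is not the authors' AAN implementation; it assumes a generic scaled dot-product attention in which a text vector weights image-region features, and a standard binary cross-entropy discriminator loss of the kind used to match encoded representations to samples from a prior. The function names (`attend`, `discriminator_loss`) and the toy inputs are hypothetical and chosen only for illustration.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(regions, query):
    # Hypothetical text-guided soft attention (an assumption, not the paper's model):
    # score each image-region vector against the text query, then pool the regions
    # with the resulting softmax weights.
    d = len(query)
    scores = [sum(r_i * q_i for r_i, q_i in zip(r, query)) / math.sqrt(d)
              for r in regions]
    weights = softmax(scores)
    pooled = [sum(w * r[j] for w, r in zip(weights, regions)) for j in range(d)]
    return pooled, weights

def discriminator_loss(d_prior, d_code, eps=1e-12):
    # Generic binary cross-entropy: the discriminator should output values near 1
    # for samples drawn from the prior and near 0 for encoder outputs.
    real = -sum(math.log(p + eps) for p in d_prior) / len(d_prior)
    fake = -sum(math.log(1.0 - p + eps) for p in d_code) / len(d_code)
    return real + fake

# Toy data: three 2-dimensional region features and one text query vector.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.0]
pooled, weights = attend(regions, query)
```

In this toy setup, regions aligned with the query receive larger weights than the orthogonal region, and a discriminator that cannot tell prior samples from encoder outputs (outputting 0.5 for both) incurs a loss of 2 ln 2.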
Pages: 1874-1882
Number of pages: 9