Learning Joint Multimodal Representation with Adversarial Attention Networks

Cited by: 17
Authors
Huang, Feiran [1]
Zhang, Xiaoming [2]
Li, Zhoujun [1]
Affiliations
[1] Beihang Univ, State Key Lab Software Dev Environm, Beijing, Peoples R China
[2] Beihang Univ, Sch Cyber Sci & Technol, Beijing, Peoples R China
Source
PROCEEDINGS OF THE 2018 ACM MULTIMEDIA CONFERENCE (MM'18) | 2018
Funding
Beijing Natural Science Foundation; National Natural Science Foundation of China
Keywords
multimodal; representation learning; adversarial networks; attention model; siamese learning;
DOI
10.1145/3240508.3240614
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Recently, learning a joint representation for multimodal data (e.g., data containing both visual content and a text description) has attracted extensive research interest. The features of different modalities are usually correlated and compositive, so a joint representation that captures this correlation is more effective than a subset of the features. Most existing multimodal representation learning methods lack additional constraints that would enhance the robustness of the learned representations. In this paper, a novel model, Adversarial Attention Networks (AAN), is proposed to incorporate both the attention mechanism and adversarial networks for effective and robust multimodal representation learning. Specifically, a visual-semantic attention model with a siamese learning strategy is proposed to encode the fine-grained correlation between the visual and textual modalities. Meanwhile, an adversarial learning model is employed to regularize the generated representation by matching the posterior distribution of the representation to given priors. The two modules are then combined into an integrated learning framework to learn the joint multimodal representation. Experimental results on two tasks, multi-label classification and tag recommendation, show that the proposed model outperforms state-of-the-art representation learning methods.
Pages: 1874 - 1882
Page count: 9
Related papers
50 in total
  • [41] Feature Equilibrium: An Adversarial Training Method to Improve Representation Learning
    Minghui Liu
    Meiyi Yang
    Jiali Deng
    Xuan Cheng
    Tianshu Xie
    Pan Deng
    Haigang Gong
    Ming Liu
    Xiaomin Wang
    International Journal of Computational Intelligence Systems, 16
  • [42] PRAAD: Pseudo representation adversarial learning for unsupervised anomaly detection
    Xi, Liang
    He, Dong
    Liu, Han
    JOURNAL OF INFORMATION SECURITY AND APPLICATIONS, 2025, 89
  • [43] Multimodal Adversarial Learning Based Unsupervised Time Series Anomaly Detection
    Huang X.
    Zhang F.
    Fan H.
    Xi L.
    Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2021, 58 (08) : 1655 - 1667
  • [44] Adversarial Representation Learning for Intelligent Condition Monitoring of Complex Machinery
    Sun, Shilin
    Wang, Tianyang
    Yang, Hongxing
    Chu, Fulei
    IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS, 2023, 70 (05) : 5255 - 5265
  • [45] Attentive Representation Learning With Adversarial Training for Short Text Clustering
    Zhang, Wei
    Dong, Chao
    Yin, Jianhua
    Wang, Jianyong
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2022, 34 (11) : 5196 - 5210
  • [46] Establishing joint attention with multimodal resources in lingua franca guided tours
    Hosoda, Yuri
    Aline, David
    LEARNING CULTURE AND SOCIAL INTERACTION, 2021, 31
  • [47] Adversarial Adaptive Interpolation in Autoencoders for Dually Regularizing Representation Learning
    Li, Guanyue
    Wei, Xiwen
    Wu, Si
    Yu, Zhiwen
    Qian, Sheng
    Wong, Hau-San
    IEEE MULTIMEDIA, 2022, 29 (03) : 57 - 67
  • [48] Incremental Unit Networks for Distributed, Symbolic Multimodal Processing and Representation
    Imtiaz, Mir Tahsin
    Kennington, Casey
    DIGITAL HUMAN MODELING AND APPLICATIONS IN HEALTH, SAFETY, ERGONOMICS AND RISK MANAGEMENT: HEALTH, OPERATIONS MANAGEMENT, AND DESIGN, PT II, 2022, 13320 : 344 - 363
  • [49] Joint compression and despeckling by SAR representation learning
    Amao-Oliva, Joel
    Foix-Colonier, Nils
    Sica, Francescopaolo
    ISPRS JOURNAL OF PHOTOGRAMMETRY AND REMOTE SENSING, 2025, 220 : 524 - 534
  • [50] Learning a Joint Representation for Classification of Networked Documents
    You, Zhenni
    Qian, Tieyun
    NEURAL INFORMATION PROCESSING (ICONIP 2018), PT V, 2018, 11305 : 199 - 209