Learning Joint Multimodal Representation with Adversarial Attention Networks

Cited by: 17
Authors
Huang, Feiran [1]
Zhang, Xiaoming [2]
Li, Zhoujun [1]
Affiliations
[1] Beihang Univ, State Key Lab Software Dev Environm, Beijing, Peoples R China
[2] Beihang Univ, Sch Cyber Sci & Technol, Beijing, Peoples R China
Source
PROCEEDINGS OF THE 2018 ACM MULTIMEDIA CONFERENCE (MM'18) | 2018
Funding
Beijing Natural Science Foundation; National Natural Science Foundation of China;
Keywords
multimodal; representation learning; adversarial networks; attention model; siamese learning;
DOI
10.1145/3240508.3240614
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Recently, learning a joint representation for multimodal data (e.g., data containing both visual content and a text description) has attracted extensive research interest. The features of different modalities are usually correlated and compositive, so a joint representation that captures the correlation is more effective than a subset of the features. Most existing multimodal representation learning methods lack additional constraints to enhance the robustness of the learned representations. In this paper, novel Adversarial Attention Networks (AAN) are proposed to incorporate both an attention mechanism and adversarial networks for effective and robust multimodal representation learning. Specifically, a visual-semantic attention model with a siamese learning strategy is proposed to encode the fine-grained correlation between the visual and textual modalities. Meanwhile, an adversarial learning model is employed to regularize the generated representation by matching the posterior distribution of the representation to the given priors. The two modules are then incorporated into an integrated learning framework to learn the joint multimodal representation. Experimental results on two tasks, i.e., multi-label classification and tag recommendation, show that the proposed model outperforms state-of-the-art representation learning methods.
Pages: 1874 - 1882
Page count: 9
Related papers
50 in total
  • [21] Zero-Shot Learning with Joint Generative Adversarial Networks
    Zhang, Minwan
    Wang, Xiaohua
    Shi, Yueting
    Ren, Shiwei
    Wang, Weijiang
    ELECTRONICS, 2023, 12 (10)
  • [22] Transfer Learning Based on Joint Feature Matching and Adversarial Networks
    Zhong H.
    Wang C.
    Tuo H.
    Hu J.
    Qiao L.
    Jing Z.
    Journal of Shanghai Jiaotong University (Science), 2019, 24 (06) : 699 - 705
  • [23] Learning Social Image Embedding with Deep Multimodal Attention Networks
    Huang, Feiran
    Zhang, Xiaoming
    Li, Zhoujun
    Mei, Tao
    He, Yueying
    Zhao, Zhonghua
    PROCEEDINGS OF THE THEMATIC WORKSHOPS OF ACM MULTIMEDIA 2017 (THEMATIC WORKSHOPS'17), 2017, : 460 - 468
  • [24] Towards Learning a Joint Representation from Transformer in Multimodal Emotion Recognition
    Deng, James J.
    Leung, Clement H. C.
    BRAIN INFORMATICS, BI 2021, 2021, 12960 : 179 - 188
  • [25] Modality to Modality Translation: An Adversarial Representation Learning and Graph Fusion Network for Multimodal Fusion
    Mai, Sijie
    Hu, Haifeng
    Xing, Songlong
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 164 - 172
  • [26] Classification and Representation Joint Learning via Deep Networks
    Li, Ya
    Tian, Xinmei
    Shen, Xu
    Tao, Dacheng
    PROCEEDINGS OF THE TWENTY-SIXTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 2215 - 2221
  • [27] Network Representation Learning Framework Based on Adversarial Graph Convolutional Networks
    Chen M.
    Liu Y.
    Moshi Shibie yu Rengong Zhineng/Pattern Recognition and Artificial Intelligence, 2019, 32 (11) : 1042 - 1050
  • [28] Multimodal Representation Learning and Set Attention for LWIR In-Scene Atmospheric Compensation
    Westing, Nicholas
    Gross, Kevin C.
    Borghetti, Brett J.
    Kabban, Christine M. Schubert
    Martin, Jacob
    Meola, Joseph
    IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2021, 14 : 127 - 140
  • [29] Multimodal Joint Representation for User Interest Analysis on Content Curation Social Networks
    Wu, Lifang
    Zhang, Dai
    Jian, Meng
    Yang, Bowen
    Liu, Haiying
    PATTERN RECOGNITION AND COMPUTER VISION, PT III, 2018, 11258 : 363 - 374
  • [30] JRA-Net: Joint representation attention network for correspondence learning
    Shi, Ziwei
    Xiao, Guobao
    Zheng, Linxin
    Ma, Jiayi
    Chen, Riqing
    PATTERN RECOGNITION, 2023, 135