Learning Joint Multimodal Representation with Adversarial Attention Networks

Cited by: 17

Authors:

Huang, Feiran [1]

Zhang, Xiaoming [2]

Li, Zhoujun [1]

Affiliations:

[1] Beihang Univ, State Key Lab Software Dev Environm, Beijing, Peoples R China

[2] Beihang Univ, Sch Cyber Sci & Technol, Beijing, Peoples R China

Source:

PROCEEDINGS OF THE 2018 ACM MULTIMEDIA CONFERENCE (MM'18) | 2018

Funding:

Beijing Natural Science Foundation; National Natural Science Foundation of China;

Keywords:

multimodal; representation learning; adversarial networks; attention model; siamese learning;

DOI:

10.1145/3240508.3240614

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods];

Subject classification code:

081202 ;

Abstract:

Recently, learning a joint representation for multimodal data (e.g., data containing both visual content and a text description) has attracted extensive research interest. Usually, the features of different modalities are correlated and compositive, and thus a joint representation that captures the correlation is more effective than a subset of the features. Most existing multimodal representation learning methods lack additional constraints to enhance the robustness of the learned representations. In this paper, a novel model, Adversarial Attention Networks (AAN), is proposed to incorporate both the attention mechanism and adversarial networks for effective and robust multimodal representation learning. Specifically, a visual-semantic attention model with a siamese learning strategy is proposed to encode the fine-grained correlation between the visual and textual modalities. Meanwhile, an adversarial learning model is employed to regularize the generated representation by matching the posterior distribution of the representation to the given priors. Then, the two modules are incorporated into an integrated learning framework to learn the joint multimodal representation. Experimental results on two tasks, i.e., multi-label classification and tag recommendation, show that the proposed model outperforms state-of-the-art representation learning methods.

Pages: 1874-1882

Number of pages: 9

50 items in total

[21] Zero-Shot Learning with Joint Generative Adversarial Networks
Zhang, Minwan
Wang, Xiaohua
Shi, Yueting
Ren, Shiwei
Wang, Weijiang
ELECTRONICS, 2023, 12 (10)
[22] Transfer Learning Based on Joint Feature Matching and Adversarial Networks
Zhong H.
Wang C.
Tuo H.
Hu J.
Qiao L.
Jing Z.
Journal of Shanghai Jiaotong University (Science), 2019, 24 (06) : 699 - 705
[23] Learning Social Image Embedding with Deep Multimodal Attention Networks
Huang, Feiran
Zhang, Xiaoming
Li, Zhoujun
Mei, Tao
He, Yueying
Zhao, Zhonghua
PROCEEDINGS OF THE THEMATIC WORKSHOPS OF ACM MULTIMEDIA 2017 (THEMATIC WORKSHOPS'17), 2017, : 460 - 468
[24] Towards Learning a Joint Representation from Transformer in Multimodal Emotion Recognition
Deng, James J.
Leung, Clement H. C.
BRAIN INFORMATICS, BI 2021, 2021, 12960 : 179 - 188
[25] Modality to Modality Translation: An Adversarial Representation Learning and Graph Fusion Network for Multimodal Fusion
Mai, Sijie
Hu, Haifeng
Xing, Songlong
THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 164 - 172
[26] Classification and Representation Joint Learning via Deep Networks
Li, Ya
Tian, Xinmei
Shen, Xu
Tao, Dacheng
PROCEEDINGS OF THE TWENTY-SIXTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 2215 - 2221
[27] Network Representation Learning Framework Based on Adversarial Graph Convolutional Networks
Chen M.
Liu Y.
Moshi Shibie yu Rengong Zhineng/Pattern Recognition and Artificial Intelligence, 2019, 32 (11) : 1042 - 1050
[28] Multimodal Representation Learning and Set Attention for LWIR In-Scene Atmospheric Compensation
Westing, Nicholas
Gross, Kevin C.
Borghetti, Brett J.
Kabban, Christine M. Schubert
Martin, Jacob
Meola, Joseph
IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2021, 14 : 127 - 140
[29] Multimodal Joint Representation for User Interest Analysis on Content Curation Social Networks
Wu, Lifang
Zhang, Dai
Jian, Meng
Yang, Bowen
Liu, Haiying
PATTERN RECOGNITION AND COMPUTER VISION, PT III, 2018, 11258 : 363 - 374
[30] JRA-Net: Joint representation attention network for correspondence learning
Shi, Ziwei
Xiao, Guobao
Zheng, Linxin
Ma, Jiayi
Chen, Riqing
PATTERN RECOGNITION, 2023, 135