Learning Joint Multimodal Representation with Adversarial Attention Networks

Times cited: 17

Authors:

Huang, Feiran [1]

Zhang, Xiaoming [2]

Li, Zhoujun [1]

Affiliations:

[1] Beihang University, State Key Laboratory of Software Development Environment, Beijing, People's Republic of China

[2] Beihang University, School of Cyber Science and Technology, Beijing, People's Republic of China

Source:

PROCEEDINGS OF THE 2018 ACM MULTIMEDIA CONFERENCE (MM'18) | 2018

Funding:

Beijing Natural Science Foundation; National Natural Science Foundation of China;

Keywords:

multimodal; representation learning; adversarial networks; attention model; siamese learning;

DOI:

10.1145/3240508.3240614

Chinese Library Classification (CLC) number:

TP301 [Theory and Methods];

Discipline classification code:

081202 ;

Abstract:

Recently, learning a joint representation for multimodal data (e.g., data containing both visual content and a text description) has attracted extensive research interest. The features of different modalities are usually correlated and compositional, so a joint representation that captures their correlation is more effective than any subset of the features. Most existing multimodal representation learning methods suffer from a lack of additional constraints to enhance the robustness of the learned representations. In this paper, a novel model, Adversarial Attention Networks (AAN), is proposed that incorporates both the attention mechanism and adversarial networks for effective and robust multimodal representation learning. Specifically, a visual-semantic attention model with a siamese learning strategy is proposed to encode the fine-grained correlation between the visual and textual modalities. Meanwhile, an adversarial learning model is employed to regularize the generated representation by matching its posterior distribution to the given priors. The two modules are then incorporated into an integrated learning framework to learn the joint multimodal representation. Experimental results on two tasks, multi-label classification and tag recommendation, show that the proposed model outperforms state-of-the-art representation learning methods.
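The abstract names three generic ingredients: text-guided visual attention, a siamese (pairwise) objective tying the two modalities together, and adversarial regularization that pushes the representation toward a prior. The sketch below illustrates those ingredients in plain Python under stated assumptions: scaled dot-product attention, a margin-based contrastive loss for the siamese branch, a single logistic discriminator, a standard Gaussian prior N(0, I), averaging as the fusion step, and a 0.1 weight on the adversarial term. All function names are hypothetical. This is not the authors' AAN implementation; this record does not specify the paper's layers, losses, or weighting.

```python
import math
import random


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def text_guided_attention(regions, text_vec):
    """Scaled dot-product attention (assumed form): the text embedding queries
    image-region features. Returns (weights, attended visual vector)."""
    scale = math.sqrt(len(text_vec))
    weights = softmax([dot(r, text_vec) / scale for r in regions])
    dim = len(regions[0])
    attended = [sum(w * r[k] for w, r in zip(weights, regions)) for k in range(dim)]
    return weights, attended


def contrastive_loss(u, v, matched, margin=1.0):
    """Siamese contrastive loss (assumed form): pull matched pairs together and
    push mismatched pairs at least `margin` apart in Euclidean distance."""
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    return d ** 2 if matched else max(0.0, margin - d) ** 2


def adversarial_losses(codes, prior_samples, w, b, eps=1e-8):
    """Logistic discriminator D(z) = sigmoid(w.z + b) labels prior samples 1 and
    encoder outputs 0. Returns (discriminator_loss, encoder_loss); the encoder
    loss is low when D mistakes the codes for prior samples."""
    d_prior = [sigmoid(dot(z, w) + b) for z in prior_samples]
    d_codes = [sigmoid(dot(z, w) + b) for z in codes]
    d_loss = -(sum(math.log(p + eps) for p in d_prior) / len(d_prior)
               + sum(math.log(1.0 - q + eps) for q in d_codes) / len(d_codes))
    g_loss = -sum(math.log(q + eps) for q in d_codes) / len(d_codes)
    return d_loss, g_loss


if __name__ == "__main__":
    random.seed(0)
    dim = 4
    regions = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(3)]
    text = [random.gauss(0, 1) for _ in range(dim)]
    weights, visual = text_guided_attention(regions, text)
    # Naive fusion by averaging (assumption, not from the paper).
    joint = [(a + t) / 2 for a, t in zip(visual, text)]
    # Samples from an assumed N(0, I) prior.
    prior = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(8)]
    d_loss, g_loss = adversarial_losses([joint], prior, w=[0.1] * dim, b=0.0)
    # 0.1 weight on the adversarial term (assumption).
    total = contrastive_loss(visual, text, matched=True) + 0.1 * g_loss
    print(weights, d_loss, g_loss, total)
```

Under these assumptions, training would alternate between minimizing `d_loss` with respect to the discriminator parameters and minimizing `total` with respect to the encoder, as in standard adversarial-autoencoder training; the gradient steps are omitted here.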

Pages: 1874-1882

Number of pages: 9

Related records (50 in total; the first 10 are listed):

[1] Robust Multimodal Representation Learning With Evolutionary Adversarial Attention Networks
Huang, Feiran
Jolfaei, Alireza
Bashir, Ali Kashif
IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION, 2021, 25 (05) : 856 - 868
[2] Generative Adversarial Networks for Multimodal Representation Learning in Video Hyperlinking
Vukotic, Vedran
Raymond, Christian
Gravier, Guillaume
PROCEEDINGS OF THE 2017 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL (ICMR'17), 2017, : 421 - 424
[3] Multimodal adversarial representation learning for breast cancer prognosis prediction
Du, Xiuquan
Zhao, Yuefan
COMPUTERS IN BIOLOGY AND MEDICINE, 2023, 157
[4] Bidirectional Joint Representation Learning with Symmetrical Deep Neural Networks for Multimodal and Crossmodal Applications
Vukotic, Vedran
Raymond, Christian
Gravier, Guillaume
ICMR'16: PROCEEDINGS OF THE 2016 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, 2016, : 343 - 346
[5] Learning Joint Multimodal Representation Based on Multi-fusion Deep Neural Networks
Gu, Zepeng
Lang, Bo
Yue, Tongyu
Huang, Lei
NEURAL INFORMATION PROCESSING (ICONIP 2017), PT II, 2017, 10635 : 276 - 285
[6] Multimodal attention for lip synthesis using conditional generative adversarial networks
Vidal, Andrea
Busso, Carlos
SPEECH COMMUNICATION, 2023, 153
[7] Adversarial Multimodal Representation Learning for Click-Through Rate Prediction
Li, Xiang
Wang, Chao
Tan, Jiwei
Zeng, Xiaoyi
Ou, Dan
Zheng, Bo
WEB CONFERENCE 2020: PROCEEDINGS OF THE WORLD WIDE WEB CONFERENCE (WWW 2020), 2020, : 827 - 836
[8] Neighborhood Attention Networks With Adversarial Learning for Link Prediction
Wang, Zhitao
Lei, Yu
Li, Wenjie
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2021, 32 (08) : 3653 - 3663
[9] SAR IMAGE REPRESENTATION LEARNING WITH ADVERSARIAL AUTOENCODER NETWORKS
Song, Qian
Xu, Feng
Jin, Ya-Qiu
2019 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS 2019), 2019, : 9498 - 9501
[10] Learning Graph Topology Representation with Attention Networks
Qi, Yuanyuan
Zhang, Jiayue
Xu, Weiran
Guo, Jun
Zhang, Honggang
2020 IEEE INTERNATIONAL CONFERENCE ON VISUAL COMMUNICATIONS AND IMAGE PROCESSING (VCIP), 2020, : 1 - 4

← 1 2 3 4 5 →