Learning Joint Multimodal Representation with Adversarial Attention Networks

Cited by: 17
Authors
Huang, Feiran [1 ]
Zhang, Xiaoming [2 ]
Li, Zhoujun [1 ]
Affiliations
[1] Beihang Univ, State Key Lab Software Dev Environm, Beijing, Peoples R China
[2] Beihang Univ, Sch Cyber Sci & Technol, Beijing, Peoples R China
Source
PROCEEDINGS OF THE 2018 ACM MULTIMEDIA CONFERENCE (MM'18) | 2018
Funding
Beijing Natural Science Foundation; National Natural Science Foundation of China;
Keywords
multimodal; representation learning; adversarial networks; attention model; siamese learning;
DOI
10.1145/3240508.3240614
CLC Number
TP301 [Theory and Methods];
Subject Classification Code
081202;
Abstract
Recently, learning a joint representation for multimodal data (e.g., data containing both visual content and a text description) has attracted extensive research interest. Usually, the features of different modalities are correlated and complementary, and thus a joint representation capturing this correlation is more effective than any subset of the features. Most existing multimodal representation learning methods lack additional constraints to enhance the robustness of the learned representations. In this paper, a novel model, Adversarial Attention Networks (AAN), is proposed to incorporate both the attention mechanism and adversarial networks for effective and robust multimodal representation learning. Specifically, a visual-semantic attention model with a siamese learning strategy is proposed to encode the fine-grained correlation between the visual and textual modalities. Meanwhile, an adversarial learning model is employed to regularize the generated representation by matching the posterior distribution of the representation to the given priors. The two modules are then incorporated into an integrated learning framework to learn the joint multimodal representation. Experimental results on two tasks, i.e., multi-label classification and tag recommendation, show that the proposed model outperforms state-of-the-art representation learning methods.
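The abstract's visual-semantic attention component can be illustrated with a minimal sketch. This is not the authors' actual AAN architecture; it is a generic dot-product attention over image region features queried by a text vector, with all names and dimensions hypothetical, showing only how a text-conditioned weighting of visual features produces an attended visual representation for fusion:

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def visual_semantic_attention(regions, text_vec):
    """Attend over image region features using a text query vector.

    regions:  (n_regions, d) array of visual region features
    text_vec: (d,) textual feature vector
    Returns the attended visual vector and the attention weights.
    """
    scores = regions @ text_vec    # relevance of each region to the text
    weights = softmax(scores)      # normalize scores into a distribution
    attended = weights @ regions   # attention-weighted sum of region features
    return attended, weights

# Toy example with random features (dimensions are illustrative only).
rng = np.random.default_rng(0)
regions = rng.normal(size=(5, 8))
text_vec = rng.normal(size=8)
attended, weights = visual_semantic_attention(regions, text_vec)
```

In the paper's framework, such an attended representation would additionally be regularized adversarially, with a discriminator pushing the posterior distribution of the learned representation toward a chosen prior; that second module is omitted here for brevity.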
Pages: 1874-1882
Page count: 9
Related Papers
50 records in total
  • [31] Joint Image-text Representation Learning for Fashion Retrieval
    Yan, Cairong
    Li, Yu
    Wan, Yongquan
    Zhang, Zhaohui
ICMLC 2020: 2020 12TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND COMPUTING, 2020, : 412 - 417
  • [32] Generating Adversarial Examples by Adversarial Networks for Semi-supervised Learning
    Ma, Yun
    Mao, Xudong
    Chen, Yangbin
    Li, Qing
    WEB INFORMATION SYSTEMS ENGINEERING - WISE 2019, 2019, 11881 : 115 - 129
  • [33] On Representation Learning for Road Networks
    Wang, Meng-Xiang
    Lee, Wang-Chien
    Fu, Tao-Yang
    Yu, Ge
    ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY, 2021, 12 (01)
  • [34] CommerceMM: Large-Scale Commerce MultiModal Representation Learning with Omni Retrieval
    Yu, Licheng
    Chen, Jun
    Sinha, Animesh
    Wang, Mengjiao
    Chen, Yu
    Berg, Tamara L.
    Zhang, Ning
    PROCEEDINGS OF THE 28TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2022, 2022, : 4433 - 4442
  • [35] DySAT: Deep Neural Representation Learning on Dynamic Graphs via Self-Attention Networks
    Sankar, Aravind
    Wu, Yanhong
    Gou, Liang
    Zhang, Wei
    Yang, Hao
    PROCEEDINGS OF THE 13TH INTERNATIONAL CONFERENCE ON WEB SEARCH AND DATA MINING (WSDM '20), 2020, : 519 - 527
  • [36] A student performance prediction model based on multimodal generative adversarial networks
    Liu, Junjie
    Yang, Yong
    INTERNATIONAL JOURNAL OF SENSOR NETWORKS, 2025, 47 (03) : 186 - 198
  • [37] Multimodal Fusion Representation Learning Based on Differential Privacy
    Cai, Chaoxin
    Sang, Yingpeng
    Huang, Jinghao
    Zhang, Maliang
    Li, Weizheng
    PARALLEL AND DISTRIBUTED COMPUTING, APPLICATIONS AND TECHNOLOGIES, PDCAT 2021, 2022, 13148 : 548 - 559
  • [38] Multimodal Cardiac Segmentation Using Disentangled Representation Learning
    Chartsias, Agisilaos
    Papanastasiou, Giorgos
    Wang, Chengjia
    Stirrat, Colin
    Semple, Scott
    Newby, David
    Dharmakumar, Rohan
    Tsaftaris, Sotirios A.
    STATISTICAL ATLASES AND COMPUTATIONAL MODELS OF THE HEART: MULTI-SEQUENCE CMR SEGMENTATION, CRT-EPIGGY AND LV FULL QUANTIFICATION CHALLENGES, 2020, 12009 : 128 - 137
  • [39] Triple disentangled representation learning for multimodal affective analysis
    Zhou, Ying
    Liang, Xuefeng
    Chen, Han
    Zhao, Yin
    Chen, Xin
    Yu, Lida
    INFORMATION FUSION, 2024, 114
  • [40] Feature Equilibrium: An Adversarial Training Method to Improve Representation Learning
    Liu, Minghui
    Yang, Meiyi
    Deng, Jiali
    Cheng, Xuan
    Xie, Tianshu
    Deng, Pan
    Gong, Haigang
    Liu, Ming
    Wang, Xiaomin
    INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE SYSTEMS, 2023, 16 (01)