Generative Adversarial Networks for Multimodal Representation Learning in Video Hyperlinking

Cited by: 7

Authors:

Vukotic, Vedran [1]

Raymond, Christian [1]

Gravier, Guillaume [2]

Affiliations:

[1] INRIA IRISA, INSA Rennes, Rennes, France

[2] INRIA IRISA, CNRS, Rennes, France

Source:

PROCEEDINGS OF THE 2017 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL (ICMR'17) | 2017

Keywords:

video hyperlinking; multimedia retrieval; multimodal embedding; multimodal autoencoders; representation learning; unsupervised learning; generative adversarial networks; neural networks

DOI:

10.1145/3078971.3079038

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Continuous multimodal representations suitable for multimodal information retrieval are usually obtained with methods that rely heavily on multimodal autoencoders. In video hyperlinking, a task that aims at retrieving video segments, the state of the art is a variation of two interlocked networks working in opposing directions. These systems provide good multimodal embeddings and are also capable of translating from one representation space to the other. Because they operate on representation spaces, these networks cannot operate in the original spaces (text or image), which makes it difficult to visualize the crossmodal function, and they do not generalize well to unseen data. Recently, generative adversarial networks (GANs) have gained popularity and have been used for generating realistic synthetic data and for obtaining high-level, single-modal latent representation spaces. In this work, we evaluate the feasibility of using GANs to obtain multimodal representations. We show that GANs can be used for multimodal representation learning and that they provide multimodal representations that are superior to representations obtained with multimodal autoencoders. Additionally, we illustrate the ability to visualize crossmodal translations, which can provide human-interpretable insights into learned GAN-based video hyperlinking models.

Citation

Pages: 421-424

Number of pages: 4
