Generative Adversarial Networks for Multimodal Representation Learning in Video Hyperlinking

Cited by: 7
Authors
Vukotic, Vedran [1]
Raymond, Christian [1]
Gravier, Guillaume [2]
Affiliations
[1] INRIA IRISA, INSA Rennes, Rennes, France
[2] INRIA IRISA, CNRS, Rennes, France
Source
PROCEEDINGS OF THE 2017 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL (ICMR'17) | 2017
Keywords
video hyperlinking; multimedia retrieval; multimodal embedding; multimodal autoencoders; representation learning; unsupervised learning; generative adversarial networks; neural networks;
DOI
10.1145/3078971.3079038
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Continuous multimodal representations suitable for multimodal information retrieval are usually obtained with methods that heavily rely on multimodal autoencoders. In video hyperlinking, a task that aims at retrieving video segments, the state of the art is a variation of two interlocked networks working in opposing directions. These systems provide good multimodal embeddings and are also capable of translating from one representation space to the other. Operating on representation spaces, these networks lack the ability to operate in the original spaces (text or image), which makes it difficult to visualize the crossmodal function, and do not generalize well to unseen data. Recently, generative adversarial networks have gained popularity and have been used for generating realistic synthetic data and for obtaining high-level, single-modal latent representation spaces. In this work, we evaluate the feasibility of using GANs to obtain multimodal representations. We show that GANs can be used for multimodal representation learning and that they provide multimodal representations that are superior to representations obtained with multimodal autoencoders. Additionally, we illustrate the ability of visualizing crossmodal translations that can provide human-interpretable insights on learned GAN-based video hyperlinking models.
Pages: 421-424
Page count: 4
Related papers
  • [1] Learning Joint Multimodal Representation with Adversarial Attention Networks
    Huang, Feiran
    Zhang, Xiaoming
    Li, Zhoujun
    PROCEEDINGS OF THE 2018 ACM MULTIMEDIA CONFERENCE (MM'18), 2018, : 1874 - 1882
  • [2] Frequency-Based Motion Representation for Video Generative Adversarial Networks
    Hyun, Sangeek
    Lew, Jaihyun
    Chung, Jiwoo
    Kim, Euiyeon
    Heo, Jae-Pil
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2023, 32 : 3949 - 3963
  • [3] GraphWGAN: Graph Representation Learning with Wasserstein Generative Adversarial Networks
    Yan, Rong
    Shen, Huawei
    Cao, Qi
    Cen, Keting
    Wang, Li
    2020 IEEE INTERNATIONAL CONFERENCE ON BIG DATA AND SMART COMPUTING (BIGCOMP 2020), 2020, : 315 - 322
  • [4] Robust Multimodal Representation Learning With Evolutionary Adversarial Attention Networks
    Huang, Feiran
    Jolfaei, Alireza
    Bashir, Ali Kashif
    IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION, 2021, 25 (05) : 856 - 868
  • [5] Video Generative Adversarial Networks: A Review
    Aldausari, Nuha
    Sowmya, Arcot
    Marcus, Nadine
    Mohammadi, Gelareh
    ACM COMPUTING SURVEYS, 2023, 55 (02)
  • [6] Orthogonal Subspace Representation for Generative Adversarial Networks
    Jiang, Hongxiang
    Luo, Xiaoyan
    Yin, Jihao
    Fu, Huazhu
    Wang, Fuxiang
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, : 1 - 15
  • [7] Orthogonal Subspace Representation for Generative Adversarial Networks
    Jiang, Hongxiang
    Luo, Xiaoyan
    Yin, Jihao
    Fu, Huazhu
    Wang, Fuxiang
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2025, 36 (03) : 4413 - 4427
  • [8] WalkGAN: Network Representation Learning With Sequence-Based Generative Adversarial Networks
    Jin, Taisong
    Yang, Xixi
    Yu, Zhengtao
    Luo, Han
    Zhang, Yongmei
    Jie, Feiran
    Zeng, Xiangxiang
    Jiang, Min
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (04) : 5684 - 5694
  • [9] Conditional multichannel generative adversarial networks with an application to traffic signs representation learning
    Ghorban, Farzin
    Milani, Narges
    Schugk, Daniel
    Roese-Koerner, Lutz
    Su, Yu
    Mueller, Dennis
    Kummert, Anton
    PROGRESS IN ARTIFICIAL INTELLIGENCE, 2019, 8 (01) : 73 - 82
  • [10] Conditional multichannel generative adversarial networks with an application to traffic signs representation learning
    Farzin Ghorban
    Narges Milani
    Daniel Schugk
    Lutz Roese-Koerner
    Yu Su
    Dennis Müller
    Anton Kummert
    Progress in Artificial Intelligence, 2019, 8 : 73 - 82