Adversarial Multimodal Network for Movie Story Question Answering

Cited by: 15
Authors
Yuan, Zhaoquan [1 ]
Sun, Siyuan [2 ,3 ]
Duan, Lixin [2 ,3 ]
Li, Changsheng [4 ]
Wu, Xiao [1 ]
Xu, Changsheng [5 ]
Affiliations
[1] Southwest Jiaotong Univ, Sch Informat Sci & Technol, Chengdu 610031, Peoples R China
[2] Univ Elect Sci & Technol China, Big Data Res Ctr, Chengdu 610051, Peoples R China
[3] Univ Elect Sci & Technol China, Sch Comp Sci & Engn, Chengdu 610051, Peoples R China
[4] Beijing Inst Technol, Sch Comp Sci & Technol, Beijing 100081, Peoples R China
[5] Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing 100190, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Knowledge discovery; Motion pictures; Visualization; Task analysis; Generators; Natural languages; Movie question answering; adversarial network; multimodal understanding
DOI
10.1109/TMM.2020.3002667
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
Visual question answering that draws on information from multiple modalities has attracted increasing attention in recent years, yet it remains challenging because visual content and natural language have very different statistical properties. In this work, we present a method called Adversarial Multimodal Network (AMN) to better understand video stories for question answering. In AMN, we propose to learn multimodal feature representations by finding a more coherent subspace for video clips and the corresponding texts (e.g., subtitles and questions) based on generative adversarial networks. Moreover, a self-attention mechanism is developed to enforce our newly introduced consistency constraint, which preserves the self-correlation among the visual cues of the original video clips in the learned multimodal representations. Extensive experiments on the benchmark MovieQA and TVQA datasets show the effectiveness of the proposed AMN over other published state-of-the-art methods.
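The abstract combines two ideas: an adversarial game that pulls video-clip and text embeddings into a shared subspace, and a self-attention consistency term that keeps the projected visual features correlated the way the original clip features were. Below is a minimal PyTorch sketch of that combination; it is not the authors' implementation, and the module names, dimensions, loss weights, and training loop are illustrative assumptions.

# Hedged sketch (not the authors' code): adversarial alignment of video and
# text features into a shared subspace, plus a self-attention consistency
# term on the visual side. All sizes and weights are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Generator(nn.Module):
    """Projects modality-specific features into a shared d-dimensional space."""
    def __init__(self, in_dim, d=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, d), nn.ReLU(), nn.Linear(d, d))
    def forward(self, x):
        return self.net(x)

class Discriminator(nn.Module):
    """Predicts whether a shared-space feature came from video (1) or text (0)."""
    def __init__(self, d=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d, d // 2), nn.ReLU(), nn.Linear(d // 2, 1))
    def forward(self, z):
        return self.net(z).squeeze(-1)

def self_correlation(feats):
    """Row-normalized self-attention (self-correlation) matrix of (num_clips, dim) features."""
    attn = feats @ feats.t() / feats.size(-1) ** 0.5
    return F.softmax(attn, dim=-1)

# Toy batch: 8 video-clip features (e.g. pooled CNN) and 8 subtitle features.
video_raw = torch.randn(8, 2048)
text_raw  = torch.randn(8, 300)

g_video, g_text = Generator(2048), Generator(300)
disc = Discriminator()
opt_g = torch.optim.Adam(list(g_video.parameters()) + list(g_text.parameters()), lr=1e-4)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-4)

for step in range(100):
    zv, zt = g_video(video_raw), g_text(text_raw)

    # Discriminator step: tell video embeddings apart from text embeddings.
    d_loss = F.binary_cross_entropy_with_logits(disc(zv.detach()), torch.ones(8)) + \
             F.binary_cross_entropy_with_logits(disc(zt.detach()), torch.zeros(8))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator step: fool the discriminator so the two modalities become
    # indistinguishable in the shared subspace ...
    adv_loss = F.binary_cross_entropy_with_logits(disc(zv), torch.zeros(8)) + \
               F.binary_cross_entropy_with_logits(disc(zt), torch.ones(8))
    # ... while the consistency term keeps the self-correlation structure of the
    # original clips in the projected visual features.
    consistency = F.mse_loss(self_correlation(zv), self_correlation(video_raw))
    g_loss = adv_loss + 0.1 * consistency
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

In the paper these representations presumably feed an answer-selection module; the standalone loop above only illustrates how the adversarial and consistency losses interact.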
Pages: 1744-1756
Page count: 13
Related Papers
50 records in total
  • [1] Progressive Attention Memory Network for Movie Story Question Answering
    Kim, Junyeong
    Ma, Minuk
    Kim, Kyungsu
    Kim, Sungjin
    Yoo, Chang D.
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 8329 - 8338
  • [2] Multimodal Dual Attention Memory for Video Story Question Answering
    Kim, Kyung-Min
    Choi, Seong-Ho
    Kim, Jin-Hwa
    Zhang, Byoung-Tak
    COMPUTER VISION - ECCV 2018, PT 15, 2018, 11219 : 698 - 713
  • [3] From text to multimodal: a survey of adversarial example generation in question answering systems
    Yigit, Gulsum
    Amasyali, Mehmet Fatih
    KNOWLEDGE AND INFORMATION SYSTEMS, 2024, 66 (12) : 7165 - 7204
  • [4] Holistic Multi-Modal Memory Network for Movie Question Answering
    Wang, Anran
    Anh Tuan Luu
    Foo, Chuan-Sheng
    Zhu, Hongyuan
    Tay, Yi
    Chandrasekhar, Vijay
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2020, 29 : 489 - 499
  • [5] A Universal Quaternion Hypergraph Network for Multimodal Video Question Answering
    Guo, Zhicheng
    Zhao, Jiaxuan
    Jiao, Licheng
    Liu, Xu
    Liu, Fang
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 38 - 49
  • [6] Improving Visual Question Answering by Multimodal Gate Fusion Network
    Xiang, Shenxiang
    Chen, Qiaohong
    Fang, Xian
    Guo, Menghao
    2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN, 2023,
  • [7] Multimodal Graph Transformer for Multimodal Question Answering
    He, Xuehai
    Wang, Xin Eric
    17TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EACL 2023, 2023, : 189 - 200
  • [8] Multimodal Attention for Visual Question Answering
    Kodra, Lorena
    Mece, Elinda Kajo
    INTELLIGENT COMPUTING, VOL 1, 2019, 858 : 783 - 792