Incorporating Verb Semantic Information in Visual Question Answering Through Multitask Learning Paradigm

Cited by: 0
Authors
Alizadeh, Mehrdad [1]
Di Eugenio, Barbara [1]
Affiliations
[1] Univ Illinois, Dept Comp Sci, Chicago, IL 60607 USA
Keywords
Visual Question Answering; verb semantics; data augmentation; deep learning; multi-task learning;
DOI
10.1142/S1793351X20400085
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Visual Question Answering (VQA) concerns providing answers to Natural Language questions about images. Several deep neural network approaches have been proposed to model the task in an end-to-end fashion. Whereas the task is grounded in visual processing, if the question focuses on events described by verbs, the language understanding component becomes crucial. Our hypothesis is that models should be aware of verb semantics, as expressed via semantic role labels, argument types, and/or frame elements. Unfortunately, no VQA dataset exists that includes verb semantic information. Our first contribution is a new VQA dataset (imSituVQA) that we built by taking advantage of the imSitu annotations. The imSitu dataset consists of images manually labeled with semantic frame elements, mostly taken from FrameNet. Second, we propose a multi-task CNN-LSTM VQA model that learns to classify the answers as well as the semantic frame elements. Our experiments show that semantic frame element classification helps the VQA system avoid inconsistent responses and improves performance. Third, we employ an automatic semantic role labeler and annotate a subset of the VQA dataset (VQA(sub)). This way, the proposed multi-task CNN-LSTM VQA model can be trained with the VQA(sub) as well. The results show a slight improvement over the single-task CNN-LSTM model.
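The multi-task idea in the abstract, one shared question-image representation feeding both an answer classifier and a semantic frame element classifier, can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration and is not taken from the paper: the plain linear heads, the names `multitask_loss` and `frame_loss_weight`, and the choice to combine the two cross-entropy losses as a weighted sum.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target_index):
    """Negative log-probability assigned to the target class."""
    return -math.log(softmax(logits)[target_index])

def linear(features, weights):
    """Apply a linear head; `weights` holds one row per output class."""
    return [sum(w * f for w, f in zip(row, features)) for row in weights]

def multitask_loss(shared_features, answer_weights, frame_weights,
                   answer_target, frame_target, frame_loss_weight=0.5):
    """Hypothetical joint objective: answer loss plus a weighted
    frame element loss, both computed from the same shared features."""
    answer_loss = cross_entropy(linear(shared_features, answer_weights),
                                answer_target)
    frame_loss = cross_entropy(linear(shared_features, frame_weights),
                               frame_target)
    total = answer_loss + frame_loss_weight * frame_loss
    return total, answer_loss, frame_loss

# Stand-in for a fused CNN-LSTM feature vector (made-up values).
features = [0.2, -0.1]
answer_head = [[0.5, 0.1], [-0.3, 0.8], [0.0, 0.0]]  # 3 candidate answers
frame_head = [[0.4, -0.2], [0.1, 0.3]]               # 2 frame element labels
total, answer_loss, frame_loss = multitask_loss(
    features, answer_head, frame_head, answer_target=0, frame_target=1)
print(total, answer_loss, frame_loss)
```

Because both heads read the same features, gradients from the frame element loss would also shape the shared representation, which is the general mechanism by which an auxiliary task can influence the answer predictions.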
Pages: 223 - 248
Number of pages: 26
Related papers
50 in total
  • [31] Visual Question Answering via Combining Inferential Attention and Semantic Space Mapping
    Liu, Yun
    Zhang, Xiaoming
    Huang, Feiran
    Zhou, Zhibo
    Zhao, Zhonghua
    Li, Zhoujun
    KNOWLEDGE-BASED SYSTEMS, 2020, 207
  • [32] Learning to enhance areal video captioning with visual question answering
    Al Mehmadi, Shima M.
    Bazi, Yakoub
    Al Rahhal, Mohamad M.
    Zuair, Mansour
    INTERNATIONAL JOURNAL OF REMOTE SENSING, 2024, 45 (18) : 6395 - 6407
  • [33] Learning to Recognize Visual Concepts for Visual Question Answering With Structural Label Space
    Gao, Difei
    Wang, Ruiping
    Shan, Shiguang
    Chen, Xilin
    IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2020, 14 (03) : 494 - 505
  • [34] Erasing-based Attention Learning for Visual Question Answering
    Liu, Fei
    Liu, Jing
    Hong, Richang
    Lu, Hanqing
    PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19), 2019, : 1175 - 1183
  • [35] Simple contrastive learning in a self-supervised manner for robust visual question answering
    Yang, Shuwen
    Xiao, Luwei
    Wu, Xingjiao
    Xu, Junjie
    Wang, Linlin
    He, Liang
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2024, 241
  • [36] Visual question answering in the medical domain based on deep learning approaches: A comprehensive study
    Al-Sadi, Aisha
    Al-Ayyoub, Mahmoud
    Jararweh, Yaser
    Costen, Fumie
    PATTERN RECOGNITION LETTERS, 2021, 150 : 57 - 75
  • [37] Enhancing Visual Question Answering with Prompt-based Learning: A Cross-modal Approach for Deep Semantic Understanding
    Zhu, Shuaiyu
    Peng, Shuo
    Chen, Shengbo
    PROCEEDINGS OF INTERNATIONAL CONFERENCE ON ALGORITHMS, SOFTWARE ENGINEERING, AND NETWORK SECURITY, ASENS 2024, 2024, : 713 - 717
  • [38] Depth-Aware and Semantic Guided Relational Attention Network for Visual Question Answering
    Liu, Yuhang
    Wei, Wei
    Peng, Daowan
    Mao, Xian-Ling
    He, Zhiyong
    Zhou, Pan
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 5344 - 5357
  • [39] Seeing and Reasoning: A Simple Deep Learning Approach to Visual Question Answering
    Zakari, Rufai Yusuf
    Owusu, Jim Wilson
    Qin, Ke
    He, Tao
    Luo, Guangchun
    BIG DATA MINING AND ANALYTICS, 2025, 8 (02) : 458 - 478
  • [40] Adversarial Learning of Answer-Related Representation for Visual Question Answering
    Liu, Yun
    Zhang, Xiaoming
    Huang, Feiran
    Li, Zhoujun
    CIKM'18: PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, 2018, : 1013 - 1022