Incorporating Verb Semantic Information in Visual Question Answering Through Multitask Learning Paradigm

Cited by: 0
Authors
Alizadeh, Mehrdad [1 ]
Di Eugenio, Barbara [1 ]
Affiliations
[1] Univ Illinois, Dept Comp Sci, Chicago, IL 60607 USA
Keywords
Visual Question Answering; verb semantics; data augmentation; deep learning; multi-task learning;
DOI
10.1142/S1793351X20400085
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
Visual Question Answering (VQA) concerns providing answers to natural language questions about images. Several deep neural network approaches have been proposed to model the task in an end-to-end fashion. Whereas the task is grounded in visual processing, the language understanding component becomes crucial when the question focuses on events described by verbs. Our hypothesis is that models should be aware of verb semantics, as expressed via semantic role labels, argument types, and/or frame elements. Unfortunately, no VQA dataset exists that includes verb semantic information. Our first contribution is a new VQA dataset (imSituVQA) that we built by taking advantage of the imSitu annotations. The imSitu dataset consists of images manually labeled with semantic frame elements, mostly taken from FrameNet. Second, we propose a multi-task CNN-LSTM VQA model that learns to classify the answers as well as the semantic frame elements. Our experiments show that semantic frame element classification helps the VQA system avoid inconsistent responses and improves performance. Third, we employ an automatic semantic role labeler to annotate a subset of the VQA dataset (VQA_sub), so that the proposed multi-task CNN-LSTM VQA model can be trained on VQA_sub as well. The results show a slight improvement over the single-task CNN-LSTM model.
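For illustration, the following is a minimal PyTorch sketch of the multi-task CNN-LSTM architecture the abstract describes: a CNN image encoder and an LSTM question encoder are fused into a shared representation that feeds two classification heads, one for the answer and one for the semantic frame element, trained with a weighted sum of cross-entropy losses. All layer sizes, vocabulary sizes, the fusion operation, and the loss weight here are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal sketch of a multi-task CNN-LSTM VQA model in the spirit of the
# abstract; all hyperparameters below are assumptions, not the paper's setup.
import torch
import torch.nn as nn
from torchvision import models


class MultiTaskVQA(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=300, hidden_dim=512,
                 num_answers=1000, num_frame_elements=200):
        super().__init__()
        # Image encoder: a ResNet backbone with its classifier replaced.
        # (Randomly initialized here; in practice a pretrained backbone is used.)
        cnn = models.resnet18(weights=None)
        cnn.fc = nn.Linear(cnn.fc.in_features, hidden_dim)
        self.cnn = cnn
        # Question encoder: word embeddings followed by an LSTM.
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        # Task-specific heads over the fused image-question representation.
        self.answer_head = nn.Linear(hidden_dim, num_answers)
        self.frame_head = nn.Linear(hidden_dim, num_frame_elements)

    def forward(self, image, question_ids):
        img_feat = self.cnn(image)              # (B, hidden_dim)
        emb = self.embed(question_ids)          # (B, T, embed_dim)
        _, (h_n, _) = self.lstm(emb)
        q_feat = h_n[-1]                        # final hidden state, (B, hidden_dim)
        fused = img_feat * q_feat               # element-wise fusion (an assumption)
        return self.answer_head(fused), self.frame_head(fused)


# Joint training step: a weighted sum of the two cross-entropy losses, so the
# frame-element task acts as an auxiliary signal for the answer task.
model = MultiTaskVQA()
criterion = nn.CrossEntropyLoss()
image = torch.randn(4, 3, 224, 224)
question = torch.randint(1, 10000, (4, 12))
answer_gold = torch.randint(0, 1000, (4,))
frame_gold = torch.randint(0, 200, (4,))
answer_logits, frame_logits = model(image, question)
loss = criterion(answer_logits, answer_gold) + 0.5 * criterion(frame_logits, frame_gold)
loss.backward()
```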
Pages: 223-248
Page count: 26