Incorporating Verb Semantic Information in Visual Question Answering Through Multitask Learning Paradigm

Cited by: 0
Authors
Alizadeh, Mehrdad [1 ]
Di Eugenio, Barbara [1 ]
Affiliations
[1] Univ Illinois, Dept Comp Sci, Chicago, IL 60607 USA
Keywords
Visual Question Answering; verb semantics; data augmentation; deep learning; multi-task learning;
DOI
10.1142/S1793351X20400085
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
Visual Question Answering (VQA) concerns providing answers to natural language questions about images. Several deep neural network approaches have been proposed to model the task in an end-to-end fashion. Whereas the task is grounded in visual processing, the language understanding component becomes crucial when the question focuses on events described by verbs. Our hypothesis is that models should be aware of verb semantics, as expressed via semantic role labels, argument types, and/or frame elements. Unfortunately, no VQA dataset exists that includes verb semantic information. Our first contribution is a new VQA dataset (imSituVQA) that we built by taking advantage of the imSitu annotations. The imSitu dataset consists of images manually labeled with semantic frame elements, mostly taken from FrameNet. Second, we propose a multi-task CNN-LSTM VQA model that learns to classify the answers as well as the semantic frame elements. Our experiments show that semantic frame element classification helps the VQA system avoid inconsistent responses and improves performance. Third, we employ an automatic semantic role labeler to annotate a subset of the VQA dataset (VQA_sub), so that the proposed multi-task CNN-LSTM VQA model can be trained on VQA_sub as well. The results show a slight improvement over the single-task CNN-LSTM model.
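For illustration, the following is a minimal PyTorch sketch of the multi-task CNN-LSTM architecture the abstract describes: a CNN image encoder and an LSTM question encoder are fused into a shared representation that feeds two classification heads, one for the answer and one for the semantic frame element, trained with a weighted sum of cross-entropy losses. All layer sizes, vocabulary sizes, the fusion operation, and the loss weight here are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal sketch of a multi-task CNN-LSTM VQA model in the spirit of the
# abstract; all hyperparameters below are assumptions, not the paper's setup.
import torch
import torch.nn as nn
from torchvision import models


class MultiTaskVQA(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=300, hidden_dim=512,
                 num_answers=1000, num_frame_elements=200):
        super().__init__()
        # Image encoder: a ResNet backbone with its classifier replaced.
        # (Randomly initialized here; in practice a pretrained backbone is used.)
        cnn = models.resnet18(weights=None)
        cnn.fc = nn.Linear(cnn.fc.in_features, hidden_dim)
        self.cnn = cnn
        # Question encoder: word embeddings followed by an LSTM.
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        # Task-specific heads over the fused image-question representation.
        self.answer_head = nn.Linear(hidden_dim, num_answers)
        self.frame_head = nn.Linear(hidden_dim, num_frame_elements)

    def forward(self, image, question_ids):
        img_feat = self.cnn(image)              # (B, hidden_dim)
        emb = self.embed(question_ids)          # (B, T, embed_dim)
        _, (h_n, _) = self.lstm(emb)
        q_feat = h_n[-1]                        # final hidden state, (B, hidden_dim)
        fused = img_feat * q_feat               # element-wise fusion (an assumption)
        return self.answer_head(fused), self.frame_head(fused)


# Joint training step: a weighted sum of the two cross-entropy losses, so the
# frame-element task acts as an auxiliary signal for the answer task.
model = MultiTaskVQA()
criterion = nn.CrossEntropyLoss()
image = torch.randn(4, 3, 224, 224)
question = torch.randint(1, 10000, (4, 12))
answer_gold = torch.randint(0, 1000, (4,))
frame_gold = torch.randint(0, 200, (4,))
answer_logits, frame_logits = model(image, question)
loss = criterion(answer_logits, answer_gold) + 0.5 * criterion(frame_logits, frame_gold)
loss.backward()
```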
Pages: 223-248
Page count: 26