Incorporating Verb Semantic Information in Visual Question Answering Through Multitask Learning Paradigm

Cited by: 0
Authors
Alizadeh, Mehrdad [1 ]
Di Eugenio, Barbara [1 ]
Affiliations
[1] University of Illinois at Chicago, Department of Computer Science, Chicago, IL 60607, USA
Keywords
Visual Question Answering; verb semantics; data augmentation; deep learning; multi-task learning
DOI
10.1142/S1793351X20400085
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Visual Question Answering (VQA) concerns providing answers to natural language questions about images. Several deep neural network approaches have been proposed to model the task in an end-to-end fashion. Although the task is grounded in visual processing, the language understanding component becomes crucial when the question focuses on events described by verbs. Our hypothesis is that models should be aware of verb semantics, as expressed via semantic role labels, argument types, and/or frame elements. Unfortunately, no existing VQA dataset includes verb semantic information. Our first contribution is a new VQA dataset (imSituVQA) built by taking advantage of the imSitu annotations; the imSitu dataset consists of images manually labeled with semantic frame elements, mostly taken from FrameNet. Second, we propose a multi-task CNN-LSTM VQA model that learns to classify answers as well as semantic frame elements. Our experiments show that semantic frame element classification helps the VQA system avoid inconsistent responses and improves performance. Third, we employ an automatic semantic role labeler to annotate a subset of the VQA dataset (VQA_sub), so that the proposed multi-task CNN-LSTM VQA model can be trained on VQA_sub as well. The results show a slight improvement over the single-task CNN-LSTM model.
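The abstract describes the model only at a high level. Below is a minimal sketch of what a multi-task CNN-LSTM VQA architecture with a shared encoder and two classification heads could look like in PyTorch. The backbone choice (ResNet-18), the element-wise fusion, the vocabulary and label-set sizes, and the loss weight w are all illustrative assumptions, not the authors' reported configuration.

# Minimal multi-task CNN-LSTM VQA sketch (PyTorch). All sizes and the
# fusion scheme are illustrative assumptions, not the paper's exact model.
import torch
import torch.nn as nn
import torchvision.models as models

class MultiTaskVQA(nn.Module):
    def __init__(self, vocab_size, embed_dim=300, hidden_dim=512,
                 num_answers=1000, num_frame_elements=200):
        super().__init__()
        # CNN image encoder: frozen pretrained ResNet-18 (illustrative choice);
        # dropping the final fc layer leaves pooled (B, 512, 1, 1) features.
        cnn = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
        self.cnn = nn.Sequential(*list(cnn.children())[:-1])
        for p in self.cnn.parameters():
            p.requires_grad = False
        self.img_proj = nn.Linear(512, hidden_dim)

        # LSTM question encoder over padded token IDs.
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)

        # Shared fused representation feeds two task-specific heads:
        # one for answers, one for semantic frame elements.
        self.answer_head = nn.Linear(hidden_dim, num_answers)
        self.frame_head = nn.Linear(hidden_dim, num_frame_elements)

    def forward(self, image, question_ids):
        img = self.img_proj(self.cnn(image).flatten(1))   # (B, hidden_dim)
        _, (h, _) = self.lstm(self.embed(question_ids))
        q = h[-1]                                         # (B, hidden_dim)
        fused = img * q   # element-wise fusion, a common simple choice
        return self.answer_head(fused), self.frame_head(fused)

# Joint objective: weighted sum of the two cross-entropies. The weight w
# is a hyperparameter assumed here, not a value reported in the paper.
def multitask_loss(answer_logits, frame_logits, answer_gold, frame_gold, w=0.5):
    ce = nn.functional.cross_entropy
    return ce(answer_logits, answer_gold) + w * ce(frame_logits, frame_gold)

Given a batch of images shaped (B, 3, 224, 224) and padded question token IDs shaped (B, T), the forward pass returns two logit tensors; backpropagating the joint loss trains the shared encoders on both tasks, which is the mechanism by which the auxiliary frame-element signal can regularize the answer classifier.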
Pages: 223 - 248
Number of pages: 26