Incorporating Verb Semantic Information in Visual Question Answering Through Multitask Learning Paradigm

Cited by: 0

Authors:

Alizadeh, Mehrdad [1]

Di Eugenio, Barbara [1]

Affiliations:

[1] Univ Illinois, Dept Comp Sci, Chicago, IL 60607 USA

Source:

INTERNATIONAL JOURNAL OF SEMANTIC COMPUTING | 2020 / Vol. 14 / Issue 02

Keywords:

Visual Question Answering; verb semantics; data augmentation; deep learning; multi-task learning

DOI:

10.1142/S1793351X20400085

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Visual Question Answering (VQA) concerns providing answers to natural language questions about images. Several deep neural network approaches have been proposed to model the task in an end-to-end fashion. Whereas the task is grounded in visual processing, if the question focuses on events described by verbs, the language understanding component becomes crucial. Our hypothesis is that models should be aware of verb semantics, as expressed via semantic role labels, argument types, and/or frame elements. Unfortunately, no VQA dataset exists that includes verb semantic information. Our first contribution is a new VQA dataset (imSituVQA) that we built by taking advantage of the imSitu annotations. The imSitu dataset consists of images manually labeled with semantic frame elements, mostly taken from FrameNet. Second, we propose a multi-task CNN-LSTM VQA model that learns to classify the answers as well as the semantic frame elements. Our experiments show that semantic frame element classification helps the VQA system avoid inconsistent responses and improves performance. Third, we employ an automatic semantic role labeler to annotate a subset of the VQA dataset (VQA(sub)). This way, the proposed multi-task CNN-LSTM VQA model can also be trained on VQA(sub). The results show a slight improvement over the single-task CNN-LSTM model.
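
The multi-task idea in the abstract, learning to classify the answer and the semantic frame element together, can be illustrated with a minimal sketch of a combined training objective. The abstract does not state the loss used, so this sketch assumes a weighted sum of two cross-entropy terms. The function names and the `aux_weight` parameter are hypothetical, and the shared CNN-LSTM encoder that would produce the logits is omitted.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target_index):
    """Negative log-likelihood of the target class under softmax(logits)."""
    return -math.log(softmax(logits)[target_index])


def multitask_loss(answer_logits, answer_target,
                   frame_logits, frame_target, aux_weight=0.5):
    """Assumed objective: answer loss plus a weighted auxiliary loss.

    aux_weight is a hypothetical hyperparameter, not taken from the paper.
    """
    answer_loss = cross_entropy(answer_logits, answer_target)
    frame_loss = cross_entropy(frame_logits, frame_target)
    return answer_loss + aux_weight * frame_loss


# Example: logits from two hypothetical classification heads.
loss = multitask_loss([2.0, 0.5, -1.0], 0, [0.1, 1.2, 0.3, -0.4], 1)
```

Setting `aux_weight` to 0 in this sketch recovers a single-task objective, which is one way to frame the single-task versus multi-task comparison mentioned in the abstract.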

Pages: 223-248

Page count: 26
