Visual Experience-Based Question Answering with Complex Multimodal Environments

Cited by: 0

Authors:

Kim, Incheol [1]

Affiliations:

[1] Kyonggi Univ, Dept Comp Sci, Suwon 16227, South Korea

Source:

MATHEMATICAL PROBLEMS IN ENGINEERING | 2020 / Vol. 2020 / Issue 2020

Keywords:

DOI:

10.1155/2020/8567271

Chinese Library Classification number:

T [Industrial Technology];

Subject classification code:

08;

Abstract:

This paper proposes a novel problem, visual experience-based question answering (VEQA), and a corresponding dataset for embodied intelligence research. VEQA requires an agent to perform actions, understand 3D scenes from successive partial input images, and answer natural language questions about its visual experiences in real time. Unlike conventional visual question answering (VQA), the VEQA problem assumes both partial observability and dynamics of a complex multimodal environment. To address the VEQA problem, we propose a hybrid visual question answering system, VQAS, that integrates a deep neural network-based scene graph generation model with a rule-based knowledge reasoning system. The proposed system can generate more accurate scene graphs for dynamic environments with some uncertainty. Moreover, it can answer complex questions through knowledge reasoning with rich background knowledge. Results of experiments using a photo-realistic 3D simulated environment, AI2-THOR, and the VEQA benchmark dataset demonstrate the high performance of the proposed system.

Pages: 18

50 items in total

[41] Multimodal Graph Transformer for Multimodal Question Answering
He, Xuehai
Wang, Xin Eric
arXiv, 2023,
[42] Visual Question Answering
Nada, Ahmed
Chen, Min
2024 INTERNATIONAL CONFERENCE ON COMPUTING, NETWORKING AND COMMUNICATIONS, ICNC, 2024, : 6 - 10
[43] Visual Question Answering Based on Position Alignment
Xia, Qihao
Yu, Chao
Peng, Pingping
Gu, Henghao
Zheng, Zhengqi
Zhao, Kun
2021 14TH INTERNATIONAL CONGRESS ON IMAGE AND SIGNAL PROCESSING, BIOMEDICAL ENGINEERING AND INFORMATICS (CISP-BMEI 2021), 2021,
[44] Visual Question Answering based on Formal Logic
Sethuraman, Muralikrishnna G.
Payani, Ali
Fekri, Faramarz
Kerce, J. Clayton
20TH IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA 2021), 2021, : 952 - 957
[45] Experience-Based Routing in Call Center Environments
Robbins, Thomas R.
SERVICE SCIENCE, 2015, 7 (02) : 132 - 148
[46] Experience-based productivity improvements in project environments
Boone, T
NEW DIRECTIONS IN SUPPLY-CHAIN MANAGEMENT: TECHNOLOGY, STRATEGY, AND IMPLEMENTATION, 2002, : 272 - 281
[47] Question Modifiers in Visual Question Answering
Britton, William
Sarkhel, Somdeb
Venugopal, Deepak
LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 1472 - 1479
[48] MKEAH: Multimodal knowledge extraction and accumulation based on hyperplane embedding for knowledge-based visual question answering
Zhang, Heng
Wei, Zhihua
Liu, Guanming
Wang, Rui
Mu, Ruibin
Liu, Chuanbao
Yuan, Aiquan
Cao, Guodong
Hu, Ning
Virtual Reality and Intelligent Hardware, 6 (04) : 280 - 291
[49] MKEAH: Multimodal knowledge extraction and accumulation based on hyperplane embedding for knowledge-based visual question answering
Heng ZHANG
Zhihua WEI
Guanming LIU
Rui WANG
Ruibin MU
Chuanbao LIU
Aiquan YUAN
Guodong CAO
Ning HU
虚拟现实与智能硬件(中英文), 2024, 6 (04) : 280 - 291
[50] Visual question answering model based on visual relationship detection
Xi, Yuling
Zhang, Yanning
Ding, Songtao
Wan, Shaohua
SIGNAL PROCESSING-IMAGE COMMUNICATION, 2020, 80
