Visual Experience-Based Question Answering with Complex Multimodal Environments

Author
Kim, Incheol [1]
Affiliation
[1] Kyonggi Univ, Dept Comp Sci, Suwon 16227, South Korea
DOI
10.1155/2020/8567271
Chinese Library Classification (CLC)
T [Industrial Technology]
Discipline classification code
08
Abstract
This paper proposes a novel problem, visual experience-based question answering (VEQA), and a corresponding dataset for embodied intelligence research. The problem requires an agent to perform actions, understand 3D scenes from successive partial input images, and answer natural language questions about its visual experiences in real time. Unlike conventional visual question answering (VQA), the VEQA problem assumes both partial observability and dynamics of a complex multimodal environment. To address this problem, we propose a hybrid visual question answering system, VQAS, which integrates a deep neural network-based scene graph generation model with a rule-based knowledge reasoning system. The proposed system can generate more accurate scene graphs for dynamic environments with some uncertainty. Moreover, it can answer complex questions through knowledge reasoning with rich background knowledge. Results of experiments using a photo-realistic 3D simulated environment, AI2-THOR, and the VEQA benchmark dataset demonstrate the high performance of the proposed system.
Pages: 18