A Neuro-Symbolic AI System for Visual Question Answering in Pedestrian Video Sequences

Cited by: 1
Authors
Park, Jaeil [1]
Bu, Seok-Jun [1]
Cho, Sung-Bae [1]
Affiliation
[1] Yonsei Univ, Dept Comp Sci, Seoul, South Korea
Source
HYBRID ARTIFICIAL INTELLIGENT SYSTEMS, HAIS 2022 | 2022 / Vol. 13469
Keywords
Visual question-answering; Neuro-symbolic reasoning; Scene graph; Pedestrian video
DOI
10.1007/978-3-031-15471-3_38
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
With the rapid increase in the amount of video data, efficient object recognition is essential for a system that performs question answering automatically. In particular, real-world video environments, with numerous types of objects and complex relationships among them, require extensive knowledge representation and inference algorithms that exploit the properties and relations of objects. In this paper, we propose a hybrid neuro-symbolic AI system that handles scene graphs of real-world video data. The method combines neural networks that generate scene graphs by considering the relationships between objects on real roads with symbol-based inference algorithms that answer questions. We define object properties, relationships, and question coverage to cover the real-world objects in pedestrian video, and we traverse the scene graph to perform complex visual question answering. We demonstrate the superiority of the proposed method, which answered five types of questions in a pedestrian video environment with 99.71% accuracy.
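The abstract describes answering questions by symbolically traversing a scene graph that a neural network produces. The sketch below illustrates only that symbolic half, assuming a CLEVR-style question program made of filter, relate, and query steps. The record does not say which five question types or which operators the paper uses, so every name here (`SceneObject`, `SceneGraph`, `execute`, `filter_category`, `filter_attribute`, `relate`, `count`, `exist`, `query_attribute`) and every object category, attribute, and relation is hypothetical. The scene graph is hand-written rather than generated by a network.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SceneObject:
    obj_id: int
    category: str                                   # hypothetical label, e.g. "pedestrian"
    attributes: dict = field(default_factory=dict)  # hypothetical, e.g. {"action": "walking"}


@dataclass
class SceneGraph:
    objects: list                                   # SceneObject nodes in one frame
    relations: list                                 # (subject_id, predicate, object_id) edges


def execute(graph: SceneGraph, program: list) -> Any:
    """Run a hypothetical question program step by step over the scene graph.

    Each step is an (operation, argument) pair. Filtering steps narrow the
    current set of object ids; `relate` follows edges out of the current set;
    `count`, `exist`, and `query_attribute` end the program with an answer.
    """
    by_id = {obj.obj_id: obj for obj in graph.objects}
    ids = set(by_id)                                # start from every object in the graph
    for op, arg in program:
        if op == "filter_category":
            ids = {i for i in ids if by_id[i].category == arg}
        elif op == "filter_attribute":
            key, value = arg
            ids = {i for i in ids if by_id[i].attributes.get(key) == value}
        elif op == "relate":
            # Move to the objects that the current objects point to via predicate `arg`.
            ids = {o for s, p, o in graph.relations if s in ids and p == arg}
        elif op == "count":
            return len(ids)
        elif op == "exist":
            return len(ids) > 0
        elif op == "query_attribute":
            # Answer only when the program has narrowed the set to exactly one object.
            if len(ids) != 1:
                return None
            return by_id[next(iter(ids))].attributes.get(arg)
        else:
            raise ValueError(f"unknown operation: {op}")
    return sorted(ids)                              # no terminal step: return matching ids


# Hand-written toy graph standing in for the neural scene-graph generator's output.
graph = SceneGraph(
    objects=[
        SceneObject(0, "pedestrian", {"action": "walking"}),
        SceneObject(1, "pedestrian", {"action": "standing"}),
        SceneObject(2, "crosswalk"),
        SceneObject(3, "car", {"color": "red"}),
    ],
    relations=[(0, "on", 2), (1, "near", 3), (3, "near", 1)],
)

# "How many pedestrians are there?"
print(execute(graph, [("filter_category", "pedestrian"), ("count", None)]))  # → 2
# "Is a walking pedestrian on a crosswalk?"
print(execute(graph, [
    ("filter_category", "pedestrian"),
    ("filter_attribute", ("action", "walking")),
    ("relate", "on"),
    ("filter_category", "crosswalk"),
    ("exist", None),
]))  # → True
# "What is the pedestrian near the car doing?"
print(execute(graph, [
    ("filter_category", "car"),
    ("relate", "near"),
    ("filter_category", "pedestrian"),
    ("query_attribute", "action"),
]))  # → standing
```

Keeping the intermediate state as a plain set of object ids makes each step composable, so a new question type becomes a new sequence of existing steps. A real system would likely also need to handle detection uncertainty from the neural stage, which this toy executor ignores.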
Pages: 443-454
Page count: 12