A Neuro-Symbolic AI System for Visual Question Answering in Pedestrian Video Sequences

Cited by: 1

Authors:

Park, Jaeil [1]

Bu, Seok-Jun [1]

Cho, Sung-Bae [1]

Affiliations:

[1] Yonsei Univ, Dept Comp Sci, Seoul, South Korea

Source:

HYBRID ARTIFICIAL INTELLIGENT SYSTEMS, HAIS 2022 | 2022 / Vol. 13469

Keywords:

Visual question-answering; Neuro-symbolic reasoning; Scene graph; Pedestrian video;

DOI:

10.1007/978-3-031-15471-3_38

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

With the rapid increase in the amount of video data, efficient object recognition is essential for a system that answers questions automatically. In particular, real-world video environments with numerous object types and complex relationships require extensive knowledge representation and inference algorithms that account for the properties and relations of objects. In this paper, we propose a hybrid neuro-symbolic AI system that handles scene graphs of real-world video data. The method combines neural networks, which generate scene graphs that reflect the relationships between objects on real roads, with symbol-based inference algorithms that respond to questions. We define object properties, relationships, and question coverage for the real-world objects in pedestrian video, and traverse a scene graph to perform complex visual question answering. We demonstrate the superiority of the proposed method by confirming that it answers five types of questions in a pedestrian video environment with 99.71% accuracy.
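
The symbolic half of such a system, answering a question by traversing a scene graph, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `SceneGraph` class, the operation names (`filter`, `relate`, `count`, `exist`, `query`), the attribute names, and the toy scene are all assumptions made for illustration, and the neural scene-graph generator is replaced by a hand-written graph.

```python
from dataclasses import dataclass, field


@dataclass
class SceneGraph:
    # Hypothetical layout: node id -> attribute dict.
    nodes: dict = field(default_factory=dict)
    # Relations as (subject id, predicate, object id) triples.
    edges: list = field(default_factory=list)


def filter_nodes(graph, candidates, **attrs):
    """Keep candidates whose attributes match every key/value in attrs."""
    return [n for n in candidates
            if all(graph.nodes[n].get(k) == v for k, v in attrs.items())]


def relate(graph, candidates, predicate):
    """Follow edges labelled `predicate` from candidates to their targets."""
    return [o for (s, p, o) in graph.edges
            if p == predicate and s in candidates]


def answer(graph, program):
    """Run a list of (operation, argument) steps over the graph."""
    current = list(graph.nodes)
    for op, arg in program:
        if op == "filter":
            current = filter_nodes(graph, current, **arg)
        elif op == "relate":
            current = relate(graph, current, arg)
        elif op == "count":
            return len(current)
        elif op == "exist":
            return len(current) > 0
        elif op == "query":
            return [graph.nodes[n].get(arg) for n in current]
        else:
            raise ValueError(f"unknown operation: {op}")
    return current


# Toy scene standing in for the output of a scene-graph generator.
graph = SceneGraph(
    nodes={
        "p1": {"category": "pedestrian", "action": "walking"},
        "p2": {"category": "pedestrian", "action": "standing"},
        "c1": {"category": "car", "color": "red"},
        "x1": {"category": "crosswalk"},
    },
    edges=[("p1", "on", "x1"), ("p2", "near", "c1")],
)

# "How many pedestrians are there?"
n_pedestrians = answer(
    graph, [("filter", {"category": "pedestrian"}), ("count", None)])

# "What color is the car near the standing pedestrian?"
car_color = answer(graph, [
    ("filter", {"category": "pedestrian", "action": "standing"}),
    ("relate", "near"),
    ("query", "color"),
])
```

Each question is expressed as a short program of graph operations, so the answer follows deterministically from the graph. In this kind of design, errors can be traced either to the generated graph or to the program derived from the question.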


Pages: 443-454

Page count: 12

