Image Understanding using vision and reasoning through Scene Description Graph

Times cited: 45

Authors:

Aditya, Somak [1]

Yang, Yezhou [1]

Baral, Chitta [1]

Aloimonos, Yiannis [2]

Fermueller, Cornelia [2]

Affiliations:

[1] Arizona State Univ, Tempe, AZ 85281 USA

[2] Univ Maryland, College Pk, MD 20742 USA

Source:

COMPUTER VISION AND IMAGE UNDERSTANDING | 2018 / Vol. 173

Funding:

U.S. National Science Foundation (NSF)

Keywords:

Image Understanding; Commonsense Reasoning; Vision; Reasoning; QUESTIONS; KNOWLEDGE; MODELS;

DOI:

10.1016/j.cviu.2017.12.004

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Two of the fundamental tasks in image understanding using text are caption generation and visual question answering (Antol et al., 2015; Xiong et al., 2016). This work presents an intermediate knowledge structure that can be used for both tasks to obtain increased interpretability. We call this knowledge structure a Scene Description Graph (SDG): a directed labeled graph representing objects, actions, regions, and their attributes, along with inferred concepts and with semantic (from KM-Ontology (Clark et al., 2004)), ontological (i.e. superclass, hasProperty), and spatial relations. We thereby propose a general architecture in which a system can represent both the content and the underlying concepts of an image using an SDG. The architecture is implemented using generic visual recognition techniques and commonsense reasoning to extract graphs from images. The utility of the generated SDGs is demonstrated in image captioning and image retrieval, and through examples in visual question answering. The experiments in this work show that the extracted graphs capture the syntactic and semantic content of images with reasonable accuracy.
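The abstract describes the SDG as a directed labeled graph whose nodes are objects, actions, regions, attributes and inferred concepts, and whose edges carry semantic, ontological and spatial relation labels. The Python sketch below shows one way such a structure could be held in memory, with a toy edge-overlap score standing in for graph-based image retrieval. It is an illustration under stated assumptions, not the authors' implementation: the class `SceneDescriptionGraph`, the helpers `edge_overlap` and `build_example`, the node types, and the relation labels `agent`, `recipient`, `on` and `in` are all hypothetical; only `superclass` and `hasProperty` are taken from the abstract.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str, str]  # (source node, relation label, target node)


@dataclass
class SceneDescriptionGraph:
    """Toy directed labeled graph in the spirit of an SDG (hypothetical class)."""
    node_types: Dict[str, str] = field(default_factory=dict)  # node -> type
    edges: Set[Edge] = field(default_factory=set)

    def add_node(self, name: str, node_type: str) -> None:
        self.node_types[name] = node_type

    def add_edge(self, source: str, label: str, target: str) -> None:
        # Reject edges whose endpoints were never declared as nodes.
        for node in (source, target):
            if node not in self.node_types:
                raise KeyError(f"unknown node: {node}")
        self.edges.add((source, label, target))

    def out_edges(self, source: str) -> List[Tuple[str, str]]:
        # Outgoing (label, target) pairs, sorted for deterministic output.
        return sorted((lbl, tgt) for src, lbl, tgt in self.edges if src == source)


def edge_overlap(query: SceneDescriptionGraph, candidate: SceneDescriptionGraph) -> float:
    """Hypothetical retrieval score: share of query edges present in candidate."""
    if not query.edges:
        return 0.0
    return len(query.edges & candidate.edges) / len(query.edges)


def build_example() -> SceneDescriptionGraph:
    """Hand-built graph for an imagined photo of a person riding a brown horse."""
    g = SceneDescriptionGraph()
    for name, kind in [("person", "object"), ("horse", "object"),
                       ("ride", "action"), ("field", "region"),
                       ("brown", "attribute"), ("animal", "concept")]:
        g.add_node(name, kind)
    g.add_edge("ride", "agent", "person")        # semantic relation (assumed label)
    g.add_edge("ride", "recipient", "horse")     # semantic relation (assumed label)
    g.add_edge("horse", "superclass", "animal")  # ontological relation
    g.add_edge("horse", "hasProperty", "brown")  # ontological relation
    g.add_edge("person", "on", "horse")          # spatial relation (assumed label)
    g.add_edge("horse", "in", "field")           # spatial relation (assumed label)
    return g


if __name__ == "__main__":
    sdg = build_example()
    print(sdg.out_edges("horse"))
    # [('hasProperty', 'brown'), ('in', 'field'), ('superclass', 'animal')]

    query = SceneDescriptionGraph()
    for name, kind in [("person", "object"), ("ride", "action"), ("field", "region")]:
        query.add_node(name, kind)
    query.add_edge("ride", "agent", "person")   # present in sdg
    query.add_edge("person", "in", "field")     # absent from sdg
    print(edge_overlap(query, sdg))  # 0.5
```

Storing edges as a set of (source, label, target) triples makes the overlap score a plain set intersection; a practical retrieval system would more likely need weighted or approximate graph matching, which this sketch does not attempt.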

Pages: 33-45

Number of pages: 13
