Image Understanding using vision and reasoning through Scene Description Graph

Cited by: 45
Authors
Aditya, Somak [1]
Yang, Yezhou [1]
Baral, Chitta [1]
Aloimonos, Yiannis [2]
Fermueller, Cornelia [2]
Affiliations
[1] Arizona State Univ, Tempe, AZ 85281 USA
[2] Univ Maryland, College Pk, MD 20742 USA
Funding
US National Science Foundation
Keywords
Image Understanding; Commonsense Reasoning; Vision; Reasoning; Questions; Knowledge; Models
DOI
10.1016/j.cviu.2017.12.004
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Two of the fundamental tasks in image understanding using text are caption generation and visual question answering (Antol et al., 2015; Xiong et al., 2016). This work presents an intermediate knowledge structure that can be used for both tasks to obtain increased interpretability. We call this knowledge structure Scene Description Graph (SDG), as it is a directed labeled graph, representing objects, actions, regions, as well as their attributes, along with inferred concepts and semantic (from KM-Ontology (Clark et al., 2004)), ontological (i.e. superclass, hasProperty), and spatial relations. Thereby a general architecture is proposed in which a system can represent both the content and underlying concepts of an image using an SDG. The architecture is implemented using generic visual recognition techniques and commonsense reasoning to extract graphs from images. The utility of the generated SDGs is demonstrated in the applications of image captioning, image retrieval, and through examples in visual question answering. The experiments in this work show that the extracted graphs capture syntactic and semantic content of images with reasonable accuracy.
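The abstract describes the SDG as a directed labeled graph over objects, actions, regions, attributes, and inferred concepts. The Python sketch below is only an illustration of that kind of structure, not the authors' implementation. The class name, the node kinds, the example scene, and the edge labels `agent`, `object`, and `location` are assumptions made for this example; only `superclass` and `hasProperty` are relation names mentioned in the abstract, and the edge direction chosen for them here is also an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SceneDescriptionGraph:
    """Hypothetical minimal container for a directed labeled graph."""

    # Node name -> node kind, e.g. "object", "action", "region",
    # "attribute", or "concept" (kinds chosen for illustration).
    nodes: dict[str, str] = field(default_factory=dict)
    # Directed labeled edges stored as (source, label, target).
    edges: list[tuple[str, str, str]] = field(default_factory=list)

    def add_node(self, name: str, kind: str) -> None:
        self.nodes[name] = kind

    def add_edge(self, source: str, label: str, target: str) -> None:
        # Reject edges whose endpoints were never declared as nodes.
        for endpoint in (source, target):
            if endpoint not in self.nodes:
                raise KeyError(f"unknown node: {endpoint}")
        self.edges.append((source, label, target))

    def neighbors(self, source: str, label: str | None = None) -> list[str]:
        # Targets of edges leaving `source`, optionally filtered by label,
        # returned in insertion order.
        return [
            t for s, l, t in self.edges
            if s == source and (label is None or l == label)
        ]


def build_example() -> SceneDescriptionGraph:
    """Build a made-up scene: a person riding a brown horse in a field."""
    g = SceneDescriptionGraph()
    g.add_node("person1", "object")
    g.add_node("horse1", "object")
    g.add_node("ride1", "action")
    g.add_node("field1", "region")
    g.add_node("brown", "attribute")
    g.add_node("animal", "concept")

    # Semantic relations (labels are illustrative assumptions).
    g.add_edge("ride1", "agent", "person1")
    g.add_edge("ride1", "object", "horse1")
    # Ontological relations named in the abstract.
    g.add_edge("horse1", "hasProperty", "brown")
    g.add_edge("horse1", "superclass", "animal")
    # Spatial relation (label is an illustrative assumption).
    g.add_edge("ride1", "location", "field1")
    return g


if __name__ == "__main__":
    sdg = build_example()
    # A toy lookup in the spirit of question answering: what is being ridden?
    print(sdg.neighbors("ride1", "object"))
```

With a structure like this, a simple question such as "what is being ridden?" reduces to following one labeled edge from the action node, which hints at why an explicit graph can make downstream answers easier to inspect.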
Pages: 33-45
Number of pages: 13