A Comprehensive Survey of Scene Graphs: Generation and Application

Cited by: 152
Authors
Chang, Xiaojun [1, 2]
Ren, Pengzhen [3]
Xu, Pengfei [3]
Li, Zhihui [4]
Chen, Xiaojiang [3]
Hauptmann, Alex [5]
Affiliations
[1] Univ Technol Sydney, Fac Engn & Informat Technol, ReLER Lab, AAII, Ultimo, NSW 2007, Australia
[2] RMIT Univ, Sch Comp Technol, Melbourne, Vic 3000, Australia
[3] Northwest Univ, Sch Informat Sci & Technol, Xian 710069, Peoples R China
[4] Qilu Univ Technol, Shandong Artificial Intelligence Inst, Shandong Acad Sci, Jinan 250316, Peoples R China
[5] Carnegie Mellon Univ, Sch Comp Sci, Pittsburgh, PA 15213 USA
Funding
Australian Research Council; National Natural Science Foundation of China;
Keywords
Scene graph; visual feature extraction; prior information; visual relationship recognition; CONVOLUTIONAL NEURAL-NETWORKS;
DOI
10.1109/TPAMI.2021.3137605
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A scene graph is a structured representation of a scene that clearly expresses the objects, their attributes, and the relationships between objects in the scene. As computer vision technology continues to develop, people are no longer satisfied with simply detecting and recognizing objects in images; instead, they look forward to a higher level of understanding and reasoning about visual scenes. For example, given an image, we want not only to detect and recognize the objects in it, but also to understand the relationships between them (visual relationship detection) and to generate a text description based on the image content (image captioning). Alternatively, we might want the machine to tell us what the little girl in the image is doing (Visual Question Answering (VQA)), or even to remove the dog from the image and find similar images (image editing and retrieval). These tasks require a higher level of understanding of, and reasoning about, images. The scene graph is just such a powerful tool for scene understanding. Scene graphs have therefore attracted the attention of a large number of researchers, and the related research is often cross-modal, complex, and rapidly developing. However, no relatively systematic survey of scene graphs exists at present. To this end, this survey conducts a comprehensive investigation of current scene graph research. More specifically, we first summarize the general definition of the scene graph, then conduct a comprehensive and systematic discussion of scene graph generation (SGG) methods and of SGG with the aid of prior knowledge. We then investigate the main applications of scene graphs and summarize the most commonly used datasets. Finally, we provide some insights into the future development of scene graphs.
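The structure described in the abstract (objects, attributes, and relationships between objects) can be sketched as a small labeled graph. The following is only an illustrative sketch, not the survey's formal definition: the class name `SceneGraph`, its methods, and the example scene (a little girl feeding a brown dog) are all assumptions made up for this illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SceneGraph:
    """Hypothetical minimal scene graph: nodes are objects, edges are relations."""

    # object id -> category label, e.g. "girl_1" -> "girl"
    objects: Dict[str, str] = field(default_factory=dict)
    # object id -> attribute strings, e.g. "girl_1" -> ["little"]
    attributes: Dict[str, List[str]] = field(default_factory=dict)
    # (subject id, predicate, object id) triples
    relations: List[Tuple[str, str, str]] = field(default_factory=list)

    def add_object(self, obj_id, label, attrs=()):
        self.objects[obj_id] = label
        self.attributes[obj_id] = list(attrs)

    def add_relation(self, subj, predicate, obj):
        # Relations may only connect objects that are already in the graph.
        if subj not in self.objects or obj not in self.objects:
            raise KeyError("both endpoints must be added as objects first")
        self.relations.append((subj, predicate, obj))

    def relations_of(self, subj):
        """Return (predicate, object label) pairs whose subject is `subj`."""
        return [(p, self.objects[o]) for s, p, o in self.relations if s == subj]

    def without(self, obj_id):
        """Return a copy of the graph with one object and its relations removed."""
        pruned = SceneGraph()
        for oid, label in self.objects.items():
            if oid != obj_id:
                pruned.add_object(oid, label, self.attributes[oid])
        for s, p, o in self.relations:
            if obj_id not in (s, o):
                pruned.add_relation(s, p, o)
        return pruned


# Invented example scene, loosely following the abstract's girl-and-dog wording.
graph = SceneGraph()
graph.add_object("girl_1", "girl", ["little"])
graph.add_object("dog_1", "dog", ["brown"])
graph.add_relation("girl_1", "feeding", "dog_1")

# A VQA-style lookup: what is the girl doing?
print(graph.relations_of("girl_1"))

# An editing-style operation: the same scene with the dog removed.
print(graph.without("dog_1").objects)
```

In this sketch, answering "what is the girl doing" reduces to reading the edges that leave the girl node, and "removing the dog" reduces to deleting one node together with its incident edges. Real scene graph generation systems would have to predict these nodes and edges from pixels, which the sketch does not attempt.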
Pages: 1 / 26
Page count: 26