Fast Contextual Scene Graph Generation with Unbiased Context Augmentation

Cited by: 9
Authors
Jin, Tianlei [1 ]
Guo, Fangtai [1 ]
Meng, Qiwei [1 ]
Zhu, Shiqiang [1 ]
Xi, Xiangming [1 ]
Wang, Wen [1 ]
Mu, Zonghao [1 ]
Song, Wei [1 ]
Affiliations
[1] Zhejiang Lab, Res Ctr Intelligent Robot, Hangzhou, Peoples R China
Source
2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR | 2023
DOI
10.1109/CVPR52729.2023.00610
Chinese Library Classification
TP18 [Theory of Artificial Intelligence];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Scene graph generation (SGG) methods have historically suffered from long-tail bias and slow inference. In this paper, we observe that humans can analyze relationships between objects from context descriptions alone, and that this abstract cognitive process may be guided by experience. For example, given descriptions of a cup and a table with their spatial locations, humans can speculate about possible relationships such as <cup, on, table> or <table, near, cup>. Even without visual appearance information, some impossible predicates, such as "flying in" and "looking at", can be empirically excluded. Accordingly, we propose a contextual scene graph generation (C-SGG) method that uses no visual information, together with a context augmentation method. We posit that slight perturbations in the position and size of objects do not essentially change the relationships between them. At the context level, we can therefore produce diverse context descriptions from the original dataset via context augmentation, and these diverse descriptions can be used for unbiased training of C-SGG to alleviate long-tail bias. In addition, we introduce a context-guided visual scene graph generation (CV-SGG) method, which leverages the C-SGG experience to guide vision toward possible predicates. Extensive experiments on a publicly available dataset show that C-SGG alleviates long-tail bias and omits the heavy computation of visual feature extraction, realizing real-time SGG, while CV-SGG achieves a strong trade-off between common predicates and tail predicates.
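The context augmentation described above can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code: it assumes object context is a normalized (x, y, w, h) bounding box, and that each jittered copy of a subject/object pair may reuse the original predicate label, per the abstract's claim that small perturbations preserve the relationship. Function names, the jitter bound, and the triplet layout are assumptions for the sketch.

```python
import random

def augment_context(box, max_jitter=0.05, rng=None):
    """Jitter a normalized (x, y, w, h) box by up to +/- max_jitter.

    Sketch of the context-augmentation idea: slight perturbations of
    position and size are assumed not to change the relationship, so
    the perturbed description keeps the original predicate label.
    """
    rng = rng or random.Random()
    x, y, w, h = box
    jitter = lambda v: v + rng.uniform(-max_jitter, max_jitter)
    # Clamp so the box stays inside the normalized [0, 1] image frame.
    clamp = lambda v: min(max(v, 0.0), 1.0)
    return tuple(clamp(jitter(v)) for v in (x, y, w, h))

def augment_triplet(subj_box, obj_box, predicate, n_copies=4, seed=0):
    """Produce n_copies diverse context descriptions for one labeled pair."""
    rng = random.Random(seed)
    return [
        (augment_context(subj_box, rng=rng),
         augment_context(obj_box, rng=rng),
         predicate)
        for _ in range(n_copies)
    ]
```

Each original annotation thus yields several context-level training samples at negligible cost, which is how the method can rebalance tail predicates without touching pixels.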
Pages: 6302-6311
Page count: 10