Neural Motifs: Scene Graph Parsing with Global Context

Cited by: 743
Authors
Zellers, Rowan [1]
Yatskar, Mark [1, 2]
Thomson, Sam [3]
Choi, Yejin [1, 2]
Affiliations
[1] Univ Washington, Paul G Allen Sch Comp Sci & Engn, Seattle, WA 98195 USA
[2] Allen Inst Artificial Intelligence, Seattle, WA USA
[3] Carnegie Mellon Univ, Sch Comp Sci, Pittsburgh, PA 15213 USA
Source
2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) | 2018
Funding
National Science Foundation (United States)
Keywords
DOI
10.1109/CVPR.2018.00611
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We investigate the problem of producing structured graph representations of visual scenes. Our work analyzes the role of motifs: regularly appearing substructures in scene graphs. We present new quantitative insights on such repeated structures in the Visual Genome dataset. Our analysis shows that object labels are highly predictive of relation labels but not vice versa. We also find that there are recurring patterns even in larger subgraphs: more than 50% of graphs contain motifs involving at least two relations. Our analysis motivates a new baseline: given object detections, predict the most frequent relation between object pairs with the given labels, as seen in the training set. This baseline improves on the previous state of the art by an average relative improvement of 3.6% across evaluation settings. We then introduce Stacked Motif Networks, a new architecture designed to capture higher-order motifs in scene graphs, which further improves over our strong baseline by an average relative gain of 7.1%. Our code is available at github.com/rowanz/neural-motifs.
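The frequency baseline described in the abstract can be illustrated with a minimal sketch. This is not the authors' released implementation: the function names `fit_frequency_baseline` and `predict_relation`, the `default` fallback for unseen label pairs, and the toy training triples are all assumptions made for illustration. It only shows the core idea of counting predicates per (subject label, object label) pair and returning the most frequent one.

```python
from collections import Counter, defaultdict


def fit_frequency_baseline(triples):
    """Count how often each predicate occurs for each (subject, object) label pair.

    `triples` is an iterable of (subject_label, predicate, object_label) tuples.
    (Hypothetical helper, not taken from the paper's code.)
    """
    counts = defaultdict(Counter)
    for subj, pred, obj in triples:
        counts[(subj, obj)][pred] += 1
    return counts


def predict_relation(counts, subj, obj, default=None):
    """Return the most frequent training predicate for the label pair.

    Falls back to `default` when the pair never appeared in training
    (the fallback behaviour is an assumption of this sketch).
    """
    pair_counts = counts.get((subj, obj))
    if not pair_counts:
        return default
    return pair_counts.most_common(1)[0][0]


# Toy training annotations, invented purely for illustration.
train = [
    ("man", "riding", "horse"),
    ("man", "riding", "horse"),
    ("man", "near", "horse"),
    ("man", "wearing", "hat"),
]

counts = fit_frequency_baseline(train)
prediction = predict_relation(counts, "man", "horse")
unseen = predict_relation(counts, "hat", "horse", default="no_relation")
```

Here `prediction` is the predicate seen most often between "man" and "horse" in the toy data, and `unseen` takes the fallback value because that label pair has no training counts. A real evaluation would additionally need object detections and a way to rank candidate pairs, which this sketch leaves out.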
Pages: 5831-5840
Page count: 10