Inferring and Executing Programs for Visual Reasoning

Cited by: 232
Authors
Johnson, Justin [1]
Hariharan, Bharath [2]
van der Maaten, Laurens [2]
Hoffman, Judy [1]
Li, Fei-Fei [1]
Zitnick, C. Lawrence [2]
Girshick, Ross [2]
Affiliations
[1] Stanford Univ, Stanford, CA 94305 USA
[2] Facebook AI Res, Menlo Pk, CA USA
Source
2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV) | 2017
Keywords
DOI
10.1109/ICCV.2017.325
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification code
081104; 0812; 0835; 1405;
Abstract
Existing methods for visual reasoning attempt to directly map inputs to outputs using black-box architectures without explicitly modeling the underlying reasoning processes. As a result, these black-box models often learn to exploit biases in the data rather than learning to perform visual reasoning. Inspired by module networks, this paper proposes a model for visual reasoning that consists of a program generator that constructs an explicit representation of the reasoning process to be performed, and an execution engine that executes the resulting program to produce an answer. Both the program generator and the execution engine are implemented by neural networks, and are trained using a combination of backpropagation and REINFORCE. Using the CLEVR benchmark for visual reasoning, we show that our model significantly outperforms strong baselines and generalizes better in a variety of settings.
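The split between a program generator and an execution engine described in the abstract can be illustrated with a toy, non-neural sketch. This is not the paper's implementation: in the paper both components are neural networks, whereas here the generator is a hand-written lookup table and the executor is plain Python. The names `SCENE`, `generate_program`, `execute`, and `answer`, the operation names, and the sample questions are all hypothetical and chosen only for illustration.

```python
# Toy illustration of the generator/executor split.
# Assumption: every name, question, and operation here is hypothetical.
# The paper's components are learned neural networks, not lookups.

SCENE = [
    {"shape": "cube", "color": "red", "size": "large"},
    {"shape": "sphere", "color": "red", "size": "small"},
    {"shape": "cube", "color": "blue", "size": "small"},
]


def generate_program(question):
    """Stand-in for the program generator: map a question to a list of steps."""
    table = {
        "How many red things are there?": [
            ("filter", "color", "red"),
            ("count",),
        ],
        "Is there a blue cube?": [
            ("filter", "color", "blue"),
            ("filter", "shape", "cube"),
            ("exists",),
        ],
    }
    return table[question]


def execute(program, scene):
    """Stand-in for the execution engine: run each step in order on the scene."""
    state = list(scene)
    for step in program:
        op = step[0]
        if op == "filter":
            _, attr, value = step
            state = [obj for obj in state if obj[attr] == value]
        elif op == "count":
            return len(state)
        elif op == "exists":
            return len(state) > 0
        else:
            raise ValueError(f"unknown op: {op}")
    return state


def answer(question, scene=SCENE):
    """Generate an explicit program for the question, then execute it."""
    return execute(generate_program(question), scene)
```

Because the program is an explicit intermediate value, it can be printed and inspected before execution, which is the kind of transparency the abstract contrasts with black-box mappings from input to output.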
Pages: 3008-3017
Page count: 10