Learning to Reason: End-to-End Module Networks for Visual Question Answering

被引:252
作者
Hu, Ronghang [1 ]
Andreas, Jacob [1 ]
Rohrbach, Marcus [1 ,2 ]
Darrell, Trevor [1 ]
Saenko, Kate [3 ]
机构
[1] Univ Calif Berkeley, Berkeley, CA 94720 USA
[2] Facebook AI Res, New York, NY USA
[3] Boston Univ, Boston, MA 02215 USA
来源
2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV) | 2017年
关键词
D O I
10.1109/ICCV.2017.93
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Natural language questions are inherently compositional, and many are most easily answered by reasoning about their decomposition into modular sub-problems. For example, to answer "is there an equal number of balls and boxes?" we can look for balls, look for boxes, count them, and compare the results. The recently proposed Neural Module Network (NMN) architecture [3, 2] implements this approach to question answering by parsing questions into linguistic substructures and assembling question-specific deep networks from smaller modules that each solve one subtask. However, existing NMN implementations rely on brittle off-the-shelf parsers, and are restricted to the module configurations proposed by these parsers rather than learning them from data. In this paper, we propose End-to-End Module Networks (N2NMNs), which learn to reason by directly predicting instance-specific network layouts without the aid of a parser. Our model learns to generate network structures (by imitating expert demonstrations) while simultaneously learning network parameters (using the downstream task loss). Experimental results on the new CLEVR dataset targeted at compositional question answering show that N2NMNs achieve an error reduction of nearly 50% relative to state-of-theart attentional approaches, while discovering interpretable network architectures specialized for each question.
引用
收藏
页码:804 / 813
页数:10
相关论文
共 28 条
  • [1] Neural Module Networks
    Andreas, Jacob
    Rohrbach, Marcus
    Darrell, Trevor
    Klein, Dan
    [J]. 2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, : 39 - 48
  • [2] [Anonymous], 2016, NAACL HLT
  • [3] [Anonymous], 2016, ARXIV161200837
  • [4] [Anonymous], 2005, Genetic and Evolutionary Computation Conference, GECCO 2005, Proceedings, Washington DC, USA, June 25-29, 2005
  • [5] [Anonymous], 2016, ARXIV COMPUTER VISIO
  • [6] [Anonymous], 2016, CVPR
  • [7] [Anonymous], ICML
  • [8] [Anonymous], ECCV
  • [9] VQA: Visual Question Answering
    Antol, Stanislaw
    Agrawal, Aishwarya
    Lu, Jiasen
    Mitchell, Margaret
    Batra, Dhruv
    Zitnick, C. Lawrence
    Parikh, Devi
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015, : 2425 - 2433
  • [10] Bahdanau D, 2016, Arxiv, DOI [arXiv:1409.0473, DOI 10.48550/ARXIV.1409.0473]