Answer-Type Prediction for Visual Question Answering

Times cited: 65
Authors
Kafle, Kushal [1]
Kanan, Christopher [1]
Affiliation
[1] Rochester Inst Technol, Chester F Carlson Ctr Imaging Sci, Rochester, NY 14623 USA
Source
2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) | 2016
DOI
10.1109/CVPR.2016.538
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recently, algorithms for object recognition and related tasks have become sufficiently proficient that new vision tasks can now be pursued. In this paper, we build a system capable of answering open-ended text-based questions about images, which is known as Visual Question Answering (VQA). Our approach's key insight is that we can predict the form of the answer from the question. We formulate our solution in a Bayesian framework. When our approach is combined with a discriminative model, the combined model achieves state-of-the-art results on four benchmark datasets for open-ended VQA: DAQUAR, COCO-QA, The VQA Dataset, and Visual7W.
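The abstract's central idea is to predict the form (type) of the answer from the question and reason over it in a Bayesian framework, then combine that with a discriminative model. The toy sketch below illustrates that idea only. It is not the authors' model, because this record does not give their likelihoods or fusion rule. The following are assumptions made for illustration: the probabilities are supplied as plain dicts, the answer is marginalized over answer types as P(a | q, x) = sum_t P(t | q) P(a | t, x), and the two models are fused with a weighted product of experts controlled by a hypothetical weight `alpha`. The helper names `marginalize_over_types` and `fuse` are likewise hypothetical.

```python
"""Toy sketch: answer-type marginalization fused with a discriminative score.

Assumptions (not taken from the paper): probabilities come as plain dicts,
and fusion is a weighted product of experts with a hypothetical `alpha`.
"""
import math


def marginalize_over_types(p_type, p_answer_given_type):
    """Return P(answer | q, x) = sum over t of P(t | q) * P(answer | t, x)."""
    posterior = {}
    for answer_type, weight in p_type.items():
        for answer, prob in p_answer_given_type[answer_type].items():
            posterior[answer] = posterior.get(answer, 0.0) + weight * prob
    return posterior


def fuse(bayes, discriminative, alpha=0.5, eps=1e-12):
    """Weighted product of experts, renormalized over the union of answers.

    `eps` keeps log() finite when one model assigns zero probability.
    """
    answers = set(bayes) | set(discriminative)
    scores = {
        a: alpha * math.log(bayes.get(a, 0.0) + eps)
        + (1.0 - alpha) * math.log(discriminative.get(a, 0.0) + eps)
        for a in answers
    }
    top = max(scores.values())  # subtract the max before exp() for stability
    unnorm = {a: math.exp(s - top) for a, s in scores.items()}
    total = sum(unnorm.values())
    return {a: v / total for a, v in unnorm.items()}


if __name__ == "__main__":
    # Hypothetical question "What color is the bus?": the type predictor
    # is fairly confident that the answer is a color.
    p_type = {"color": 0.9, "yes/no": 0.1}
    p_answer_given_type = {
        "color": {"red": 0.7, "blue": 0.3},
        "yes/no": {"yes": 0.5, "no": 0.5},
    }
    bayes = marginalize_over_types(p_type, p_answer_given_type)
    mlp = {"red": 0.4, "blue": 0.5, "yes": 0.1}  # hypothetical classifier output
    fused = fuse(bayes, mlp)
    print(max(fused, key=fused.get))
```

In this toy run the discriminative scores alone would favor "blue", but the type-aware term pushes the fused answer to "red". Setting `alpha` to 0 or 1 recovers either model on its own.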
Pages: 4976-4984
Page count: 9