Spatial Memory for Context Reasoning in Object Detection

Cited by: 273
Authors
Chen, Xinlei [1]
Gupta, Abhinav [1]
Affiliations
[1] Carnegie Mellon Univ, Sch Comp Sci, Pittsburgh, PA 15213 USA
Source
2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV) | 2017
DOI
10.1109/ICCV.2017.440
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Modeling instance-level context and object-object relationships is extremely challenging. It requires reasoning about bounding boxes of different classes, locations, etc. Above all, instance-level spatial reasoning inherently requires modeling conditional distributions on previous detections. Unfortunately, our current object detection systems do not have any memory to remember what to condition on! The state-of-the-art object detectors still detect all objects in parallel, followed by non-maximal suppression (NMS). While memory has been used for tasks such as captioning, these approaches mostly use image-level memory cells without capturing the spatial layout. On the other hand, modeling object-object relationships requires spatial reasoning: not only do we need a memory to store the spatial layout, but also an effective reasoning module to extract spatial patterns. This paper presents a conceptually simple yet powerful solution, the Spatial Memory Network (SMN), to model the instance-level context efficiently and effectively. Our spatial memory essentially assembles object instances back into a pseudo "image" representation that is easy to feed into another ConvNet for object-object context reasoning. This leads to a new sequential reasoning architecture where image and memory are processed in parallel to obtain detections, which update the memory again. We show our SMN direction is promising as it provides a 2.2% improvement over baseline Faster RCNN on the COCO dataset with VGG16.
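The sequential loop described in the abstract (detect, write the detection into a spatial memory, then condition the next detection on that memory) can be illustrated with a toy NumPy sketch. This is not the authors' implementation: the function names, the averaging update rule, and the hand-written `occupancy_penalty` that stands in for the ConvNet reasoning module are all assumptions made for illustration.

```python
import numpy as np

def write_to_memory(memory, box, feature):
    """Write one detection's feature vector into the memory cells its box covers.

    Assumption: a simple 50/50 blend with the existing cell contents; the
    abstract does not specify the actual update rule.
    """
    x0, y0, x1, y1 = box
    memory[y0:y1, x0:x1] = 0.5 * memory[y0:y1, x0:x1] + 0.5 * feature
    return memory

def occupancy_penalty(memory, box):
    """Hypothetical stand-in for the context reasoning module.

    Returns a negative score proportional to how much memory is already
    written inside the box, so duplicates of earlier detections are demoted.
    """
    x0, y0, x1, y1 = box
    return -float(np.abs(memory[y0:y1, x0:x1]).mean())

def sequential_detect(candidates, height, width, channels, steps, context_fn):
    """Pick detections one at a time, conditioning each pick on the memory."""
    memory = np.zeros((height, width, channels))
    remaining = list(candidates)
    detections = []
    for _ in range(min(steps, len(remaining))):
        # Re-score every remaining candidate using the current memory state.
        scores = [c["score"] + context_fn(memory, c["box"]) for c in remaining]
        chosen = remaining.pop(int(np.argmax(scores)))
        detections.append(chosen)
        # The new detection updates the memory for the next step.
        write_to_memory(memory, chosen["box"], chosen["feature"])
    return detections, memory

# Two overlapping candidates (A, B) and one separate candidate (C).
candidates = [
    {"name": "A", "box": (0, 0, 4, 4), "score": 0.9, "feature": np.ones(2)},
    {"name": "B", "box": (0, 0, 4, 4), "score": 0.8, "feature": np.ones(2)},
    {"name": "C", "box": (4, 4, 8, 8), "score": 0.7, "feature": np.ones(2)},
]
detections, memory = sequential_detect(candidates, 8, 8, 2, 2, occupancy_penalty)
print([d["name"] for d in detections])
```

In this toy setup the duplicate candidate B is demoted once A has been written to memory, so the second pick goes to C without a separate NMS pass. The sketch only conveys the conditioning idea, not the learned behavior of the actual network.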
Pages: 4106-4116
Number of pages: 11