A Multi-scale CNN for Affordance Segmentation in RGB Images

Cited by: 105
Authors
Roy, Anirban [1 ]
Todorovic, Sinisa [1 ]
Affiliation
[1] Oregon State Univ, Sch Elect Engn & Comp Sci, Corvallis, OR 97331 USA
Source
COMPUTER VISION - ECCV 2016, PT IV | 2016, Vol. 9908
Keywords
Object affordance; Mid-level cues; Deep learning; Object affordances; Recognition; Videos
DOI
10.1007/978-3-319-46493-0_12
CLC Classification Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Given a single RGB image, our goal is to label every pixel with an affordance type. By affordance, we mean an object's capability to readily support a certain human action, without requiring precursor actions. We focus on segmenting the following five affordance types in indoor scenes: 'walkable', 'sittable', 'lyable', 'reachable', and 'movable'. Our approach uses a deep architecture, consisting of a number of multi-scale convolutional neural networks, for extracting mid-level visual cues and combining them toward affordance segmentation. The mid-level cues include the depth map, surface normals, and a segmentation of four types of surfaces, namely floor, structure, furniture, and props. For evaluation, we augmented the NYUv2 dataset with new ground-truth annotations of the five affordance types. We are not aware of prior work that starts from pixels, infers mid-level cues, and combines them in a feed-forward fashion to predict dense affordance maps of a single RGB image.
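For illustration only (this is not the authors' released code), the following minimal PyTorch sketch shows the kind of feed-forward pipeline the abstract describes: small fully convolutional branches predict mid-level cues (depth, surface normals, a coarse four-way surface segmentation) from the RGB image at two scales, and a fusion head combines the cues with the image into a dense five-class affordance map. All module names, channel widths, and the two-scale averaging are assumptions made for brevity, not details taken from the paper.

# Hypothetical sketch of a multi-scale, cue-based affordance segmentation network.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CueBranch(nn.Module):
    """Small fully convolutional branch predicting one mid-level cue map
    from RGB, evaluated at full and half resolution (multi-scale)."""
    def __init__(self, out_channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, out_channels, 1),
        )

    def forward(self, rgb):
        fine = self.body(rgb)
        coarse = self.body(F.interpolate(rgb, scale_factor=0.5,
                                         mode="bilinear", align_corners=False))
        coarse = F.interpolate(coarse, size=fine.shape[-2:],
                               mode="bilinear", align_corners=False)
        return 0.5 * (fine + coarse)  # average the two scales

class AffordanceNet(nn.Module):
    """Predicts mid-level cues, then fuses them with the RGB input to label
    each pixel with one of five affordances:
    walkable, sittable, lyable, reachable, movable."""
    def __init__(self, num_affordances=5):
        super().__init__()
        self.depth_branch = CueBranch(1)    # depth map
        self.normal_branch = CueBranch(3)   # surface normals
        self.surface_branch = CueBranch(4)  # floor / structure / furniture / props
        self.fuse = nn.Sequential(
            nn.Conv2d(3 + 1 + 3 + 4, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, num_affordances, 1),
        )

    def forward(self, rgb):
        cues = [self.depth_branch(rgb),
                self.normal_branch(rgb),
                self.surface_branch(rgb)]
        x = torch.cat([rgb] + cues, dim=1)
        return self.fuse(x)  # per-pixel affordance logits

if __name__ == "__main__":
    net = AffordanceNet()
    logits = net(torch.randn(1, 3, 240, 320))
    print(logits.shape)  # torch.Size([1, 5, 240, 320])

The sketch is feed-forward end to end, mirroring the paper's stated property that cues are inferred from pixels and then combined directly into a dense affordance map, without any iterative or precursor reasoning stages.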
Pages: 186-201
Number of pages: 16