A Multi-scale CNN for Affordance Segmentation in RGB Images

Cited by: 105
Authors
Roy, Anirban [1 ]
Todorovic, Sinisa [1 ]
Affiliation
[1] Oregon State Univ, Sch Elect Engn & Comp Sci, Corvallis, OR 97331 USA
Source
COMPUTER VISION - ECCV 2016, PT IV | 2016, Vol. 9908
Keywords
Object affordance; Mid-level cues; Deep learning; Object affordances; Recognition; Videos
DOI
10.1007/978-3-319-46493-0_12
CLC Classification Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Given a single RGB image, our goal is to label every pixel with an affordance type. By affordance, we mean an object's capability to readily support a certain human action, without requiring precursor actions. We focus on segmenting the following five affordance types in indoor scenes: 'walkable', 'sittable', 'lyable', 'reachable', and 'movable'. Our approach uses a deep architecture, consisting of a number of multi-scale convolutional neural networks, for extracting mid-level visual cues and combining them toward affordance segmentation. The mid-level cues include the depth map, surface normals, and a segmentation of four types of surfaces, namely floor, structure, furniture, and props. For evaluation, we augmented the NYUv2 dataset with new ground-truth annotations of the five affordance types. We are not aware of prior work that starts from pixels, infers mid-level cues, and combines them in a feed-forward fashion to predict dense affordance maps of a single RGB image.
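For illustration only (this is not the authors' released code), the following minimal PyTorch sketch shows the kind of feed-forward pipeline the abstract describes: small fully convolutional branches predict mid-level cues (depth, surface normals, a coarse four-way surface segmentation) from the RGB image at two scales, and a fusion head combines the cues with the image into a dense five-class affordance map. All module names, channel widths, and the two-scale averaging are assumptions made for brevity, not details taken from the paper.

# Hypothetical sketch of a multi-scale, cue-based affordance segmentation network.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CueBranch(nn.Module):
    """Small fully convolutional branch predicting one mid-level cue map
    from RGB, evaluated at full and half resolution (multi-scale)."""
    def __init__(self, out_channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, out_channels, 1),
        )

    def forward(self, rgb):
        fine = self.body(rgb)
        coarse = self.body(F.interpolate(rgb, scale_factor=0.5,
                                         mode="bilinear", align_corners=False))
        coarse = F.interpolate(coarse, size=fine.shape[-2:],
                               mode="bilinear", align_corners=False)
        return 0.5 * (fine + coarse)  # average the two scales

class AffordanceNet(nn.Module):
    """Predicts mid-level cues, then fuses them with the RGB input to label
    each pixel with one of five affordances:
    walkable, sittable, lyable, reachable, movable."""
    def __init__(self, num_affordances=5):
        super().__init__()
        self.depth_branch = CueBranch(1)    # depth map
        self.normal_branch = CueBranch(3)   # surface normals
        self.surface_branch = CueBranch(4)  # floor / structure / furniture / props
        self.fuse = nn.Sequential(
            nn.Conv2d(3 + 1 + 3 + 4, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, num_affordances, 1),
        )

    def forward(self, rgb):
        cues = [self.depth_branch(rgb),
                self.normal_branch(rgb),
                self.surface_branch(rgb)]
        x = torch.cat([rgb] + cues, dim=1)
        return self.fuse(x)  # per-pixel affordance logits

if __name__ == "__main__":
    net = AffordanceNet()
    logits = net(torch.randn(1, 3, 240, 320))
    print(logits.shape)  # torch.Size([1, 5, 240, 320])

The sketch is feed-forward end to end, mirroring the paper's stated property that cues are inferred from pixels and then combined directly into a dense affordance map, without any iterative or precursor reasoning stages.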
Pages: 186-201
Number of pages: 16