Visual recognition for humanoid robots

Cited by: 9

Authors:

Fanello, Sean Ryan ^{[3]}

Ciliberto, Carlo ^{[1,2]}

Noceti, Nicoletta ^{[2]}

Metta, Giorgio ^{[1]}

Odone, Francesca ^{[2]}

Affiliations:

[1] Ist Italiano Tecnol, ICub Facil, Genoa, Italy

[2] Univ Genoa, DIBRIS, Genoa, Italy

[3] Microsoft Res Labs, Redmond, WA 98051 USA

Source:

ROBOTICS AND AUTONOMOUS SYSTEMS | 2017 / Vol. 91

Keywords:

Human-Robot Interaction; Learning and interaction; Visual recognition; Sparse representations; iCub; OBJECT RECOGNITION; SCALE;

DOI:

10.1016/j.robot.2016.10.001

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

Visual perception is a fundamental component of most robotics systems operating in human environments. Specifically, visual recognition is a prerequisite for a large variety of tasks such as tracking, manipulation, and human-robot interaction. As a consequence, the lack of successful recognition often becomes a bottleneck for the application of robotics systems to real-world situations. In this paper we aim at improving the robot's visual perception capabilities in a natural, human-like fashion, with very few constraints on the acquisition scenario. In particular, our goal is to build and analyze a learning system that can rapidly be re-trained in order to incorporate new evidence when available. To this purpose, we review the state-of-the-art coding-pooling pipelines for visual recognition and propose two modifications which allow us to improve the quality of the representation while maintaining real-time performance: a coding scheme, Best Code Entries (BCE), and a new pooling operator, Mid-Level Classification Weights (MLCW). The former focuses entirely on sparsity and improves the stability and computational efficiency of the coding phase; the latter increases the discriminability of the visual representation, and therefore the overall recognition accuracy of the system, by exploiting data supervision. The proposed pipeline is assessed from a qualitative perspective on a Human-Robot Interaction (HRI) application on the iCub platform. Quantitative evaluation of the proposed system is performed both on in-house robotics data sets (iCubWorld) and on established computer vision benchmarks (Caltech-256, PASCAL VOC 2007). As a byproduct of this work, we provide the robotics community with an implementation of the proposed visual recognition pipeline, which can be used as a perceptual layer for more complex robotics applications. (C) 2016 Published by Elsevier B.V.


Pages: 151-168

Number of pages: 18
