Visual recognition for humanoid robots

Cited by: 9
Authors
Fanello, Sean Ryan [3]
Ciliberto, Carlo [1, 2]
Noceti, Nicoletta [2]
Metta, Giorgio [1]
Odone, Francesca [2]
Affiliations
[1] Ist Italiano Tecnol, ICub Facil, Genoa, Italy
[2] Univ Genoa, DIBRIS, Genoa, Italy
[3] Microsoft Res Labs, Redmond, WA 98051 USA
Keywords
Human-Robot Interaction; Learning and interaction; Visual recognition; Sparse representations; iCub; OBJECT RECOGNITION; SCALE;
DOI
10.1016/j.robot.2016.10.001
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Visual perception is a fundamental component of most robotic systems operating in human environments. Specifically, visual recognition is a prerequisite for a large variety of tasks such as tracking, manipulation, and human-robot interaction. As a consequence, the lack of successful recognition often becomes a bottleneck for the application of robotic systems to real-world situations. In this paper we aim at improving the robot's visual perception capabilities in a natural, human-like fashion, with very few constraints on the acquisition scenario. In particular, our goal is to build and analyze a learning system that can rapidly be re-trained in order to incorporate new evidence if available. To this end, we review the state-of-the-art coding-pooling pipelines for visual recognition and propose two modifications which allow us to improve the quality of the representation while maintaining real-time performance: a coding scheme, Best Code Entries (BCE), and a new pooling operator, Mid-Level Classification Weights (MLCW). The former focuses entirely on sparsity and improves the stability and computational efficiency of the coding phase; the latter increases the discriminability of the visual representation, and therefore the overall recognition accuracy of the system, by exploiting data supervision. The proposed pipeline is assessed from a qualitative perspective on a Human-Robot Interaction (HRI) application on the iCub platform. Quantitative evaluation of the proposed system is performed both on in-house robotics data sets (iCubWorld) and on established computer vision benchmarks (Caltech-256, PASCAL VOC 2007). As a byproduct of this work, we provide the robotics community with an implementation of the proposed visual recognition pipeline which can be used as a perceptual layer for more complex robotics applications. © 2016 Published by Elsevier B.V.
Pages: 151-168
Page count: 18
Related papers
83 in total
[61] Enhanced Computer Vision with Microsoft Kinect Sensor: A Review [J]. Han, Jungong; Shao, Ling; Xu, Dong; Shotton, Jamie. IEEE TRANSACTIONS ON CYBERNETICS, 2013, 43 (05): 1318-1334
[62] Wavelet feature selection for image classification [J]. Huang, Ke; Aviyente, Selin. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2008, 17 (09): 1709-1720
[63] Jegou H., 2010, CVPR
[64] Jia YQ, 2012, PROC CVPR IEEE, P3370, DOI 10.1109/CVPR.2012.6248076
[65] Using spin images for efficient object recognition in cluttered 3D scenes [J]. Johnson, AE; Hebert, M. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 1999, 21 (05): 433-449
[66] Kong S., 2012, ECCV
[67] Comparison of mid-level feature coding approaches and pooling strategies in visual concept detection [J]. Koniusz, Piotr; Yan, Fei; Mikolajczyk, Krystian. COMPUTER VISION AND IMAGE UNDERSTANDING, 2013, 117 (05): 479-492
[68] ImageNet Classification with Deep Convolutional Neural Networks [J]. Krizhevsky, Alex; Sutskever, Ilya; Hinton, Geoffrey E. COMMUNICATIONS OF THE ACM, 2017, 60 (06): 84-90
[69] Lai K, 2011, IEEE INT CONF ROBOT, P1817
[70] Lazebnik S., COMPUTER VISION PATT, V2, P2169