Self-organizing neural integration of pose-motion features for human action recognition

Cited by: 58
Authors
Parisi, German I. [1 ]
Weber, Cornelius [1 ]
Wermter, Stefan [1 ]
Affiliations
[1] Univ Hamburg, Knowledge Technol Inst, Dept Informat, D-22527 Hamburg, Germany
Source
FRONTIERS IN NEUROROBOTICS | 2015, Vol. 9
Keywords
action recognition; visual processing; depth information; neural networks; self-organizing learning; robot perception; BIOLOGICAL MOTION; FUNCTIONAL ARCHITECTURE; PERCEPTION; MODEL; DISPLAYS; FORM;
DOI
10.3389/fnbot.2015.00003
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The visual recognition of complex, articulated human movements is fundamental for a wide range of artificial systems oriented toward human-robot communication, action classification, and action-driven perception. These challenging tasks typically involve processing a huge amount of visual information and learning-based mechanisms for generalizing over a set of training actions and classifying new samples. To operate in natural environments, a crucial property is the efficient and robust recognition of actions, including under noisy conditions caused by, for instance, systematic sensor errors and temporarily occluded persons. Studies of the mammalian visual system and its outstanding ability to process biological motion information suggest separate neural pathways for the distinct processing of pose and motion features at multiple levels, and the subsequent integration of these visual cues for action perception. We present a neurobiologically motivated approach to noise-tolerant action recognition in real time. Our model consists of self-organizing Growing When Required (GWR) networks that obtain progressively generalized representations of sensory inputs and learn inherent spatio-temporal dependencies. During training, the GWR networks dynamically change their topological structure to better match the input space. We first extract pose and motion features from video sequences and then cluster actions in terms of prototypical pose-motion trajectories. Multi-cue trajectories from matching action frames are subsequently combined to provide action dynamics in the joint feature space. Reported experiments show that our approach outperforms previous results on a dataset of full-body actions captured with a depth sensor, and ranks among the best-performing methods on a public benchmark of domestic daily actions.
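The abstract's central mechanism is a Growing When Required network: for each input, the best-matching unit is found, and a new node is inserted whenever that unit both responds too weakly and has already habituated to its inputs; otherwise the winner and its topological neighbors are adapted toward the input. The sketch below illustrates the general GWR algorithm (Marsland et al., 2002), not the authors' implementation; all parameter names and values (`act_thresh`, `fire_thresh`, learning rates, the simplified habituation rule) are illustrative assumptions.

```python
import numpy as np

class GWR:
    """Minimal Growing When Required network (a sketch, after Marsland
    et al., 2002). Hyperparameter values are illustrative only."""

    def __init__(self, dim, act_thresh=0.85, fire_thresh=0.1,
                 eps_b=0.1, eps_n=0.01, tau=3.0, max_age=50):
        # start with two random nodes, as in the original algorithm
        self.w = [np.random.rand(dim), np.random.rand(dim)]  # weights
        self.h = [1.0, 1.0]          # firing (habituation) counters
        self.edges = {}              # (i, j) with i < j  ->  edge age
        self.act_thresh, self.fire_thresh = act_thresh, fire_thresh
        self.eps_b, self.eps_n = eps_b, eps_n
        self.tau, self.max_age = tau, max_age

    def _neighbors(self, i):
        return [q for (p, q) in ((min(e), max(e)) for e in self.edges)
                for q in []] or [n for e in self.edges for n in e
                                 if i in e and n != i]

    def train_step(self, x):
        # 1. find the best and second-best matching units
        d = [np.linalg.norm(x - w) for w in self.w]
        b, s = np.argsort(d)[:2]
        # 2. connect them (competitive Hebbian learning), reset edge age
        self.edges[(min(b, s), max(b, s))] = 0
        # 3. activity of the best unit decays with distance to the input
        act = np.exp(-d[b])
        if act < self.act_thresh and self.h[b] < self.fire_thresh:
            # 4. weak, habituated response: grow a new node halfway
            #    between the input and the best unit, and rewire
            r = len(self.w)
            self.w.append((self.w[b] + x) / 2.0)
            self.h.append(1.0)
            del self.edges[(min(b, s), max(b, s))]
            self.edges[(min(b, r), max(b, r))] = 0
            self.edges[(min(s, r), max(s, r))] = 0
        else:
            # 5. otherwise adapt the winner and its neighbors toward x
            self.w[b] = self.w[b] + self.eps_b * self.h[b] * (x - self.w[b])
            for n in self._neighbors(b):
                self.w[n] = self.w[n] + self.eps_n * self.h[n] * (x - self.w[n])
            # 6. habituate the winner (simplified exponential decay)
            self.h[b] -= self.h[b] / self.tau * 0.1
        # 7. age edges at the winner, prune those that grew too old
        #    (removal of nodes left isolated is omitted for brevity)
        self.edges = {e: (age + 1 if b in e else age)
                      for e, age in self.edges.items()
                      if not (b in e and age + 1 > self.max_age)}
```

Fed a stream of feature vectors, the node count grows only where the existing prototypes represent the input poorly, which is how the network "dynamically changes its topological structure to better match the input space" without a fixed, pre-set number of units.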
Pages: 1-14
Page count: 14