Super Normal Vector for Human Activity Recognition with Depth Cameras

Cited by: 112
Authors
Yang, Xiaodong [1]
Tian, YingLi [2, 3]
Affiliations
[1] NVIDIA Res, Santa Clara, CA 95050 USA
[2] CUNY City Coll, Dept Elect Engn, New York, NY 10031 USA
[3] CUNY, Grad Ctr, New York, NY 10031 USA
Funding
National Science Foundation (United States)
Keywords
Human activity recognition; depth camera; feature representation; spatio-temporal information; image classification; actionlet ensemble; motion; pose
DOI
10.1109/TPAMI.2016.2565479
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The advent of cost-effective and easy-to-operate depth cameras has facilitated a variety of visual recognition tasks, including human activity recognition. This paper presents a novel framework for recognizing human activities from video sequences captured by depth cameras. We extend the surface normal to the polynormal by assembling local neighboring hypersurface normals from a depth sequence to jointly characterize local motion and shape information. We then propose a general scheme, the super normal vector (SNV), to aggregate the low-level polynormals into a discriminative representation, which can be viewed as a simplified version of the Fisher kernel representation. To globally capture the spatial layout and temporal order, an adaptive spatio-temporal pyramid is introduced to subdivide a depth video into a set of space-time cells. In extensive experiments, the proposed approach outperforms state-of-the-art methods on four public benchmark datasets: MSRAction3D, MSRDailyActivity3D, MSRGesture3D, and MSRActionPairs3D.
Pages: 1028-1039
Page count: 12
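Illustrative sketch
The abstract describes a polynormal as an assembly of neighboring hypersurface normals taken from a depth sequence. The following is a minimal sketch of that idea and not the authors' implementation. It assumes the depth video is a NumPy array of shape (T, H, W), treats depth as a function z(x, y, t), and uses the sign convention (-dz/dx, -dz/dy, -dz/dt, 1) for the normal. The function names, the cubic neighborhood, and the `radius` parameter are hypothetical choices made for illustration.

```python
import numpy as np


def hypersurface_normals(depth):
    """Return unit 4D normals, shape (T, H, W, 4), for a depth video (T, H, W).

    Assumed convention: n = (-dz/dx, -dz/dy, -dz/dt, 1), then normalized.
    """
    depth = np.asarray(depth, dtype=np.float64)
    # np.gradient returns one array per axis, in axis order: t, y, x.
    dz_dt, dz_dy, dz_dx = np.gradient(depth)
    normals = np.stack(
        [-dz_dx, -dz_dy, -dz_dt, np.ones_like(depth)], axis=-1
    )
    # The last component is 1, so the norm is never zero.
    normals /= np.linalg.norm(normals, axis=-1, keepdims=True)
    return normals


def polynormal(normals, t, y, x, radius=1):
    """Concatenate the normals in a cubic space-time neighborhood of (t, y, x).

    The neighborhood shape is an assumption of this sketch.
    """
    T, H, W, _ = normals.shape
    inside = (
        radius <= t < T - radius
        and radius <= y < H - radius
        and radius <= x < W - radius
    )
    if not inside:
        raise ValueError("neighborhood extends past the video boundary")
    block = normals[
        t - radius : t + radius + 1,
        y - radius : y + radius + 1,
        x - radius : x + radius + 1,
    ]
    return block.reshape(-1)
```

With `radius=1` the neighborhood holds 27 normals, so the resulting descriptor has 27 x 4 = 108 components. The aggregation of such descriptors into the SNV and the adaptive spatio-temporal pyramid are not sketched here, because the abstract does not give enough detail to reproduce them.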