Action Recognition From Depth Maps Using Deep Convolutional Neural Networks

Cited by: 221
Authors
Wang, Pichao [1]
Li, Wanqing [1]
Gao, Zhimin [1]
Zhang, Jing [1]
Tang, Chang [2]
Ogunbona, Philip O. [1]
Affiliations
[1] Univ Wollongong, Adv Multimedia Res Lab, Wollongong, NSW 2522, Australia
[2] Tianjin Univ, Sch Elect Informat Engn, Tianjin 300072, Peoples R China
Keywords
Action recognition; deep learning; depth maps; pseudocolor coding; features; ensemble
DOI
10.1109/THMS.2015.2504550
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper proposes a new method, i.e., weighted hierarchical depth motion maps (WHDMM) + three-channel deep convolutional neural networks (3ConvNets), for human action recognition from depth maps on small training datasets. Three strategies are developed to leverage the capability of ConvNets in mining discriminative features for recognition. First, different viewpoints are mimicked by rotating the 3-D points of the captured depth maps. This not only synthesizes more data, but also makes the trained ConvNets view-tolerant. Second, WHDMMs at several temporal scales are constructed to encode the spatiotemporal motion patterns of actions into 2-D spatial structures. The 2-D spatial structures are further enhanced for recognition by converting the WHDMMs into pseudocolor images. Finally, the three ConvNets are initialized with the models obtained from ImageNet and fine-tuned independently on the color-coded WHDMMs constructed in three orthogonal planes. The proposed algorithm was evaluated on the MSRAction3D, MSRAction3DExt, UTKinect-Action, and MSRDailyActivity3D datasets using cross-subject protocols. In addition, the method was evaluated on the large dataset constructed from the above datasets. The proposed method achieved 2-9% better results on most of the individual datasets. Furthermore, the proposed method maintained its performance on the large dataset, whereas the performance of existing methods decreased with the increased number of actions.
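The first strategy in the abstract, mimicking different viewpoints by rotating the 3-D points of a depth map, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the choice of rotation axes and their order, the angle values, and the plane conventions (front, side, top) are all assumptions made for illustration, since the abstract does not specify them.

```python
import math


def rotate_points(points, theta_deg, beta_deg):
    """Rotate 3-D points about the y-axis by theta, then about the x-axis by beta.

    The axis order and angle names are illustrative assumptions.
    """
    t = math.radians(theta_deg)
    b = math.radians(beta_deg)
    rotated = []
    for x, y, z in points:
        # Rotation about the y-axis (horizontal change of viewpoint).
        x1 = x * math.cos(t) + z * math.sin(t)
        z1 = -x * math.sin(t) + z * math.cos(t)
        # Rotation about the x-axis (vertical change of viewpoint).
        y2 = y * math.cos(b) - z1 * math.sin(b)
        z2 = y * math.sin(b) + z1 * math.cos(b)
        rotated.append((x1, y2, z2))
    return rotated


def project_to_planes(points):
    """Project 3-D points onto three orthogonal planes (assumed conventions)."""
    front = [(x, y) for x, y, z in points]  # drop depth
    side = [(z, y) for x, y, z in points]   # drop width
    top = [(x, z) for x, y, z in points]    # drop height
    return front, side, top


# Hypothetical point cloud and viewpoint angles, chosen only for the example.
points = [(1.0, 0.0, 0.0), (0.0, 1.0, 2.0)]
views = {angle: rotate_points(points, angle, 0.0) for angle in (-30.0, 0.0, 30.0)}
```

Each rotated copy of the point cloud could then be projected onto the three planes, which is one plausible way to obtain the per-plane inputs that the abstract says are fed to the three ConvNets.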
Pages: 498-509
Page count: 12