Depth Pooling Based Large-Scale 3-D Action Recognition With Convolutional Neural Networks

Cited by: 137
Authors
Wang, Pichao [1]
Li, Wanqing [1]
Gao, Zhimin [1]
Tang, Chang [2]
Ogunbona, Philip O. [1]
Affiliations
[1] Univ Wollongong, Adv Multimedia Res Lab, Wollongong, NSW 2522, Australia
[2] China Univ Geosci, Sch Comp Sci, Wuhan 430074, Hubei, Peoples R China
Keywords
Large-scale; depth; action recognition; convolutional neural networks; gesture recognition
DOI
10.1109/TMM.2018.2818329
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
This paper proposes three simple, compact, yet effective representations of depth sequences for both isolated and continuous action recognition: dynamic depth images (DDI), dynamic depth normal images (DDNI), and dynamic depth motion normal images (DDMNI). These dynamic images are constructed from a segmented sequence of depth maps using hierarchical bidirectional rank pooling to capture the spatial-temporal information. Specifically, DDI exploits the dynamics of postures over time, while DDNI and DDMNI exploit the 3-D structural information captured by depth maps. Based on the proposed representations, a convolutional neural network (ConvNet)-based method is developed for action recognition. The image-based representations make it possible to fine-tune existing ConvNet models trained on image data instead of training a large number of parameters from scratch. Using only the depth modality, the proposed method achieved state-of-the-art results on three large datasets: the large-scale continuous gesture recognition dataset (mean Jaccard index of 0.4109), the large-scale isolated gesture recognition dataset (59.21%), and the NTU RGB+D dataset (87.08% cross-subject and 84.22% cross-view).
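As a rough illustration of how a depth sequence can be collapsed into a single "dynamic" image, the sketch below uses approximate rank pooling: a closed-form weighted sum over time-averaged frames that stands in for solving the ranking problem. This is an assumption made for illustration. The abstract does not say how the rank pooling is solved, and the hierarchical stage is omitted here. The names `rank_pooling_weights`, `dynamic_image`, and `bidirectional_dynamic_images` are hypothetical, and frames are plain nested lists to keep the example dependency-free.

```python
from typing import List, Sequence, Tuple

Frame = List[List[float]]  # one depth map, stored as rows of pixel values


def rank_pooling_weights(num_frames: int) -> List[float]:
    """Per-frame weights of approximate rank pooling on time-averaged frames.

    With T frames, frame tau (1-based) gets
        alpha_tau = sum_{t=tau..T} (2t - T - 1) / t,
    which follows from one gradient step of the pairwise ranking objective
    applied to the running means of the frames.
    """
    total = num_frames
    return [
        sum((2 * t - total - 1) / t for t in range(tau, total + 1))
        for tau in range(1, total + 1)
    ]


def dynamic_image(frames: Sequence[Frame]) -> Frame:
    """Collapse a frame sequence into one image by a weighted sum over time."""
    weights = rank_pooling_weights(len(frames))
    rows, cols = len(frames[0]), len(frames[0][0])
    return [
        [sum(w * f[r][c] for w, f in zip(weights, frames)) for c in range(cols)]
        for r in range(rows)
    ]


def bidirectional_dynamic_images(frames: Sequence[Frame]) -> Tuple[Frame, Frame]:
    """Pool the sequence in forward and in reverse temporal order."""
    forward = dynamic_image(frames)
    backward = dynamic_image(list(reversed(frames)))
    return forward, backward


if __name__ == "__main__":
    # Toy 1x2 "depth maps": the first pixel changes over time, the second is static.
    toy = [[[0.0, 3.0]], [[1.0, 3.0]], [[5.0, 3.0]]]
    fwd, bwd = bidirectional_dynamic_images(toy)
    print("forward:", fwd)
    print("backward:", bwd)
```

Because the weights sum to zero, static pixels cancel out and only regions that change over time contribute to the pooled image. The forward and backward images generally differ, which is why pooling in both directions can add information. In a full pipeline each pooled image would be rescaled to an image range before being passed to a pretrained ConvNet for fine-tuning.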
Pages: 1051-1061
Number of pages: 11