3DFCNN: real-time action recognition using 3D deep neural networks with raw depth information

Cited by: 37

Authors:

Sanchez-Caballero, Adrian ^{[1]}

de Lopez-Diz, Sergio ^{[1]}

Fuentes-Jimenez, David ^{[1]}

Losada-Gutierrez, Cristina ^{[1]}

Marron-Romera, Marta ^{[1]}

Casillas-Perez, David ^{[2]}

Sarker, Mohammad Ibrahim ^{[1]}

Affiliations:

[1] Univ Alcala, Dept Elect, Km 33600, Alcala De Henares 28805, Spain

[2] Univ Rey Juan Carlos, Dept Signal Proc & Commun, Madrid, Spain

Source:

MULTIMEDIA TOOLS AND APPLICATIONS | 2022 / Vol. 81 / Issue 17

Keywords:

3D-CNN; Action Recognition; Depth Maps; Real-time; Video-surveillance; RGB-D;

DOI:

10.1007/s11042-022-12091-z

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Subject classification code:

0812 ;

Abstract:

This work describes an end-to-end approach for real-time human action recognition from raw depth image sequences. The proposal is based on a 3D fully convolutional neural network, named 3DFCNN, which automatically encodes spatio-temporal patterns from raw depth sequences. The described 3D-CNN classifies actions from the spatial and temporal information encoded from the depth sequences. The use of depth data ensures that action recognition is carried out while protecting people's privacy, since their identities cannot be recognized from these data. The proposed 3DFCNN has been optimized to achieve good accuracy while working in real time. It has then been evaluated and compared with other state-of-the-art systems on three widely used public datasets with different characteristics, demonstrating that 3DFCNN outperforms all the non-DNN-based state-of-the-art methods, with a maximum accuracy of 83.6%, and obtains results comparable to the DNN-based approaches while maintaining a much lower computational cost of 1.09 seconds, which significantly increases its applicability in real-world environments.

Citation

Pages: 24119-24143

Page count: 25

