Exploiting deep residual networks for human action recognition from skeletal data

Cited by: 46

Authors:

Huy-Hieu Pham ^{[1,2]}

Khoudour, Louandi ^{[1]}

Crouzil, Alain ^{[2]}

Zegers, Pablo ^{[3]}

Velastin, Sergio A. ^{[4,5]}

Affiliations:

[1] Ctr Etud & Expertise Risques Environm Mobilite &, F-31400 Toulouse, France

[2] Univ Toulouse, UPS, Inst Rech Informat Toulouse IRIT, F-31062 Toulouse 9, France

[3] Aparnix, La Gioconda 4355,10B, Santiago, Chile

[4] Univ Carlos III Madrid, Appl Artificial Intelligence Res Grp, Dept Comp Sci, Madrid 28270, Spain

[5] Queen Mary Univ London, Sch Elect Engn & Comp Sci, London E1 4NS, England

Source:

COMPUTER VISION AND IMAGE UNDERSTANDING | 2018 / Vol. 170

Keywords:

3D Action recognition; Deep residual networks; Skeletal data;

DOI:

10.1016/j.cviu.2018.03.003

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The computer vision community is currently focusing on action recognition in real-world videos, where datasets contain thousands of samples and pose many challenges. In this effort, Deep Convolutional Neural Networks (D-CNNs) have played a significant role in advancing the state of the art in vision-based action recognition systems. Recently, the Residual Network (ResNet), a single architecture that combines residual connections with a more traditional CNN model, has shown impressive performance and great potential for image recognition tasks. In this paper, we investigate and apply deep ResNets to human action recognition using skeletal data provided by depth sensors. First, the 3D coordinates of the human body joints carried in skeleton sequences are transformed into image-based representations and stored as RGB images. These color images capture the spatio-temporal evolution of 3D motions in skeleton sequences and can be learned efficiently by D-CNNs. We then propose a novel deep learning architecture based on ResNets to learn features from the resulting color-based representations and classify them into action classes. The proposed method is evaluated on three challenging benchmark datasets: MSR Action 3D, KARD, and NTU-RGB+D. Experimental results demonstrate that our method achieves state-of-the-art performance on all of these benchmarks while requiring fewer computational resources. In particular, the proposed method surpasses previous approaches by 3.4% on the MSR Action 3D dataset, 0.67% on the KARD dataset, and 2.5% on the NTU-RGB+D dataset.
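The skeleton-to-image step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the encoding, so everything below is an assumption made for illustration: the function name `skeleton_to_rgb`, the min-max normalization over the whole sequence, and the layout (rows = joints, columns = frames, with x, y, z mapped to R, G, B) are hypothetical choices, not the authors' exact procedure.

```python
from typing import List, Sequence, Tuple

Joint = Sequence[float]   # (x, y, z) coordinates of one joint
Frame = Sequence[Joint]   # one entry per joint

def skeleton_to_rgb(sequence: Sequence[Frame]) -> List[List[Tuple[int, int, int]]]:
    """Encode a skeleton sequence as an RGB image (nested lists of pixels).

    Assumed layout (hypothetical, for illustration only):
    rows index joints, columns index frames, and (x, y, z) map to (R, G, B).
    """
    values = [c for frame in sequence for joint in frame for c in joint]
    if not values:
        raise ValueError("empty skeleton sequence")

    # Min-max normalization over the whole sequence into the range [0, 255].
    lo, hi = min(values), max(values)
    scale = (hi - lo) or 1.0  # avoid division by zero for constant input

    num_joints = len(sequence[0])
    image = []
    for j in range(num_joints):
        row = []
        for frame in sequence:
            x, y, z = frame[j]
            row.append(tuple(int(255 * (c - lo) / scale) for c in (x, y, z)))
        image.append(row)
    return image

# Toy example: 2 frames, 2 joints per frame.
toy_sequence = [
    [(0.0, 0.5, 1.0), (1.0, 0.0, 0.5)],
    [(0.5, 0.5, 0.5), (0.0, 0.0, 1.0)],
]
toy_image = skeleton_to_rgb(toy_sequence)
```

An image produced this way could then be resized and passed to any image classifier, which is what allows a ResNet-style network to operate on skeletal data.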

Pages: 51-66

Page count: 16

References (91 in total):

[41] Jin K., 2017, J ENG, V1

[42] Large-scale Video Classification with Convolutional Neural Networks [J].
Karpathy, Andrej; Toderici, George; Shetty, Sanketh; Leung, Thomas; Sukthankar, Rahul; Fei-Fei, Li.
2014 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2014, :1725-1732

[43] Krizhevsky A, 2009, LEARNING MULTIPLE LA

[44] ImageNet Classification with Deep Convolutional Neural Networks [J].
Krizhevsky, Alex; Sutskever, Ilya; Hinton, Geoffrey E.
COMMUNICATIONS OF THE ACM, 2017, 60 (06) :84-90

[45] Learning realistic human actions from movies [J].
Laptev, Ivan; Marszalek, Marcin; Schmid, Cordelia; Rozenfeld, Benjamin.
2008 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, VOLS 1-12, 2008, :3222-+

[46] Backpropagation Applied to Handwritten Zip Code Recognition [J].
LeCun, Y.; Boser, B.; Denker, J. S.; Henderson, D.; Howard, R. E.; Hubbard, W.; Jackel, L. D.
NEURAL COMPUTATION, 1989, 1 (04) :541-551

[47] LeCun Y., 1996, Neural Networks: Tricks of the Trade, this book is an outgrowth of a 1996 NIPS workshop, volume 1524 of Lecture Notes in Computer Science, pp. 9

[48] Joint Distance Maps Based Action Recognition With Convolutional Neural Networks [J].
Li, Chuankun; Hou, Yonghong; Wang, Pichao; Li, Wanqing.
IEEE SIGNAL PROCESSING LETTERS, 2017, 24 (05) :624-628

[49] Application on Integration Technology of Visualized Hierarchical Information [J].
Li, Weibo; He, Yang.
2010 THE 3RD INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND INDUSTRIAL APPLICATION (PACIIA2010), VOL I, 2010, :9-12

[50] Three Dimensional Motion Trail Model for Gesture Recognition [J].
Liang, Bin; Zheng, Lihong.
2013 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW), 2013, :684-691
