Multi-view region-adaptive multi-temporal DMM and RGB action recognition

Cited by: 9
Authors
Al-Faris, Mahmoud [1 ]
Chiverton, John P. [1 ]
Yang, Yanyan [2 ]
Ndzi, David L. [3 ]
Affiliations
[1] Univ Portsmouth, Sch Energy & Elect Engn, Portsmouth PO1 3DJ, Hants, England
[2] Univ Portsmouth, Sch Comp, Portsmouth PO1 3HE, Hants, England
[3] Univ West Scotland, Sch Comp Engn & Phys Sci, Paisley PA1 2BE, Renfrew, Scotland
Keywords
Action recognition; DMM; 3D CNN; Region adaptive; Ensemble
DOI
10.1007/s10044-020-00886-5
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Human action recognition remains an important yet challenging task. This work proposes a novel action recognition system based on a multi-view region-adaptive multi-resolution-in-time depth motion map (MV-RAMDMM) formulation combined with appearance information. Multi-stream 3D convolutional neural networks (CNNs) are trained on the different views and time resolutions of the region-adaptive depth motion maps. Multiple views are synthesised to enhance view invariance. The region-adaptive weights, based on localised motion, accentuate and differentiate the parts of actions with faster motion. Dedicated 3D CNN streams for multi-time-resolution appearance information are also included; these help to identify and differentiate between small object interactions. A pre-trained 3D CNN is fine-tuned for each stream and used together with multi-class support vector machines, and average score fusion is applied to the outputs. The developed approach is capable of recognising both human actions and human-object interactions. Three public-domain datasets, namely MSR 3D Action, Northwestern-UCLA multi-view action and MSR 3D Daily Activity, are used to evaluate the proposed solution. The experimental results demonstrate the robustness of this approach compared with state-of-the-art algorithms.
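The record itself contains no code; the snippet below is only a minimal NumPy sketch of the kind of multi-temporal, region-adaptive depth motion map described in the abstract. The function names, the block-based motion-energy weighting, and the temporal window lengths are illustrative assumptions, not the authors' implementation; a standard DMM simply accumulates absolute frame-to-frame differences of a depth projection over a temporal window, and the region-adaptive idea boosts regions with more accumulated motion.

```python
# Illustrative sketch (not the authors' code): multi-temporal depth motion maps
# with a simple, assumed region-adaptive weighting scheme.
import numpy as np

def depth_motion_map(depth_frames, start, length):
    """Accumulate |D_{t+1} - D_t| over frames [start, start + length)."""
    window = depth_frames[start:start + length].astype(np.float32)
    return np.abs(np.diff(window, axis=0)).sum(axis=0)

def region_adaptive_weights(dmm, block=16, eps=1e-6):
    """Weight each spatial block by its relative motion energy (assumed scheme)."""
    h, w = dmm.shape
    weights = np.ones_like(dmm)
    total = dmm.sum() + eps
    for y in range(0, h, block):
        for x in range(0, w, block):
            patch = dmm[y:y + block, x:x + block]
            # Blocks with more accumulated motion receive larger weights.
            weights[y:y + block, x:x + block] = 1.0 + patch.sum() / total
    return weights

def multi_temporal_ramdmm(depth_frames, lengths=(10, 25, 50)):
    """Build one region-adaptive DMM per temporal resolution (window length)."""
    maps = []
    for length in lengths:
        length = min(length, len(depth_frames))
        dmm = depth_motion_map(depth_frames, 0, length)
        maps.append(dmm * region_adaptive_weights(dmm))
    # One channel per time resolution, e.g. to feed a 3D CNN stream.
    return np.stack(maps)

# Example with synthetic depth video: 60 frames of 240x320 depth maps.
frames = np.random.randint(0, 4000, size=(60, 240, 320), dtype=np.uint16)
print(multi_temporal_ramdmm(frames).shape)  # (3, 240, 320)
```

In the paper's pipeline such maps would also be rendered from several synthesised viewpoints and paired with RGB appearance streams before classification and score fusion.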
Pages: 1587-1602
Page count: 16