Multi-view region-adaptive multi-temporal DMM and RGB action recognition

Cited by: 9
Authors
Al-Faris, Mahmoud [1 ]
Chiverton, John P. [1 ]
Yang, Yanyan [2 ]
Ndzi, David L. [3 ]
Affiliations
[1] Univ Portsmouth, Sch Energy & Elect Engn, Portsmouth PO1 3DJ, Hants, England
[2] Univ Portsmouth, Sch Comp, Portsmouth PO1 3HE, Hants, England
[3] Univ West Scotland, Sch Comp Engn & Phys Sci, Paisley PA1 2BE, Renfrew, Scotland
Keywords
Action recognition; DMM; 3D CNN; Region adaptive; Ensemble
DOI
10.1007/s10044-020-00886-5
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Human action recognition remains an important yet challenging task. This work proposes a novel action recognition system based on a multi-view region-adaptive multi-resolution-in-time depth motion map (MV-RAMDMM) formulation combined with appearance information. Multi-stream 3D convolutional neural networks (CNNs) are trained on the different views and time resolutions of the region-adaptive depth motion maps. Multiple views are synthesised to enhance view invariance. The region-adaptive weights, based on localised motion, accentuate and differentiate the parts of actions with faster motion. Dedicated 3D CNN streams for multi-time-resolution appearance information are also included; these help to identify and differentiate between small object interactions. A pre-trained 3D CNN is fine-tuned for each stream and used together with multi-class support vector machines, and average score fusion is applied to the outputs. The developed approach is capable of recognising both human actions and human-object interactions. Three public-domain datasets, namely MSR 3D Action, Northwestern-UCLA multi-view action and MSR 3D Daily Activity, are used to evaluate the proposed solution. The experimental results demonstrate the robustness of this approach compared with state-of-the-art algorithms.
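The record itself contains no code; the snippet below is only a minimal NumPy sketch of the kind of multi-temporal, region-adaptive depth motion map described in the abstract. The function names, the block-based motion-energy weighting, and the temporal window lengths are illustrative assumptions, not the authors' implementation; a standard DMM simply accumulates absolute frame-to-frame differences of a depth projection over a temporal window, and the region-adaptive idea boosts regions with more accumulated motion.

```python
# Illustrative sketch (not the authors' code): multi-temporal depth motion maps
# with a simple, assumed region-adaptive weighting scheme.
import numpy as np

def depth_motion_map(depth_frames, start, length):
    """Accumulate |D_{t+1} - D_t| over frames [start, start + length)."""
    window = depth_frames[start:start + length].astype(np.float32)
    return np.abs(np.diff(window, axis=0)).sum(axis=0)

def region_adaptive_weights(dmm, block=16, eps=1e-6):
    """Weight each spatial block by its relative motion energy (assumed scheme)."""
    h, w = dmm.shape
    weights = np.ones_like(dmm)
    total = dmm.sum() + eps
    for y in range(0, h, block):
        for x in range(0, w, block):
            patch = dmm[y:y + block, x:x + block]
            # Blocks with more accumulated motion receive larger weights.
            weights[y:y + block, x:x + block] = 1.0 + patch.sum() / total
    return weights

def multi_temporal_ramdmm(depth_frames, lengths=(10, 25, 50)):
    """Build one region-adaptive DMM per temporal resolution (window length)."""
    maps = []
    for length in lengths:
        length = min(length, len(depth_frames))
        dmm = depth_motion_map(depth_frames, 0, length)
        maps.append(dmm * region_adaptive_weights(dmm))
    # One channel per time resolution, e.g. to feed a 3D CNN stream.
    return np.stack(maps)

# Example with synthetic depth video: 60 frames of 240x320 depth maps.
frames = np.random.randint(0, 4000, size=(60, 240, 320), dtype=np.uint16)
print(multi_temporal_ramdmm(frames).shape)  # (3, 240, 320)
```

In the paper's pipeline such maps would also be rendered from several synthesised viewpoints and paired with RGB appearance streams before classification and score fusion.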
Pages: 1587-1602
Page count: 16