Human action recognition using three orthogonal planes with unsupervised deep convolutional neural network

Cited: 19
Authors
Abdelbaky, Amany [1 ]
Aly, Saleh [1 ,2 ]
Affiliations
[1] Aswan Univ, Fac Engn, Elect Engn Dept, Aswan, Egypt
[2] Majmaah Univ, Dept Informat Technol, Coll Comp & Informat Sci, Al Majmaah 11952, Saudi Arabia
Keywords
Human action recognition; Unsupervised convolutional architectures; Principal component analysis network (PCANet); Three orthogonal planes (TOP); FEATURES; PERFORMANCE
DOI
10.1007/s11042-021-10636-2
CLC number
TP [Automation technology, computer technology];
Discipline code
0812;
Abstract
Deep learning models have attained great success in an extensive range of computer vision applications, including image and video classification. However, the complex architecture of the most recently developed networks imposes certain memory and computational resource limitations, especially for human action recognition applications. Unsupervised deep convolutional neural networks such as PCANet can alleviate these limitations and hence significantly reduce the computational complexity of the whole recognition system. In this work, instead of using a 3D convolutional neural network architecture to learn temporal features of video actions, the unsupervised convolutional PCANet model is extended into PCANet-TOP, which effectively learns spatiotemporal features from Three Orthogonal Planes (TOP). For each video sequence, spatial frames (XY) and temporal planes (XT and YT) are utilized to train three different PCANet models. The learned features are then fused, after reducing their dimensionality with whitening PCA, to obtain a spatiotemporal feature representation of the action video. Finally, a Support Vector Machine (SVM) classifier is applied for action classification. The proposed method is evaluated on four well-known benchmark datasets, namely Weizmann, KTH, UCF Sports, and YouTube Action. The recognition results show that the proposed PCANet-TOP provides discriminative and complementary features from the three orthogonal planes and achieves promising results comparable with state-of-the-art methods.
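The fusion-and-classification stage described in the abstract (per-plane features, whitening PCA, concatenation, SVM) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the per-plane PCANet features are replaced by stand-in random vectors, and the feature dimension, number of components, and class count are arbitrary choices.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Stand-in features: in the paper each plane (XY, XT, YT) has its own
# trained PCANet; here random vectors merely play that role.
n_videos, dim = 60, 256
plane_feats = {p: rng.normal(size=(n_videos, dim)) for p in ("XY", "XT", "YT")}
labels = rng.integers(0, 4, size=n_videos)  # 4 hypothetical action classes

# Whitening PCA reduces each plane's feature dimensionality before fusion.
reduced = [PCA(n_components=32, whiten=True).fit_transform(f)
           for f in plane_feats.values()]

# Fusion by concatenation yields the spatiotemporal descriptor.
fused = np.hstack(reduced)  # shape: (n_videos, 3 * 32)

# A linear SVM performs the final action classification step.
clf = LinearSVC().fit(fused, labels)
print(fused.shape, clf.score(fused, labels))
```

With real PCANet features, the same three steps (per-plane reduction, concatenation, linear SVM) apply unchanged; only the feature extraction differs.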
Pages: 20019-20043
Page count: 25