Deep Learning Approach for Human Action Recognition Using a Time Saliency Map Based on Motion Features Considering Camera Movement and Shot in Video Image Sequences

Cited by: 4
Authors
Alavigharahbagh, Abdorreza [1 ]
Hajihashemi, Vahid [1 ]
Machado, Jose J. M. [2 ]
Tavares, Joao Manuel R. S. [2 ]
Affiliations
[1] Univ Porto, Fac Engn, Rua Dr Roberto Frias S-N, P-4200465 Porto, Portugal
[2] Univ Porto, Fac Engn, Dept Engn Mecan, Rua Dr Roberto Frias S-N, P-4200465 Porto, Portugal
Keywords
Human Action Recognition (HAR); deep learning; RNN; time saliency map; camera movement cancellation
DOI
10.3390/info14110616
Chinese Library Classification (CLC): TP [Automation Technology, Computer Technology]
Discipline Classification Code: 0812
Abstract
In this article, a hierarchical method for action recognition based on temporal and spatial features is proposed. In current HAR methods, camera movement, sensor movement, sudden scene changes, and scene movement can increase motion feature errors and decrease accuracy. Another important aspect to take into account in a HAR method is its computational cost. The proposed method includes a preprocessing step to address these challenges: it uses optical flow to detect camera movements and shot changes in the input video image sequences. In the temporal processing block, the optical flow technique is combined with the absolute value of frame differences to obtain a time saliency map. The detection of shots, the cancellation of camera movement, and the building of a time saliency map minimise movement detection errors. The time saliency map is then passed to the spatial processing block to segment the moving persons and/or objects in the scene. Because the search region for spatial processing is limited based on the temporal processing results, the computations in the spatial domain are drastically reduced. In the spatial processing block, the scene foreground is extracted in three steps: silhouette extraction, active contour segmentation, and colour segmentation. Key points are selected at the borders of the segmented foreground, and the final features used are the intensity and angle of the optical flow at the detected key points. Using key point features for action detection reduces the computational cost of the classification step and the required training time. Finally, the features are submitted to a Recurrent Neural Network (RNN) to recognise the involved action. The proposed method was tested, and its efficiency evaluated, on four well-known action datasets: KTH, Weizmann, HMDB51, and UCF101.
Since the proposed approach segments salient objects based on motion, edges, and colour features, it can be added as a preprocessing step to most current HAR systems to improve performance.
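As a rough illustration of the temporal processing block described in the abstract, the following NumPy-only sketch combines an absolute frame difference with a crude per-pixel optical-flow magnitude estimate into a combined saliency map. This is a minimal sketch, not the paper's method: shot detection, camera-movement cancellation, the specific optical-flow algorithm, and the way the two cues are weighted are not specified here, so the function names, the brightness-constancy proxy for flow magnitude, and the equal weighting are all assumptions.

```python
import numpy as np

def flow_magnitude(prev, curr, eps=1e-6):
    # Crude per-pixel flow-magnitude proxy from the brightness-constancy
    # equation: |v| ~ |I_t| / |grad I|. A real system would use a proper
    # optical-flow method (e.g. Lucas-Kanade or Farneback).
    Ix = np.gradient(curr, axis=1)          # horizontal intensity gradient
    Iy = np.gradient(curr, axis=0)          # vertical intensity gradient
    It = curr - prev                        # temporal derivative
    grad = np.sqrt(Ix ** 2 + Iy ** 2)
    return np.abs(It) / (grad + eps)

def time_saliency_map(prev, curr):
    # Both cues from the abstract: absolute frame difference and
    # optical-flow magnitude, each normalised to [0, 1] and averaged
    # (the equal weighting is an assumption for illustration).
    diff = np.abs(curr - prev)
    mag = flow_magnitude(prev, curr)

    def norm(x):
        rng = x.max() - x.min()
        return (x - x.min()) / rng if rng > 0 else np.zeros_like(x)

    return 0.5 * norm(diff) + 0.5 * norm(mag)
```

Feeding two consecutive (float, grayscale) frames to `time_saliency_map` yields a map that is high where pixels both change and move, which is the kind of region the spatial processing block would then restrict its segmentation to.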
Pages: 27