SDIGRU: Spatial and Deep Features Integration Using Multilayer Gated Recurrent Unit for Human Activity Recognition

Cited by: 22
Authors
Ahmad, Tariq [1 ]
Wu, Jinsong [2 ,3 ]
Affiliations
[1] Guilin Univ Elect Technol, Sch Informat & Commun Engn, Guilin 541004, Peoples R China
[2] Guilin Univ Elect Technol, Sch Artificial Intelligence, Guilin 541004, Peoples R China
[3] Univ Chile, Dept Elect Engn, Santiago 8370451, Chile
Keywords
Convolutional neural networks (CNNs); gated recurrent unit (GRU); human activity recognition; FUSION; NETWORK; VIDEOS; CNN
DOI
10.1109/TCSS.2023.3249152
CLC number
TP3 [Computing technology, computer technology]
Discipline code
0812
Abstract
Smart video surveillance plays a significant role in public security by storing huge amounts of continual stream data, evaluating them, and generating warnings when undesirable human activities are performed. Recognizing human activities in video surveillance faces many challenges, such as optimally evaluating human activities under growing volumes of streaming data with complex computation and high processing-time demands. To tackle these challenges, we introduce a lightweight spatial and deep features integration using a multilayer GRU (SDIGRU). First, we extract spatial and deep features from the frame sequences of realistic human activity videos using a lightweight MobileNetV2 model, and then integrate those spatial and deep features. Although deep features can be used for human activity recognition, they contain only high-level appearance, which is insufficient to correctly identify a particular human activity. Thus, we jointly apply deep information with spatial appearance to produce detailed-level information. Furthermore, we select rich informative features from the spatial and deep appearances. Then, we train a multilayer gated recurrent unit (GRU) and feed it the informative features to learn the temporal dynamics of the human activity frame sequence at each GRU time step. We conduct our experiments on the benchmark YouTube11, HMDB51, and UCF101 human activity recognition datasets. The empirical results show that our method achieves significant recognition performance with low computational complexity and quick response. Finally, we compare the results with existing state-of-the-art techniques, which shows the effectiveness of our method.
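The pipeline the abstract describes (per-frame CNN features fed to a multilayer GRU that models temporal dynamics, with classification at the final time step) can be sketched as below. This is a minimal illustrative sketch, not the authors' implementation: the tiny convolutional backbone stands in for the pretrained MobileNetV2 used in the paper, and the dimensions (`feat_dim`, `hidden`, `num_classes=51` for HMDB51) are assumptions chosen to keep the example self-contained.

```python
import torch
import torch.nn as nn

class SDIGRUSketch(nn.Module):
    """Hypothetical sketch of the SDIGRU idea: frame-level CNN features
    -> multilayer GRU over the frame sequence -> activity classes."""

    def __init__(self, feat_dim=128, hidden=256, num_layers=2, num_classes=51):
        super().__init__()
        # Stand-in backbone: the paper uses a lightweight pretrained
        # MobileNetV2; a tiny CNN keeps this sketch dependency-free.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Multilayer GRU learns temporal dynamics across frames.
        self.gru = nn.GRU(feat_dim, hidden, num_layers=num_layers,
                          batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, clips):
        # clips: (batch, time, 3, H, W)
        b, t = clips.shape[:2]
        # Extract per-frame features, then restore the time dimension.
        feats = self.backbone(clips.flatten(0, 1)).view(b, t, -1)
        out, _ = self.gru(feats)      # hidden state at every time step
        return self.head(out[:, -1])  # classify from the last time step

model = SDIGRUSketch()
logits = model(torch.randn(2, 8, 3, 64, 64))  # 2 clips of 8 frames each
print(logits.shape)  # torch.Size([2, 51])
```

In the paper, the integrated spatial and deep features would replace the raw backbone output before the GRU; the sketch only shows the CNN-to-GRU data flow.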
Pages: 973-985
Page count: 13