A deeply coupled ConvNet for human activity recognition using dynamic and RGB images

Cited by: 58
Authors
Singh, Tej [1 ]
Vishwakarma, Dinesh Kumar [2 ]
Affiliations
[1] Delhi Technol Univ, Dept Elect & Commun Engn, New Delhi 110042, India
[2] Delhi Technol Univ, Dept Informat Technol, New Delhi 110042, India
Source
NEURAL COMPUTING & APPLICATIONS, 2021, Vol. 33, No. 1
Keywords
Bi-LSTM; Deep learning; Dynamic motion image; Human activity recognition; Representations
DOI
10.1007/s00521-020-05018-y
CLC number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This work is motivated by the remarkable achievements of deep learning models in computer vision tasks, particularly human activity recognition. The field is gaining attention due to its numerous real-life applications, for example smart surveillance systems, human-computer interaction, sports action analysis, and elderly healthcare. In recent years, the acquisition and interfacing of multimodal data have become straightforward thanks to the advent of low-cost depth devices. Several approaches have been developed based on RGB-D (depth) evidence, at the cost of additional equipment setup and high complexity. Conversely, methods that use only RGB frames yield inferior performance because depth evidence is absent, but they require less hardware, are simpler, and generalize easily using only color cameras. In this work, a deeply coupled ConvNet for human activity recognition is proposed that processes RGB frames in its top stream with a bi-directional long short-term memory (Bi-LSTM) network, while in its bottom stream a CNN model is trained on a single dynamic motion image. For the RGB frames, the CNN-Bi-LSTM model is trained end-to-end to refine the features of the pre-trained CNN, while the dynamic image stream is fine-tuned on the top layers of the pre-trained model to extract temporal information from videos. The features obtained from the two data streams are fused at the decision level, after the softmax layer, using different late-fusion techniques, with max fusion achieving the highest accuracy. The performance of the model is assessed on four standard single- and multi-person activity RGB-D (depth) datasets. The highest classification accuracies achieved on the human action datasets are compared with the similar state of the art and found to be higher by significant margins: 2% on SBU Interaction, 4% on MIVIA Action, 1% on MSR Action Pair, and 4% on MSR Daily Activity.
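The decision-level max fusion the abstract describes can be illustrated with a minimal sketch: each stream produces class probabilities via softmax, and the fused score for each class is the element-wise maximum across streams. The logits below are hypothetical values for a 4-class example, not taken from the paper.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over class logits
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical per-stream logits for a 4-class example (illustrative only)
rgb_logits = np.array([2.0, 0.5, 0.1, -1.0])   # RGB + Bi-LSTM stream
dyn_logits = np.array([0.3, 1.8, 0.2, -0.5])   # dynamic-motion-image stream

rgb_scores = softmax(rgb_logits)
dyn_scores = softmax(dyn_logits)

# Decision-level (late) max fusion: element-wise max of softmax scores
fused = np.maximum(rgb_scores, dyn_scores)
predicted_class = int(np.argmax(fused))
```

Other late-fusion variants the paper compares against (e.g. sum or product of scores) would replace `np.maximum` with `+` or `*`; max fusion simply lets the more confident stream decide each class score.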
Pages: 469-485 (17 pages)
Related papers
(50 in total)
  • [2] Human activity recognition in RGB-D videos by dynamic images
    Mukherjee, Snehasis
    Anvitha, Leburu
    Lahari, T. Mohana
    MULTIMEDIA TOOLS AND APPLICATIONS, 2020, 79 (27-28) : 19787 - 19801
  • [4] Human activity recognition using dynamic representation and matching of skeleton feature sequences from RGB-D images
    Li, Qiming
    Lin, Wenxiong
    Li, Jun
    SIGNAL PROCESSING-IMAGE COMMUNICATION, 2018, 68 : 265 - 272
  • [5] Learning Coupled Classifiers with RGB images for RGB-D object recognition
    Li, Xiao
    Fang, Min
    Zhang, Ju-Jie
    Wu, Jinqiao
    PATTERN RECOGNITION, 2017, 61 : 433 - 446
  • [6] Human Activity Recognition using RGB-D Sensors
    Bagate, Asmita
    Shah, Medha
    PROCEEDINGS OF THE 2019 INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTING AND CONTROL SYSTEMS (ICCS), 2019, : 902 - 905
  • [7] Two-Stage Human Activity Recognition Using 2D-ConvNet
    Verma, Kamal Kant
    Singh, Brij Mohan
    Mandoria, H. L.
    Chauhan, Prachi
    INTERNATIONAL JOURNAL OF INTERACTIVE MULTIMEDIA AND ARTIFICIAL INTELLIGENCE, 2020, 6 (02): : 125 - 135
  • [8] RGB-D Human Action Recognition of Deep Feature Enhancement and Fusion Using Two-Stream ConvNet
    Liu, Yun
    Ma, Ruidi
    Li, Hui
    Wang, Chuanxu
    Tao, Ye
    JOURNAL OF SENSORS, 2021, 2021
  • [9] Dynamic Detection and Recognition of Objects Based on Sequential RGB Images
    Dong, Shuai
    Yang, Zhihua
    Li, Wensheng
    Zou, Kun
    FUTURE INTERNET, 2021, 13 (07)
  • [10] Gesture Recognition in RGB Videos Using Human Body Keypoints and Dynamic Time Warping
    Schneider, Pascal
    Memmesheimer, Raphael
    Kramer, Ivanna
    Paulus, Dietrich
    ROBOT WORLD CUP XXIII, ROBOCUP 2019, 2019, 11531 : 281 - 293