This work is motivated by the remarkable achievements of deep learning models in computer vision, particularly in human activity recognition. The task is gaining increasing attention owing to its numerous real-life applications, such as smart surveillance systems, human-computer interaction, sports action analysis, and elderly healthcare. In recent years, the acquisition and interfacing of multimodal data have become straightforward thanks to low-cost depth sensors. Several approaches have been developed based on RGB-D (depth) evidence, at the cost of additional equipment setup and high complexity. Conversely, methods that rely on RGB frames alone tend to perform worse because depth evidence is absent, yet they require less hardware and are simpler and easier to generalize using only color cameras. In this work, a deeply coupled ConvNet for human activity recognition is proposed that processes RGB frames in the top stream with a bi-directional long short-term memory (Bi-LSTM) network, while in the bottom stream a CNN model is trained on a single dynamic motion image. For the RGB frames, the CNN-Bi-LSTM model is trained end-to-end to refine the features of the pre-trained CNN, while the dynamic-image stream is fine-tuned on the top layers of the pre-trained model to extract the temporal information in videos. The scores obtained from the two streams are fused at the decision level, after the softmax layer, using different late-fusion techniques; the highest accuracy is achieved with max fusion. The performance of the model is assessed on four standard single-person and multi-person RGB-D (depth) activity datasets. The highest classification accuracies achieved on the human action datasets are compared with similar state-of-the-art methods and show significant margins: 2% on SBU Interaction, 4% on MIVIA Action, 1% on MSR Action Pair, and 4% on MSR Daily Activity.
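The decision-level max fusion described above can be sketched minimally as follows. This is an illustrative example only, not the authors' implementation: the per-class softmax score vectors for the two streams are hypothetical placeholder values, and the element-wise maximum is taken over the class scores before renormalizing and picking the predicted class.

```python
import numpy as np

def late_fuse_max(scores_rgb, scores_dyn):
    """Element-wise max fusion of per-class softmax scores from the
    two streams, renormalized to sum to 1 (a common late-fusion sketch)."""
    fused = np.maximum(scores_rgb, scores_dyn)
    return fused / fused.sum()

# Hypothetical softmax outputs for a 4-class problem
rgb_scores = np.array([0.10, 0.60, 0.20, 0.10])  # RGB / CNN-Bi-LSTM stream
dyn_scores = np.array([0.05, 0.30, 0.55, 0.10])  # dynamic-image stream

fused = late_fuse_max(rgb_scores, dyn_scores)
predicted_class = int(np.argmax(fused))  # class with the largest fused score
```

With these placeholder scores the fused vector is the element-wise maximum [0.10, 0.60, 0.55, 0.10] renormalized, so the prediction follows the strongest single-stream response.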