RGB-D Human Action Recognition of Deep Feature Enhancement and Fusion Using Two-Stream ConvNet

Cited by: 10

Authors:

Liu, Yun [1]

Ma, Ruidi [1]

Li, Hui [1]

Wang, Chuanxu [1]

Tao, Ye [1]

Affiliations:

[1] Qingdao Univ Sci & Technol, Coll Informat Sci & Technol, Qingdao 266000, Peoples R China

Source:

JOURNAL OF SENSORS | Year 2021 / Volume 2021

Funding:

National Natural Science Foundation of China;

Keywords:

FORM;

DOI:

10.1155/2021/8864870

Chinese Library Classification (CLC) codes:

TM [Electrical engineering]; TN [Electronics and communication technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

Action recognition is an important research direction in computer vision. Recognition from video images is easily affected by factors such as background and lighting, whereas depth video images can reduce this interference and improve recognition accuracy. This paper therefore makes full use of video and depth skeleton data and proposes a two-stream network for RGB-D action recognition (SV-GCN), a two-stream architecture that operates on two different kinds of data. The skeleton-based stream, Nonlocal-stgcn (S-Stream), adds nonlocal operations to capture dependencies among a wider range of joints, which provides richer skeleton-point features for the model. The video-based stream, Dilated-slowfastnet (V-Stream), replaces the traditional random sampling layer with dilated convolutional layers, which makes better use of the depth feature. Finally, the information from the two streams is fused to perform action recognition. Experimental results on the NTU-RGB+D dataset show that the proposed method significantly improves recognition accuracy and outperforms st-gcn and Slowfastnet under both the cross-subject (CS) and cross-view (CV) protocols.
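The abstract does not say how the two streams are fused. As a rough illustration only, the sketch below assumes score-level (late) fusion: each stream produces per-class logits, the logits are turned into probabilities with a softmax, and the two probability vectors are averaged with a weight. The function names `softmax`, `fuse_scores`, and `predict`, the `skeleton_weight` parameter, and the example logits are all hypothetical and are not taken from the paper.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_scores(skeleton_logits, video_logits, skeleton_weight=0.5):
    """Weighted average of the two streams' class probabilities.

    ASSUMPTION: late fusion by weighted averaging. The source only states
    that the two streams are fused, not which fusion rule is used.
    """
    if len(skeleton_logits) != len(video_logits):
        raise ValueError("both streams must score the same set of classes")
    if not 0.0 <= skeleton_weight <= 1.0:
        raise ValueError("skeleton_weight must lie in [0, 1]")
    skeleton_probs = softmax(skeleton_logits)
    video_probs = softmax(video_logits)
    w = skeleton_weight
    return [w * s + (1.0 - w) * v for s, v in zip(skeleton_probs, video_probs)]


def predict(skeleton_logits, video_logits, skeleton_weight=0.5):
    """Return the index of the class with the highest fused probability."""
    fused = fuse_scores(skeleton_logits, video_logits, skeleton_weight)
    return max(range(len(fused)), key=fused.__getitem__)


# Hypothetical logits for three action classes from each stream.
s_stream = [2.0, 1.0, 0.1]
v_stream = [0.5, 2.5, 0.3]
label = predict(s_stream, v_stream)
```

With equal weights the more confident stream dominates the fused decision; setting `skeleton_weight` to 1.0 or 0.0 reduces the sketch to a single-stream classifier, which is a convenient way to compare each stream against the fused result.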

Pages: 10