RGB-D Human Action Recognition of Deep Feature Enhancement and Fusion Using Two-Stream ConvNet

Cited by: 10
Authors
Liu, Yun [1]
Ma, Ruidi [1]
Li, Hui [1]
Wang, Chuanxu [1]
Tao, Ye [1]
Affiliations
[1] Qingdao Univ Sci & Technol, Coll Informat Sci & Technol, Qingdao 266000, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
FORM;
DOI
10.1155/2021/8864870
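Chinese Library Classification (CLC) number
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Discipline classification code
0808; 0809;
Abstract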
Action recognition is an important research direction in computer vision. Recognition from RGB video is easily affected by factors such as background and lighting, whereas depth video reduces this interference and improves recognition accuracy. This paper therefore uses both video and depth skeleton data and proposes SV-GCN, a two-stream network for RGB-D action recognition in which each stream processes a different type of data. The skeleton stream (S-Stream), Nonlocal-stgcn, adds nonlocal operations to capture dependencies between a wider range of joints, giving the model richer skeleton-point features. The video stream (V-Stream), Dilated-slowfastnet, replaces the traditional random sampling layer with dilated convolutional layers, which makes better use of the depth features. Finally, the information from the two streams is fused to perform action recognition. Experimental results on the NTU-RGB+D dataset show that the proposed method significantly improves recognition accuracy and outperforms st-gcn and Slowfastnet under both the cross-subject (CS) and cross-view (CV) protocols.
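
The abstract states that the two streams are fused but does not say how. As a rough illustration only, the sketch below assumes late fusion by a weighted average of per-class softmax scores, which is one common choice for two-stream models and not necessarily the authors' method. The function names, the `skeleton_weight` parameter, and the example logits are all hypothetical.

```python
import math


def softmax(logits):
    """Convert raw class scores to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_scores(skeleton_logits, video_logits, skeleton_weight=0.5):
    """Weighted-average late fusion of two streams (an assumed scheme).

    skeleton_logits: per-class scores from the skeleton stream (S-Stream).
    video_logits:    per-class scores from the video stream (V-Stream).
    skeleton_weight: hypothetical mixing weight in [0, 1] for the S-Stream.
    Returns the fused probabilities and the index of the predicted class.
    """
    if len(skeleton_logits) != len(video_logits):
        raise ValueError("both streams must score the same set of classes")
    if not 0.0 <= skeleton_weight <= 1.0:
        raise ValueError("skeleton_weight must lie in [0, 1]")

    p_skeleton = softmax(skeleton_logits)
    p_video = softmax(video_logits)
    fused = [
        skeleton_weight * s + (1.0 - skeleton_weight) * v
        for s, v in zip(p_skeleton, p_video)
    ]
    predicted = max(range(len(fused)), key=fused.__getitem__)
    return fused, predicted


# Made-up scores for three action classes, for illustration only.
fused, predicted = fuse_scores([2.0, 0.5, 0.1], [0.2, 0.3, 3.0])
```

With equal weights, the more confident stream dominates the fused prediction; setting `skeleton_weight` to 1.0 or 0.0 reduces the result to a single stream.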
Pages: 10