Richly Activated Graph Convolutional Network for Robust Skeleton-Based Action Recognition

Cited by: 195
Authors
Song, Yi-Fan [1 ]
Zhang, Zhang [2 ]
Shan, Caifeng [3 ]
Wang, Liang [2 ]
Affiliations
[1] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100190, Peoples R China
[2] Chinese Acad Sci, Inst Automat, Beijing 100190, Peoples R China
[3] Shandong Univ Sci & Technol, Coll Elect Engn & Automat, Qingdao 266590, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Skeleton; Robustness; Noise measurement; Three-dimensional displays; Degradation; Standards; Feature extraction; Action recognition; skeleton; activation map; graph convolutional network; occlusion; jittering; MODEL;
DOI
10.1109/TCSVT.2020.3015051
Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronic and communication technology];
Discipline Codes
0808; 0809;
Abstract
Current methods for skeleton-based human action recognition usually work with complete skeletons. In real scenarios, however, it is inevitable that incomplete or noisy skeletons are captured, which can significantly degrade the performance of current methods when informative joints are occluded or disturbed. To improve the robustness of action recognition models, a multi-stream graph convolutional network (GCN) is proposed to explore sufficient discriminative features spread over all skeleton joints, so that this distributed, redundant representation reduces the sensitivity of the action model to non-standard skeletons. Concretely, the backbone GCN is extended by a series of ordered streams, each responsible for learning discriminative features from the joints less activated by preceding streams. The activation degree of each skeleton joint in a GCN stream is measured by the class activation map (CAM), and only information from the unactivated joints is passed to the next stream, by which rich features over all activated joints are obtained. The proposed method is therefore termed the richly activated GCN (RA-GCN). Compared with state-of-the-art (SOTA) methods, RA-GCN achieves comparable performance on the standard NTU RGB+D 60 and 120 datasets. More crucially, on synthetic occlusion and jittering datasets, the performance deterioration caused by occluded and disturbed joints is significantly alleviated by the proposed RA-GCN.
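The CAM-based masking described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the joint-level CAM computed from the final fully-connected weights, and the fixed activation threshold are all assumptions made for clarity.

```python
import numpy as np

def class_activation_map(features, fc_weights, class_idx):
    """Per-joint class activation: weight the final feature maps by the
    classifier weights of the target class, sum over channels, and
    average over time.
    features:   (C, T, V) feature tensor from the last GCN layer
    fc_weights: (num_classes, C) final fully-connected layer weights
    Returns a (V,) activation score per skeleton joint."""
    w = fc_weights[class_idx]                  # (C,) class-specific weights
    cam = np.einsum('c,ctv->tv', w, features)  # (T, V) activation per frame/joint
    return cam.mean(axis=0)                    # (V,) averaged over time

def mask_activated_joints(skeleton, cam, threshold):
    """Zero out joints whose CAM score exceeds the threshold, so the next
    stream must learn discriminative features from the remaining,
    less activated joints.
    skeleton: (C, T, V) input joint coordinates."""
    keep = (cam < threshold).astype(skeleton.dtype)  # 1 for unactivated joints
    return skeleton * keep[None, None, :]            # broadcast over C and T
```

In a multi-stream setup along these lines, stream k would receive the skeleton masked by the accumulated CAMs of streams 1..k-1, and the class scores of all streams would be fused (e.g. summed) at inference.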
Pages: 1915-1925
Page count: 11