Decoupling GCN with DropGraph Module for Skeleton-Based Action Recognition

Cited by: 260
Authors
Cheng, Ke [1 ,2 ,3 ]
Zhang, Yifan [1 ,2 ,3 ]
Cao, Congqi [5 ]
Shi, Lei [1 ,2 ,3 ]
Cheng, Jian [1 ,2 ,3 ,4 ]
Lu, Hanqing [1 ,2 ,3 ]
Affiliations
[1] Chinese Acad Sci, NLPR, Beijing, Peoples R China
[2] Chinese Acad Sci, Inst Automat, AIRIA, Beijing, Peoples R China
[3] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing, Peoples R China
[4] CAS Ctr Excellence Brain Sci & Intelligence Techn, Beijing, Peoples R China
[5] Northwestern Polytech Univ, Sch Comp Sci, Xian, Peoples R China
Source
COMPUTER VISION - ECCV 2020, PT XXIV | 2020 / Vol. 12369
Funding
National Natural Science Foundation of China;
Keywords
Skeleton-based action recognition; Decoupling GCN; DropGraph;
DOI
10.1007/978-3-030-58586-0_32
CLC Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
In skeleton-based action recognition, graph convolutional networks (GCNs) have achieved remarkable success. Nevertheless, efficiently modeling the spatial-temporal skeleton graph without introducing extra computation burden remains a challenging problem for industrial deployment. In this paper, we rethink the spatial aggregation in existing GCN-based skeleton action recognition methods and discover that they are limited by a coupled aggregation mechanism. Inspired by the decoupled aggregation mechanism in CNNs, we propose decoupling GCN to boost the graph modeling ability with no extra computation, no extra latency, no extra GPU memory cost, and less than 10% extra parameters. Another prevalent problem of GCNs is over-fitting. Although dropout is a widely used regularization technique, it is not effective for GCNs because activations of neighboring nodes are correlated. We propose DropGraph to discard features in correlated nodes, which is particularly effective on GCNs. Moreover, we introduce an attention-guided drop mechanism to enhance the regularization effect. All our contributions introduce zero extra computation burden at deployment. We conduct experiments on three datasets (NTU-RGBD, NTU-RGBD-120, and Northwestern-UCLA) and exceed the state-of-the-art performance with less computation cost.
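The two ideas summarized in the abstract can be sketched in a few lines. The following is a minimal NumPy illustration, not the authors' implementation: the function names, the per-group adjacency scheme, the 1-hop drop expansion, and the rescaling details are all assumptions made for the sake of a self-contained example.

```python
import numpy as np

def decoupled_graph_conv(x, adjacencies, weight):
    """Decoupled spatial aggregation (sketch): channels are split into
    groups and each group aggregates with its own adjacency matrix,
    instead of all channels sharing a single coupled adjacency.
    x:           (N, C) node features
    adjacencies: (G, N, N) one normalized adjacency per channel group
    weight:      (C, C) pointwise feature transform
    """
    n, c = x.shape
    g = adjacencies.shape[0]
    assert c % g == 0, "channels must split evenly into groups"
    h = x @ weight                       # pointwise transform
    groups = np.split(h, g, axis=1)      # split channels into G groups
    out = [adjacencies[i] @ groups[i] for i in range(g)]  # per-group aggregation
    return np.concatenate(out, axis=1)

def drop_graph(x, adj, gamma=0.1, rng=None):
    """DropGraph (sketch): sample root nodes, then drop each root's
    features together with those of its 1-hop neighbours, so that
    spatially correlated activations are discarded jointly
    (plain dropout zeroes units independently, which neighbouring
    nodes can compensate for).
    """
    rng = np.random.default_rng() if rng is None else rng
    n = x.shape[0]
    roots = rng.random(n) < gamma                  # Bernoulli root sampling
    neighbours = adj[roots].sum(axis=0) > 0        # expand drop set by 1 hop
    keep = ~(roots | neighbours)
    kept = max(keep.sum(), 1)
    return x * keep[:, None] * (n / kept)          # rescale surviving features
```

At inference time both pieces cost nothing extra: the per-group adjacencies are fixed after training, and `drop_graph` is simply disabled, consistent with the paper's claim of zero deployment overhead.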
Pages: 536-553
Page count: 18