Tripool: Graph triplet pooling for 3D skeleton-based action recognition

Cited by: 45
Authors
Peng, Wei [1]
Hong, Xiaopeng [1, 2]
Zhao, Guoying [1, 3]
Affiliations
[1] Univ Oulu, Ctr Machine Vis & Signal Anal, Oulu, Finland
[2] Xi An Jiao Tong Univ, Fac Elect & Informat Engn, Sch Cyber Sci & Engn, Xian, Peoples R China
[3] Northwest Univ, Sch Informat & Technol, Xian, Peoples R China
Funding
Academy of Finland; National Natural Science Foundation of China;
Keywords
3D skeletal action recognition; ST-GCN; Graph pooling; Graph topology analysis;
DOI
10.1016/j.patcog.2021.107921
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Graph Convolutional Networks (GCNs) have already been successfully applied to skeleton-based action recognition. However, current GCNs for this task lack pooling operations, so the architectures are inherently flat, which not only increases the computational complexity but also requires more memory to hold the entire graph embedding. More seriously, a flat architecture forces the high-level semantic feature representations to have the same physical structure as the low-level input skeletons, which we argue is unreasonable and harmful to the final performance. To address these issues, we propose Tripool, a novel graph pooling method for 3D action recognition from skeleton data. Tripool optimizes a triplet pooling loss, in which both graph topology and global graph context are taken into consideration, to learn a hierarchical graph representation. The training process of graph pooling is efficient since it optimizes the graph topology by minimizing an upper bound of the pooling loss. Tripool also automatically generates an embedding matrix, since the graph changes after pooling. On the one hand, Tripool reduces the computational cost by removing redundant nodes. On the other hand, it overcomes the limitation of the topology constraint on the high-level semantic representations and thus improves the final performance. Tripool can be combined with various graph neural networks in an end-to-end fashion. Comprehensive experiments on the two largest current 3D datasets are conducted to evaluate our method. With Tripool, we consistently get the best results in terms of various performance measures. © 2021 The Author(s). Published by Elsevier Ltd.
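As a rough illustration of what graph pooling does to a skeleton graph, the sketch below coarsens a graph with a fixed, hard node-to-cluster assignment. This is a generic coarsening step, not the Tripool method: the abstract does not specify how Tripool learns its assignment, and the triplet pooling loss is not modelled here. The function name pool_graph, the hand-picked assignment, and the toy 5-joint chain are all assumptions made for the example.

```python
import numpy as np


def pool_graph(adj, feats, assignment):
    """Coarsen a graph using a hard node-to-cluster assignment.

    adj:        (N, N) symmetric adjacency matrix of the input graph
    feats:      (N, C) node feature matrix
    assignment: length-N integer array, cluster id in [0, K) for each node
                (hypothetical input; fixed by hand here, not learned)
    """
    n = adj.shape[0]
    k = int(np.max(assignment)) + 1

    # One-hot assignment matrix S of shape (N, K).
    s = np.zeros((n, k))
    s[np.arange(n), assignment] = 1.0

    # Pooled features: mean of the member nodes of each cluster.
    sizes = np.maximum(s.sum(axis=0), 1.0)  # guard against empty clusters
    pooled_feats = (s.T @ feats) / sizes[:, None]

    # Pooled topology: two clusters are linked if any of their members were.
    pooled_adj = s.T @ adj @ s
    np.fill_diagonal(pooled_adj, 0.0)  # drop intra-cluster edges
    pooled_adj = (pooled_adj > 0).astype(float)

    return pooled_adj, pooled_feats, s


# Toy "skeleton": a chain of 5 joints, 0-1-2-3-4, with 2 features per joint.
adj = np.zeros((5, 5))
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4)]:
    adj[i, j] = adj[j, i] = 1.0
feats = np.arange(10, dtype=float).reshape(5, 2)

# Merge joints {0, 1} and {3, 4}; keep joint 2 on its own.
assignment = np.array([0, 0, 1, 2, 2])
pooled_adj, pooled_feats, s = pool_graph(adj, feats, assignment)
print(pooled_adj.shape, pooled_feats.shape)  # 3 pooled nodes remain
```

The pooled graph has fewer nodes than the input, so later layers operate on smaller matrices and are no longer tied to the original joint layout, which is the general motivation the abstract gives for adding pooling to a flat GCN.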
Pages: 12