Symmetric Sub-graph Spatio-Temporal Graph Convolution and its application in Complex Activity Recognition

Cited by: 9
Authors
Das, Pratyusha [1]
Ortega, Antonio [1]
Affiliations
[1] Univ Southern Calif, Dept Elect & Comp Engn, Los Angeles, CA 90007 USA
Source
2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021) | 2021
Keywords
Hand Skeleton; Graph based methods; complex activity analysis; Spatio-temporal graph neural network; First person hand action (FPHA) dataset; SEGMENTATION;
DOI
10.1109/ICASSP39728.2021.9413833
Chinese Library Classification number
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
Understanding complex hand actions, such as assembly tasks or kitchen activities, from hand skeleton data is an important yet challenging task. In this paper, we analyze hand skeleton-based complex activities by modeling dynamic hand skeletons through a spatio-temporal graph convolutional neural network (ST-GCN). This model jointly learns and extracts spatio-temporal features for activity recognition. Our proposed technique, the Symmetric Sub-graph spatio-temporal graph convolutional neural network (S²-ST-GCN), exploits the symmetric nature of hand graphs to decompose them into smaller sub-graphs, which allows us to build a separate temporal model for the relative motion of the fingers. This sub-graph approach can be implemented efficiently by preprocessing the input data with a Haar-unit-based orthogonal matrix. Then, in addition to spatial filters, separate temporal filters can be learned for each sub-graph. We evaluate the proposed method on the First-Person Hand Action dataset. While the proposed method performs comparably to state-of-the-art methods in the train:test = 1:1 setting, it does so with greater stability. Furthermore, we demonstrate a significant performance improvement over state-of-the-art methods in the cross-person setting, where the model never sees a test subject's data during training. S²-ST-GCN also outperforms a finger-based decomposition of the hand graph in which no preprocessing is applied.
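The Haar-unit preprocessing mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the matrix or the node pairing, so the example assumes the simplest case, a 2x2 Haar unit applied to the signals of two mirrored nodes. The function names and the joint coordinates are hypothetical. The sum channel carries the motion common to both nodes and the difference channel carries their relative motion.

```python
import math

# Assumption: a 2x2 Haar unit (1/sqrt(2)) * [[1, 1], [1, -1]] applied to the
# signals of two mirrored nodes. The paper's actual matrix may differ.
SCALE = 1.0 / math.sqrt(2.0)


def haar_unit_transform(x_a, x_b):
    """Map two node signals to a common (sum) and a relative (difference) channel."""
    common = [SCALE * (a + b) for a, b in zip(x_a, x_b)]
    relative = [SCALE * (a - b) for a, b in zip(x_a, x_b)]
    return common, relative


def inverse_haar_unit(common, relative):
    """Recover the original signals; the Haar unit is its own inverse."""
    x_a = [SCALE * (c + r) for c, r in zip(common, relative)]
    x_b = [SCALE * (c - r) for c, r in zip(common, relative)]
    return x_a, x_b


# Hypothetical 3D coordinates of two fingertip joints in one frame.
index_tip = [0.10, 0.25, 0.40]
middle_tip = [0.12, 0.27, 0.38]
common, relative = haar_unit_transform(index_tip, middle_tip)
```

Because the transform is orthogonal, it preserves signal energy and can be inverted exactly, so no information is lost before the sub-graph filters are learned.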
Pages: 3215-3219
Page count: 5