JointContrast: Skeleton-Based Interaction Recognition with New Representation and Contrastive Learning

Cited by: 2
Authors
Zhang, Ji [1 ]
Jia, Xiangze [2 ]
Wang, Zhen [3 ]
Luo, Yonglong [4 ]
Chen, Fulong [4 ]
Yang, Gaoming [5 ]
Zhao, Lihui [1 ]
Affiliations
[1] North Univ China, Sch Software, Taiyuan 030051, Peoples R China
[2] Nanjing Univ Aeronaut & Astronaut, Coll Comp Sci & Technol, Nanjing 210016, Peoples R China
[3] Zhejiang Lab, Res Ctr Big Data Intelligence, Hangzhou 310058, Peoples R China
[4] Anhui Normal Univ, Sch Comp & Informat, Wuhu 241000, Peoples R China
[5] Anhui Univ Sci & Technol, Sch Comp Sci & Engn, Huainan 232001, Peoples R China
Funding
US National Science Foundation;
Keywords
interaction recognition; graph representation; contrastive learning; pre-training;
DOI
10.3390/a16040190
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Skeleton-based action recognition detects categories of human actions from skeleton sequences. Within this field, the recognition of action scenes involving more than one subject is termed interaction recognition. Unlike single-subject action recognition methods, interaction recognition requires an explicit representation of the interaction information between subjects. Recalling the success of skeletal graph representation and graph convolution in modeling the spatial structure of skeletal data, we consider whether inter-subject interaction information can be embedded into the skeletal graph so that graph convolution yields a unified feature representation. In this paper, we propose the interaction-information-embedding skeleton graph representation (IE-Graph) and use graph convolution to represent intra-subject spatial structure and inter-subject interaction in a uniform manner. Inspired by recent pre-training methods in 2D vision, we also propose unsupervised pre-training methods for skeletal data together with a contrastive loss. On the SBU dataset, JointContrast achieves 98.2% recognition accuracy; on the NTU60 dataset, it achieves 94.1% and 96.8% recognition accuracy under the Cross-Subject and Cross-View evaluation protocols, respectively.
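The abstract mentions unsupervised pre-training with a contrastive loss but does not spell out its form. Below is a minimal sketch of an InfoNCE-style contrastive loss, the standard formulation in contrastive pre-training (e.g. SimCLR, cited as reference work in this line of research); the function and parameter names are illustrative, not the paper's actual API.

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.1):
    """InfoNCE-style contrastive loss: each anchor embedding should be
    most similar to its own positive among all positives in the batch.

    anchors, positives: (N, D) arrays of embeddings, where row i of
    `positives` is the positive sample for row i of `anchors`.
    """
    # L2-normalize rows so dot products become cosine similarities.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature  # (N, N) similarity matrix
    # Row-wise log-softmax; the positive pair sits on the diagonal.
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))
```

Pulling matched pairs together and pushing mismatched pairs apart is what drives the loss down, so correctly paired embeddings score a much lower loss than shuffled ones.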
Pages: 19