Pose Uncertainty Aware Movement Synchrony Estimation via Spatial-Temporal Graph Transformer

Cited by: 6
Authors
Li, Jicheng [1 ]
Bhat, Anjana [1 ]
Barmaki, Roghayeh [1 ]
Affiliations
[1] Univ Delaware, Newark, DE 19716 USA
Source
PROCEEDINGS OF THE 2022 INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, ICMI 2022 | 2022
Keywords
deep learning; movement synchrony estimation; contrastive learning; transformer networks; knowledge distillation; autism spectrum disorder; neural networks; datasets
DOI
10.1145/3536221.3556627
CLC number
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
The concept of movement synchrony is derived from the scientific study of interacting dyads in the autism field. Automated movement synchrony estimation has been achieved by utilizing deep learning models applied to other tasks, such as human activity recognition. To better adapt to the movement synchrony estimation task, we proposed a skeleton-based uncertainty-aware graph transformer incorporating joint confidence scores. We uniquely designed a joint position embedding shared between the same joints of interacting individuals and introduced a temporal similarity matrix in temporal attention computation, accounting for the intrinsic periodicity of body movements. To further improve performance, we constructed a dataset for movement synchrony estimation using Human3.6M and pretrained our model on it via contrastive learning. We further applied knowledge distillation to alleviate the information loss introduced by pose detector failure in a privacy-preserving way. Our method achieved an overall accuracy of 88.98% on PT13, a dataset collected from autism therapy interventions, and surpassed its counterpart approaches by a good margin. This work also has implications for synchronous movement activity recognition in group settings, with broad applications in education and sports.
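The abstract gives no equations, but two of its ingredients can be illustrated together: using per-frame pose-detector confidence scores to down-weight uncertain frames, and adding a temporal similarity matrix as a bias in temporal attention. The sketch below is purely illustrative; the function name, shapes, and the exact way confidence and similarity enter the scores are assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def uncertainty_aware_temporal_attention(x, conf, sim):
    """Hypothetical single-head temporal self-attention.

    x:    (T, D) per-frame pose features
    conf: (T,)   pose-detector confidence per frame, in (0, 1]
    sim:  (T, T) temporal similarity matrix (e.g., cosine similarity
                 between frames, capturing movement periodicity)
    """
    T, D = x.shape
    # Standard scaled dot-product attention scores
    scores = x @ x.T / np.sqrt(D)
    # Temporal similarity matrix added as an attention bias
    scores = scores + sim
    # Log-confidence bias: low-confidence frames receive less attention
    scores = scores + np.log(conf + 1e-6)[None, :]
    attn = softmax(scores, axis=-1)  # each row sums to 1
    return attn @ x

# Example with random data (shapes are illustrative)
rng = np.random.default_rng(0)
T, D = 8, 16
x = rng.standard_normal((T, D))
conf = rng.uniform(0.5, 1.0, T)
xn = x / np.linalg.norm(x, axis=1, keepdims=True)
sim = xn @ xn.T  # cosine similarity between frames
out = uncertainty_aware_temporal_attention(x, conf, sim)
```

Adding the biases before the softmax keeps the attention weights a valid distribution over frames while still letting confidence and periodicity reshape where attention mass goes.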
Pages: 73-82
Page count: 10