Pose Uncertainty Aware Movement Synchrony Estimation via Spatial-Temporal Graph Transformer

Cited by: 6
Authors
Li, Jicheng [1]
Bhat, Anjana [1]
Barmaki, Roghayeh [1]
Affiliations
[1] Univ Delaware, Newark, DE 19716 USA
Source
PROCEEDINGS OF THE 2022 INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, ICMI 2022 | 2022
Keywords
deep learning; movement synchrony estimation; contrastive learning; transformer networks; knowledge distillation; autism spectrum disorder; NEURAL-NETWORKS; DATASETS;
DOI
10.1145/3536221.3556627
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The concept of movement synchrony is derived from the scientific study of interacting dyads in the autism field. Automated movement synchrony estimation has been achieved by utilizing deep learning models applied to other tasks, such as human activity recognition. To better adapt to the movement synchrony estimation task, we proposed a skeleton-based uncertainty-aware graph transformer incorporating joint confidence scores. We uniquely designed a joint position embedding shared between the same joints of interacting individuals and introduced a temporal similarity matrix in temporal attention computation, considering the intrinsically periodic nature of body movements. To further improve the performance, we constructed a dataset for movement synchrony estimation using Human3.6M and pretrained our model on it via contrastive learning. We further applied knowledge distillation to alleviate information loss introduced by pose detector failure in a privacy-preserving way. Our method achieved an overall accuracy of 88.98% on PT13, a dataset collected from autism therapy interventions, and surpassed its counterpart approaches by a good margin. This work also has implications for synchronous movement activity recognition in group settings, with broad applications in education and sports.
Pages: 73-82
Number of pages: 10