Pose Uncertainty Aware Movement Synchrony Estimation via Spatial-Temporal Graph Transformer

Cited by: 6
Authors
Li, Jicheng [1 ]
Bhat, Anjana [1 ]
Barmaki, Roghayeh [1 ]
Affiliations
[1] Univ Delaware, Newark, DE 19716 USA
Source
PROCEEDINGS OF THE 2022 INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, ICMI 2022 | 2022
Keywords
deep learning; movement synchrony estimation; contrastive learning; transformer networks; knowledge distillation; autism spectrum disorder; neural networks; datasets
DOI
10.1145/3536221.3556627
CLC number
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
The concept of movement synchrony is derived from the scientific study of interacting dyads in the autism field. Automated movement synchrony estimation has been achieved by utilizing deep learning models applied to other tasks, such as human activity recognition. To better adapt to the movement synchrony estimation task, we proposed a skeleton-based uncertainty-aware graph transformer incorporating joint confidence scores. We uniquely designed a joint position embedding shared between the same joints of interacting individuals and introduced a temporal similarity matrix in temporal attention computation, accounting for the intrinsic periodicity of body movements. To further improve performance, we constructed a dataset for movement synchrony estimation using Human3.6M and pretrained our model on it via contrastive learning. We further applied knowledge distillation to alleviate the information loss introduced by pose detector failure in a privacy-preserving way. Our method achieved an overall accuracy of 88.98% on PT13, a dataset collected from autism therapy interventions, and surpassed its counterpart approaches by a good margin. This work also has implications for synchronous movement activity recognition in group settings, with broad applications in education and sports.
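The abstract gives no equations, but two of its ingredients can be illustrated together: using per-frame pose-detector confidence scores to down-weight uncertain frames, and adding a temporal similarity matrix as a bias in temporal attention. The sketch below is purely illustrative; the function name, shapes, and the exact way confidence and similarity enter the scores are assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def uncertainty_aware_temporal_attention(x, conf, sim):
    """Hypothetical single-head temporal self-attention.

    x:    (T, D) per-frame pose features
    conf: (T,)   pose-detector confidence per frame, in (0, 1]
    sim:  (T, T) temporal similarity matrix (e.g., cosine similarity
                 between frames, capturing movement periodicity)
    """
    T, D = x.shape
    # Standard scaled dot-product attention scores
    scores = x @ x.T / np.sqrt(D)
    # Temporal similarity matrix added as an attention bias
    scores = scores + sim
    # Log-confidence bias: low-confidence frames receive less attention
    scores = scores + np.log(conf + 1e-6)[None, :]
    attn = softmax(scores, axis=-1)  # each row sums to 1
    return attn @ x

# Example with random data (shapes are illustrative)
rng = np.random.default_rng(0)
T, D = 8, 16
x = rng.standard_normal((T, D))
conf = rng.uniform(0.5, 1.0, T)
xn = x / np.linalg.norm(x, axis=1, keepdims=True)
sim = xn @ xn.T  # cosine similarity between frames
out = uncertainty_aware_temporal_attention(x, conf, sim)
```

Adding the biases before the softmax keeps the attention weights a valid distribution over frames while still letting confidence and periodicity reshape where attention mass goes.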
Pages: 73-82
Page count: 10