Self-Supervised Representation Learning With Spatial-Temporal Consistency for Sign Language Recognition

Cited by: 0
Authors
Zhao, Weichao [1]
Zhou, Wengang [1]
Hu, Hezhen [2]
Wang, Min [3]
Li, Houqiang [1]
Affiliations
[1] Univ Sci & Technol China, MoE Key Lab Brain Inspired Intelligent Percept & C, Hefei 230027, Peoples R China
[2] Univ Texas Austin, Visual Informat Grp, Austin, TX 78705 USA
[3] Hefei Comprehens Natl Sci Ctr, Inst Artificial Intelligence, Hefei 230027, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Sign language; Task analysis; Semantics; Representation learning; Knowledge transfer; Feature extraction; Skeleton; Sign language recognition; skeleton-based; self-supervised learning; contrastive learning;
DOI
10.1109/TIP.2024.3416881
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recently, there have been efforts to improve the performance in sign language recognition by designing self-supervised learning methods. However, these methods capture limited information from sign pose data in a frame-wise learning manner, leading to sub-optimal solutions. To this end, we propose a simple yet effective self-supervised contrastive learning framework to excavate rich context via spatial-temporal consistency from two distinct perspectives and learn instance discriminative representation for sign language recognition. On one hand, since the semantics of sign language are expressed by the cooperation of fine-grained hands and coarse-grained trunks, we utilize both granularity information and encode them into latent spaces. The consistency between hand and trunk features is constrained to encourage learning consistent representation of instance samples. On the other hand, inspired by the complementary property of motion and joint modalities, we first introduce first-order motion information into sign language modeling. Additionally, we further bridge the interaction between the embedding spaces of both modalities, facilitating bidirectional knowledge transfer to enhance sign language representation. Our method is evaluated with extensive experiments on four public benchmarks, and achieves new state-of-the-art performance with a notable margin. The source code is publicly available at https://github.com/sakura/Code.
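Two ideas named in the abstract can be illustrated with a minimal sketch: first-order motion, read here as the frame-to-frame displacement of each joint, and a contrastive consistency objective between two views of the same sample (for example hand versus trunk features, or joint versus motion embeddings). This is not the authors' implementation. The function names `first_order_motion` and `info_nce`, the `(T, J, C)` pose layout, the zero-padding of the last frame, the temperature value, and the choice of an InfoNCE-style loss are all assumptions made for illustration.

```python
import numpy as np


def first_order_motion(joints):
    """Frame-to-frame displacement of each joint (assumed definition).

    joints: array of shape (T, J, C): T frames, J keypoints, C coordinates.
    Returns an array of the same shape; the last frame is zero-padded.
    """
    motion = np.zeros_like(joints)
    motion[:-1] = joints[1:] - joints[:-1]
    return motion


def info_nce(a, b, temperature=0.1):
    """InfoNCE-style loss between two batches of embeddings of shape (N, D).

    Row i of `a` and row i of `b` form a positive pair; every other row
    in the batch serves as a negative.
    """
    # L2-normalise so the dot product is a cosine similarity.
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / temperature                    # shape (N, N)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Average negative log-likelihood of the matching (diagonal) pairs.
    return -np.mean(np.diag(log_prob))
```

In this sketch the loss is small when matching rows of the two views agree and large when they are misaligned, which is the behaviour a consistency constraint between two embedding spaces relies on.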
Pages: 4188-4201
Number of pages: 14
Related papers
50 in total
  • [1] Attentive spatial-temporal contrastive learning for self-supervised video representation
    Yang, Xingming
    Xiong, Sixuan
    Wu, Kewei
    Shan, Dongfeng
    Xie, Zhao
    IMAGE AND VISION COMPUTING, 2023, 137
  • [2] SSRL: Self-Supervised Spatial-Temporal Representation Learning for 3D Action Recognition
    Jin, Zhihao
    Wang, Yifan
    Wang, Qicong
    Shen, Yehu
    Meng, Hongying
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (01) : 274 - 285
  • [3] Hierarchically Decoupled Spatial-Temporal Contrast for Self-supervised Video Representation Learning
    Zhang, Zehua
    Crandall, David
    2022 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2022), 2022, : 975 - 985
  • [4] Spatial-Temporal Hypergraph Self-Supervised Learning for Crime Prediction
    Li, Zhonghang
    Huang, Chao
    Xia, Lianghao
    Xu, Yong
    Pei, Jian
    2022 IEEE 38TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2022), 2022, : 2984 - 2996
  • [5] Spatiotemporal consistency enhancement self-supervised representation learning for action recognition
    Bi, Shuai
    Hu, Zhengping
    Zhao, Mengyao
    Li, Shufang
    Sun, Zhe
    SIGNAL IMAGE AND VIDEO PROCESSING, 2023, 17 (04) : 1485 - 1492
  • [6] Spatiotemporal consistency enhancement self-supervised representation learning for action recognition
    Shuai Bi
    Zhengping Hu
    Mengyao Zhao
    Shufang Li
    Zhe Sun
    Signal, Image and Video Processing, 2023, 17 : 1485 - 1492
  • [7] Spatial and temporal features unified self-supervised representation learning networks
    Choudhary, Rahul
    Walambe, Rahee
    Kotecha, Ketan
    ROBOTICS AND AUTONOMOUS SYSTEMS, 2022, 157
  • [8] Self-supervised Spatial-Temporal Normality Learning for Time Series Anomaly Detection
    Chen, Yutong
    Xu, Hongzuo
    Pang, Guansong
    Qiao, Hezhe
    Zhou, Yuan
    Shang, Mingsheng
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES-RESEARCH TRACK, PT VI, ECML PKDD 2024, 2024, 14946 : 145 - 162
  • [9] Spatial-Temporal Consistency Constraints for Chinese Sign Language Synthesis
    Gao, Liqing
    Liu, Peidong
    Wan, Liang
    Feng, Wei
    COMPUTER-AIDED DESIGN AND COMPUTER GRAPHICS, CAD/GRAPHICS 2023, 2024, 14250 : 154 - 169
  • [10] Self-Supervised Image Representation Learning with Geometric Set Consistency
    Chen, Nenglun
    Chu, Lei
    Pan, Hao
    Lu, Yan
    Wang, Wenping
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 19270 - 19280