StepNet: Spatial-temporal Part-aware Network for Isolated Sign Language Recognition

Cited by: 7
Authors
Shen, Xiaolong [1]
Zheng, Zhedong [2]
Yang, Yi [1]
Affiliations
[1] Zhejiang Univ, Hangzhou 310013, Zhejiang, Peoples R China
[2] Univ Macau, Taipa 999078, Macao, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Sign language recognition; video analysis; MODEL;
DOI
10.1145/3656046
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
The goal of sign language recognition (SLR) is to help people who are deaf or hard of hearing overcome communication barriers. Most existing approaches fall into two lines, skeleton-based and RGB-based methods, and both have limitations: skeleton-based methods do not consider facial expressions, while RGB-based approaches usually ignore the fine-grained hand structure. To overcome both limitations, we propose a new framework based on RGB parts, the Spatial-temporal Part-aware network (StepNet). As its name suggests, it consists of two modules: Part-level Spatial Modeling and Part-level Temporal Modeling. Part-level Spatial Modeling automatically captures appearance-based properties of parts such as hands and faces in the feature space, without using any keypoint-level annotations. Part-level Temporal Modeling implicitly mines long-term and short-term context to capture the relevant attributes over time. Extensive experiments demonstrate that StepNet, thanks to its spatial-temporal modules, achieves competitive Top-1 per-instance accuracy on three commonly used SLR benchmarks: 56.89% on WLASL, 77.2% on NMFs-CSL, and 77.1% on BOBSL. The proposed method is also compatible with optical flow input and achieves better performance when the two are fused. We hope that our work can serve as a preliminary step for those who are hard of hearing.
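The abstract reports Top-1 per-instance accuracy and notes that RGB predictions can be fused with optical flow. The record does not say how the fusion is done, so the sketch below assumes the simplest option, a weighted average of per-class scores from the two streams (late fusion). The function names `fuse_scores` and `top1_per_instance`, the weight `w`, and the toy scores are all illustrative assumptions, not the authors' implementation or data.

```python
from typing import List, Sequence


def fuse_scores(rgb: Sequence[Sequence[float]],
                flow: Sequence[Sequence[float]],
                w: float = 0.5) -> List[List[float]]:
    """Assumed late fusion: weighted average of per-class scores.

    `w` is the weight on the RGB stream; (1 - w) goes to the flow stream.
    """
    return [[w * r + (1.0 - w) * f for r, f in zip(r_row, f_row)]
            for r_row, f_row in zip(rgb, flow)]


def top1_per_instance(scores: Sequence[Sequence[float]],
                      labels: Sequence[int]) -> float:
    """Fraction of instances whose highest-scoring class equals the label."""
    correct = 0
    for row, label in zip(scores, labels):
        predicted = max(range(len(row)), key=row.__getitem__)
        correct += int(predicted == label)
    return correct / len(labels)


# Toy example: 4 video instances, 3 sign classes (made-up scores).
labels = [0, 0, 2, 1]
rgb = [[0.7, 0.2, 0.1], [0.4, 0.5, 0.1], [0.2, 0.3, 0.5], [0.6, 0.3, 0.1]]
flow = [[0.3, 0.6, 0.1], [0.7, 0.2, 0.1], [0.1, 0.2, 0.7], [0.2, 0.7, 0.1]]

print("RGB only  :", top1_per_instance(rgb, labels))
print("Flow only :", top1_per_instance(flow, labels))
print("Fused     :", top1_per_instance(fuse_scores(rgb, flow), labels))
```

Per-instance accuracy weights every test video equally, so frequent classes dominate the score. It is usually distinguished from per-class accuracy, which averages the accuracy of each class.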
Pages: 19