Multimodal Locally Enhanced Transformer for Continuous Sign Language Recognition

Cited by: 1
Authors
Papadimitriou, Katerina [1]
Potamianos, Gerasimos [1]
Affiliations
[1] Univ Thessaly, Dept Elect & Comp Engn, Volos, Greece
Source
INTERSPEECH 2023 | 2023
Keywords
continuous sign language recognition; RNN; Transformer; RWTH-PHOENIX Weather 2014; RWTH-PHOENIX Weather 2014T;
DOI
10.21437/Interspeech.2023-2198
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
In this paper, we propose a novel Transformer-based approach for continuous sign language recognition (CSLR) from videos, aiming to address the shortcomings of traditional Transformers in learning the local semantic context of sign language (SL). Specifically, the proposed approach relies on two distinct components: (a) a window-based RNN module to capture local temporal context and (b) a Transformer encoder, enhanced with local modeling via Gaussian bias and relative position information, as well as with global structure modeling through multi-head attention. To further improve model performance, we design a multimodal framework that applies the proposed approach to both appearance and motion signing streams, aligning their posteriors through a guiding CTC technique. Further, we achieve visual feature and gloss sequence alignment by incorporating a knowledge distillation loss. Experimental evaluation on two popular German CSLR datasets demonstrates the superiority of our model.
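The abstract mentions local modeling "via Gaussian bias" inside the Transformer encoder but gives no formula. The following is a minimal sketch of one common way such a bias can be realized, not the authors' implementation: a fixed, query-centered penalty of -(j - i)^2 / (2 * sigma^2) is added to the attention logits before the softmax. The function names `gaussian_bias` and `local_attention`, the parameter `sigma`, and the single-head NumPy formulation are all assumptions made for illustration.

```python
import numpy as np


def gaussian_bias(seq_len: int, sigma: float) -> np.ndarray:
    """Return a (seq_len, seq_len) additive bias that penalizes distant positions.

    Assumed form (not taken from the paper): -(j - i)^2 / (2 * sigma^2).
    """
    pos = np.arange(seq_len)
    dist = pos[None, :] - pos[:, None]        # j - i for every query i and key j
    return -(dist ** 2) / (2.0 * sigma ** 2)  # 0 on the diagonal, negative elsewhere


def local_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray, sigma: float):
    """Single-head scaled dot-product attention with a Gaussian locality bias."""
    d = q.shape[-1]
    logits = q @ k.T / np.sqrt(d) + gaussian_bias(q.shape[0], sigma)
    logits -= logits.max(axis=-1, keepdims=True)  # subtract row max for stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ v, weights
```

In this sketch, a small `sigma` concentrates each frame's attention on its immediate neighbors, while a large `sigma` approaches ordinary global attention.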
Pages: 1513-1517
Number of pages: 5
Related papers
50 in total
  • [1] American Sign Language Recognition Using a Multimodal Transformer Network
    Hafeez, Khalid Abdel
    Massoud, Mazen
    Menegotti, Thomas
    Tannous, Johnathon
    Wedge, Sarah
    2024 IEEE CANADIAN CONFERENCE ON ELECTRICAL AND COMPUTER ENGINEERING, CCECE 2024, 2024, : 654 - 659
  • [2] A Novel Action Transformer Network for Hybrid Multimodal Sign Language Recognition
    Javaid, Sameena
    Rizvi, Safdar
    CMC-COMPUTERS MATERIALS & CONTINUA, 2023, 74 (01): : 523 - 537
  • [3] SLRFormer: Continuous Sign Language Recognition Based on Vision Transformer
    Xiao, Feng
    Liu, Ruyu
    Yuan, Tiantian
    Fan, Zhimin
    Wang, Jiajia
    Zhang, Jianhua
    2022 10TH INTERNATIONAL CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION WORKSHOPS AND DEMOS, ACIIW, 2022,
  • [4] Continuous Sign Language Recognition Based on CM-Transformer
    Ye K.
    Zhang S.
    Guo Q.
    Li H.
    Cui X.
    Beijing Youdian Daxue Xuebao/Journal of Beijing University of Posts and Telecommunications, 2022, 45 (05): : 49 - 53 and 78
  • [5] Sign Language Recognition with Transformer Networks
    De Coster, Mathieu
    Van Herreweghe, Mieke
    Dambre, Joni
    PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020), 2020, : 6018 - 6024
  • [6] Multimodal Learning for Sign Language Recognition
    Ferreira, Pedro M.
    Cardoso, Jaime S.
    Rebelo, Ana
    PATTERN RECOGNITION AND IMAGE ANALYSIS (IBPRIA 2017), 2017, 10255 : 313 - 321
  • [7] Multimodal continuous recognition system for Greek Sign Language using various grammars
    Vassilia, Paschaloudi N.
    Konstantinos, Margaritis G.
    ADVANCES IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2006, 3955 : 584 - 587
  • [8] Spatial-Temporal Enhanced Network for Continuous Sign Language Recognition
    Yin, Wenjie
    Hou, Yonghong
    Guo, Zihui
    Liu, Kailin
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (03) : 1684 - 1695
  • [9] MULTIMODAL TRANSFORMER FUSION FOR CONTINUOUS EMOTION RECOGNITION
    Huang, Jian
    Tao, Jianhua
    Liu, Bin
    Lian, Zheng
    Niu, Mingyue
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 3507 - 3511
  • [10] SIGNFORMER: DeepVision Transformer for Sign Language Recognition
    Kothadiya, Deep R.
    Bhatt, Chintan M.
    Saba, Tanzila
    Rehman, Amjad
    Bahaj, Saeed Ali
    IEEE ACCESS, 2023, 11 : 4730 - 4739