Multimodal Locally Enhanced Transformer for Continuous Sign Language Recognition

Cited by: 1

Authors:

Papadimitriou, Katerina [1]

Potamianos, Gerasimos [1]

Affiliations:

[1] Univ Thessaly, Dept Elect & Comp Engn, Volos, Greece

Source:

INTERSPEECH 2023 | 2023

Keywords:

continuous sign language recognition; RNN; Transformer; RWTH-PHOENIX Weather 2014; RWTH-PHOENIX Weather 2014T

DOI:

10.21437/Interspeech.2023-2198

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

In this paper, we propose a novel Transformer-based approach for continuous sign language recognition (CSLR) from videos, aiming to address the shortcomings of traditional Transformers in learning the local semantic context of sign language (SL). Specifically, the proposed model relies on two distinct components: (a) a window-based RNN module that captures local temporal context, and (b) a Transformer encoder, enhanced with local modeling via Gaussian bias and relative position information, as well as with global structure modeling through multi-head attention. To further improve model performance, we design a multimodal framework that applies the proposed model to both appearance and motion signing streams, aligning their posteriors through a guiding CTC technique. Further, we achieve visual feature and gloss sequence alignment by incorporating a knowledge distillation loss. Experimental evaluation on two popular German CSLR datasets demonstrates the superiority of our model.
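The "local modeling via Gaussian bias" mentioned in the abstract can be illustrated with a minimal sketch. The record does not give the paper's formulation, so everything below is an assumption made for illustration: a single attention head, no learned projections, a fixed width `sigma`, and the hypothetical helper names `gaussian_bias` and `local_self_attention`. The idea shown is only the general one, namely adding a distance-dependent penalty to the attention logits so that each frame attends mostly to its temporal neighbors.

```python
import numpy as np


def gaussian_bias(seq_len, sigma):
    """Return a (seq_len, seq_len) matrix with entries -(i - j)^2 / (2 * sigma^2).

    The bias is 0 on the diagonal and grows more negative with distance,
    so far-apart positions are down-weighted after the softmax.
    """
    pos = np.arange(seq_len)
    diff = pos[:, None] - pos[None, :]
    return -(diff ** 2) / (2.0 * sigma ** 2)


def local_self_attention(x, sigma=2.0):
    """Single-head self-attention over x of shape (seq_len, dim) with a Gaussian bias.

    Assumption: queries, keys, and values are all x itself (no learned
    projections), which keeps the sketch short.
    """
    seq_len, dim = x.shape
    scores = x @ x.T / np.sqrt(dim)                  # scaled dot-product logits
    scores = scores + gaussian_bias(seq_len, sigma)  # favor nearby frames
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ x, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = rng.normal(size=(8, 16))  # 8 hypothetical frame features of size 16
    out, weights = local_self_attention(frames, sigma=1.5)
    print(out.shape, weights.shape)
```

A smaller `sigma` narrows the effective attention window, while a very large `sigma` makes the bias negligible and recovers ordinary global attention.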

Pages: 1513-1517

Page count: 5

