Free-FreeSLT: A Gloss-Free Parameter-Free model for Sign Language Translation

Cited by: 0
Authors
Sun, Weirong [1]
Ma, Yujun [1]
Wang, Ruili [1]
Affiliations
[1] Massey Univ, Sch Math & Computat Sci, Auckland, New Zealand
Source
PROCEEDINGS OF THE 6TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA IN ASIA WORKSHOPS, MMASIA 2024 WORKSHOPS | 2024
Keywords
Sign Language Translation; Contrastive Language-Image Pre-training (CLIP); Gloss-free
DOI
10.1145/3700410.3702115
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Sign language translation (SLT) is a demanding task that integrates visual and linguistic information and requires cross-modal learning to translate visual motions into text. Current gloss-based methods rely on gloss annotations for translation. Because annotated sign language video data are limited, these methods depend on labor-intensive, high-quality annotation of sign language videos. To tackle this issue, we introduce a novel two-stage gloss-free sign language translation model with a parameter-free visual-language pre-training method that enhances visual and semantic representations without introducing extra parameters. The proposed model has two stages: (i) in the pre-training stage, Contrastive Language-Image Pre-training (CLIP) is adopted to align visual and textual features, which are then aggregated using mean pooling; (ii) in the fine-tuning stage, parameters from the pre-trained model are inherited to enhance sign language translation. Our proposed model surpasses the leading gloss-free SLT model on PHOENIX-2014T across various n-gram levels of the BLEU score.
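The pre-training idea in the abstract (CLIP-style alignment of visual and textual features, aggregated by mean pooling with no added parameters) can be illustrated with a minimal sketch. Everything below is an assumption for illustration, not the authors' implementation: the function names, the temperature of 0.07, the choice to pool before computing similarities, and the toy two-dimensional features are all hypothetical, and the real model would take its features from pre-trained encoders.

```python
import math

def mean_pool(frames):
    """Average per-frame (or per-token) feature vectors into one vector.
    Mean pooling has no learned weights, so it adds no parameters."""
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]

def l2_normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct column for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum - row[i]
    return total / len(logits)

def clip_style_loss(video_feats, text_feats, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired (video, text) features.
    The temperature value is an assumed default, not taken from the paper."""
    v = [l2_normalize(mean_pool(f)) for f in video_feats]
    t = [l2_normalize(mean_pool(f)) for f in text_feats]
    logits = [[sum(a * b for a, b in zip(vi, tj)) / temperature for tj in t]
              for vi in v]
    logits_t = [list(col) for col in zip(*logits)]  # text-to-video direction
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(logits_t))

# Toy batch: two videos (two frames each) and two sentences (one token each).
video = [[[1.0, 0.0], [1.0, 0.0]], [[0.0, 1.0], [0.0, 1.0]]]
text_aligned = [[[1.0, 0.0]], [[0.0, 1.0]]]
text_swapped = [[[0.0, 1.0]], [[1.0, 0.0]]]

loss_aligned = clip_style_loss(video, text_aligned)
loss_swapped = clip_style_loss(video, text_swapped)
```

In this toy batch the correctly paired features give a much lower loss than the swapped pairing, which is the signal a contrastive objective of this kind uses to pull matching video and text representations together.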
Pages: 4