Free-FreeSLT: A Gloss-Free Parameter-Free model for Sign Language Translation

Cited by: 0

Authors:

Sun, Weirong [1]

Ma, Yujun [1]

Wang, Ruili [1]

Affiliations:

[1] Massey Univ, Sch Math & Computat Sci, Auckland, New Zealand

Source:

PROCEEDINGS OF THE 6TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA IN ASIA WORKSHOPS, MMASIA 2024 WORKSHOPS | 2024

Keywords:

Sign Language Translation; Contrastive Language-Image Pre-training (CLIP); Gloss-free

DOI:

10.1145/3700410.3702115

Chinese Library Classification:

TP18 [Artificial Intelligence Theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Sign language translation (SLT) is a demanding task that integrates visual and linguistic information and requires cross-modal learning to translate visual motions into text. Current gloss-based methods employ gloss annotations for translation. Because they depend on annotated sign language video data, gloss-based methods rely on labor-intensive, high-quality annotation of sign language videos. To tackle this issue, we introduce a novel two-stage gloss-free sign language translation model with a parameter-free visual-language pre-training method, which enhances visual and semantic representations without introducing extra parameters. The proposed two-stage model works as follows: (i) in the pre-training stage, Contrastive Language-Image Pre-training (CLIP) is adopted to align visual and textual features, which are then aggregated using a mean pooling mechanism; (ii) in the fine-tuning stage, parameters from the pre-trained model are inherited to enhance sign language translation. Our proposed model surpasses the leading gloss-free SLT model on PHOENIX-2014T across various n-gram levels in the BLEU score.
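
The pre-training step described in the abstract can be illustrated with a minimal sketch. It is one plausible reading only, not the authors' implementation: per-frame and per-token features are each collapsed by mean pooling (which has no learnable parameters), and the pooled vectors are compared with a CLIP-style symmetric contrastive loss. The function names, the feature shapes, and the temperature value of 0.07 are assumptions made for illustration; the record does not specify them.

```python
import math

def mean_pool(features):
    """Average a sequence of feature vectors into one vector (no learnable parameters)."""
    n = len(features)
    dim = len(features[0])
    return [sum(vec[d] for vec in features) / n for d in range(dim)]

def l2_normalize(vec, eps=1e-12):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / max(norm, eps) for x in vec]

def log_softmax(row):
    """Numerically stable log-softmax over a list of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]

def clip_style_loss(video_batch, text_batch, temperature=0.07):
    """Symmetric contrastive loss over a batch of (video, sentence) pairs.

    video_batch[i] is a list of per-frame feature vectors (hypothetical shape);
    text_batch[i] is a list of per-token feature vectors for the paired sentence.
    The temperature is an assumed value, not taken from the paper.
    """
    v = [l2_normalize(mean_pool(frames)) for frames in video_batch]
    t = [l2_normalize(mean_pool(tokens)) for tokens in text_batch]
    n = len(v)
    # logits[i][j]: scaled cosine similarity between video i and sentence j
    logits = [[sum(a * b for a, b in zip(v[i], t[j])) / temperature
               for j in range(n)] for i in range(n)]
    # Video-to-text direction: the matching sentence sits on the diagonal
    v2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Text-to-video direction: same computation on the transposed matrix
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    t2v = -sum(log_softmax(cols[j])[j] for j in range(n)) / n
    return 0.5 * (v2t + t2v)

# Toy batch: two videos (two frames each) and two one-token sentences
video_batch = [[[1.0, 0.0], [0.8, 0.2]], [[0.0, 1.0], [0.2, 0.8]]]
text_matched = [[[0.9, 0.1]], [[0.1, 0.9]]]
text_swapped = text_matched[::-1]

loss_matched = clip_style_loss(video_batch, text_matched)
loss_swapped = clip_style_loss(video_batch, text_swapped)
print(loss_matched < loss_swapped)
```

In this toy batch the correctly paired inputs give a lower loss than the swapped pairing, which is the behavior a contrastive alignment objective is meant to reward. Because the pooling step is a plain average, it adds no trainable weights, in line with the "parameter-free" wording of the abstract.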


Pages: 4
