Improving Sign Language Recognition Performance Using Multimodal Data

Times cited: 0
Authors
Nishimura, Tomoe [1]
Abbasi, Bahareh [1]
Affiliation
[1] California State University Channel Islands, Computer Science, Camarillo, CA 93012, USA
Source
2024 IEEE International Conference on Information Reuse and Integration for Data Science (IRI 2024), 2024
Keywords
sign language; computer vision; transformer; MediaPipe; multimodal
DOI
10.1109/IRI62200.2024.00047
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Sign language is used primarily by deaf and hard-of-hearing people for communication, and more than 200 variants exist worldwide. Communication between signers of different variants is nearly impossible. Moreover, learning sign language can be challenging for a hearing person because its syntax differs from that of spoken language. Machine-learning-based sign translation offers potential solutions to these challenges and could facilitate communication for everyone. This study attempts to improve the performance of an existing state-of-the-art sign language translation (SLT) model, the Gloss Attention SLT Network (GASLT), by integrating a multimodal approach. By combining RGB video with 3D pose data extracted using MediaPipe in an innovative way, our multimodal method significantly improves GASLT's results. We conducted two experiments that fuse video and pose data within the GASLT model. These experiments yielded an 18.39% improvement in the model's BLEU score over the original model, demonstrating the effectiveness of the multimodal approach for sign translation.
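The abstract does not specify how the pose stream is combined with the video stream inside GASLT. As a point of reference only, the sketch below shows one common and simple way to merge two per-frame feature streams: temporal alignment followed by feature-level concatenation (early fusion). This is not presented as the authors' method. It assumes RGB features have already been extracted per frame by some video backbone, and that pose arrives as per-frame lists of (x, y, z) landmarks such as those MediaPipe produces. All function names (`flatten_landmarks`, `align_to_length`, `l2_normalize`, `early_fuse`) are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def flatten_landmarks(landmarks: Sequence[Sequence[float]]) -> Vector:
    """Flatten one frame of (x, y, z) landmarks into a single feature vector.

    Assumption: each landmark is a short sequence of coordinates, e.g. the
    x, y, z values MediaPipe reports for each body or hand keypoint.
    """
    return [coord for point in landmarks for coord in point]


def align_to_length(seq: Sequence[Vector], target_len: int) -> List[Vector]:
    """Resample a sequence to target_len frames by proportional index mapping.

    Frame i of the output copies input frame floor(i * n / target_len), so the
    two streams end up with the same number of time steps.
    """
    if not seq:
        raise ValueError("cannot align an empty sequence")
    n = len(seq)
    return [list(seq[min(n - 1, (i * n) // target_len)]) for i in range(target_len)]


def l2_normalize(v: Vector, eps: float = 1e-8) -> Vector:
    """Scale a vector to unit L2 norm; an all-zero vector stays all zeros."""
    norm = sum(x * x for x in v) ** 0.5
    return [x / (norm + eps) for x in v]


def early_fuse(
    video_feats: Sequence[Vector],
    pose_frames: Sequence[Sequence[Sequence[float]]],
) -> List[Vector]:
    """Hypothetical early fusion: per-frame concat of normalized video and pose.

    Normalizing each modality first keeps one stream from dominating purely
    because of its numeric scale. The result has len(video_feats) frames.
    """
    if not video_feats:
        raise ValueError("video_feats must contain at least one frame")
    pose_vecs = [flatten_landmarks(frame) for frame in pose_frames]
    pose_aligned = align_to_length(pose_vecs, len(video_feats))
    return [
        l2_normalize(list(v)) + l2_normalize(p)
        for v, p in zip(video_feats, pose_aligned)
    ]


if __name__ == "__main__":
    # Toy input: 4 video frames with 2-D features, 2 pose frames of 2 landmarks.
    video = [[1.0, 0.0], [0.0, 2.0], [3.0, 4.0], [0.0, 0.0]]
    pose = [
        [[0.0, 0.0, 1.0], [0.0, 0.0, 0.0]],
        [[3.0, 0.0, 0.0], [0.0, 4.0, 0.0]],
    ]
    fused = early_fuse(video, pose)
    print(len(fused), len(fused[0]))  # 4 frames, each 2 + 6 = 8 values
```

The resulting per-frame vectors could then be passed to a sequence model in place of video-only features. Concatenation is only the simplest fusion option; attention-based or late fusion are common alternatives, and the abstract does not indicate which the study used.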
Pages: 184-189
Number of pages: 6