Vision Transformers and Transfer Learning Approaches for Arabic Sign Language Recognition

Cited by: 12

Authors:

Alharthi, Nojood M. [1]

Alzahrani, Salha M. [1]

Affiliations:

[1] Taif Univ, Coll Comp & Informat Technol, Dept Comp Sci, POB 11099, Taif 21944, Saudi Arabia

Source:

APPLIED SCIENCES-BASEL | 2023 / Volume 13 / Issue 21

Keywords:

Arabic sign language; transfer learning; VGG; ResNet; MobileNet; Xception; Inception; DenseNet; InceptionResNet; ViT; Swin; BiT; GESTURE RECOGNITION

DOI:

10.3390/app132111625

Chinese Library Classification (CLC) number:

O6 [Chemistry];

Discipline classification code:

0703 ;

Abstract:

Sign languages are complex, but there are ongoing research efforts in engineering and data science to recognize, understand, and utilize them in real-time applications. Arabic sign language (ArSL) recognition has been examined and applied using various traditional and intelligent methods. However, there have been limited attempts to enhance this process by utilizing pretrained models and large-sized vision transformers designed for image classification tasks. This study aimed to create robust transfer learning models trained on a dataset of 54,049 images depicting 32 alphabet classes from an ArSL dataset. The goal was to accurately classify these images into their corresponding Arabic letters. This study included two methodological parts. The first was the transfer learning approach, wherein we utilized various pretrained models, namely MobileNet, Xception, Inception, InceptionResNet, DenseNet, and BiT, and two vision transformers, namely ViT and Swin. We evaluated different variants, from base-sized to large-sized pretrained models and vision transformers, with weights either initialized from the ImageNet dataset or initialized randomly. The second part was the deep learning approach using convolutional neural networks (CNNs), wherein several CNN architectures were trained from scratch to be compared with the transfer learning approach. The proposed methods were evaluated using the accuracy, AUC, precision, recall, F1, and loss metrics. The transfer learning approach consistently performed well on the ArSL dataset and outperformed the other CNN models. ResNet and InceptionResNet obtained a comparably high performance of 98%. By combining the concepts of transformer-based architecture and pretraining, ViT and Swin leveraged the strengths of both and reduced the number of parameters required for training, making them more efficient and stable than other models and existing studies for ArSL classification. This demonstrates the effectiveness and robustness of using transfer learning with vision transformers for sign language recognition in other low-resource languages.

Pages: 28
