End-to-End Speech Recognition For Arabic Dialects

Cited by: 8
Authors
Nasr, Seham [1 ]
Duwairi, Rehab [2 ]
Quwaider, Muhannad [1 ]
Affiliations
[1] Jordan Univ Sci & Technol, Dept Comp Engn, Al Ramtha 22110, Irbid, Jordan
[2] Jordan Univ Sci & Technol, Dept Comp Informat Syst, Al Ramtha 22110, Irbid, Jordan
Keywords
Automatic speech recognition; Arabic dialectal ASR; End-to-end Arabic ASR; Yemeni ASR; Jordanian ASR; CONVOLUTIONAL NEURAL-NETWORKS; UNDER-RESOURCED LANGUAGES; SYSTEM;
DOI
10.1007/s13369-023-07670-7
CLC Classification
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biosciences]; N [General Natural Sciences];
Subject Classification
07 ; 0710 ; 09 ;
Abstract
Automatic speech recognition, or speech-to-text, is a human-machine interaction task that, despite its difficulty, has attracted many researchers and companies such as Google, Amazon, and Facebook. End-to-end speech recognition is still in its infancy for low-resource languages such as Arabic and its dialects, due to the lack of transcribed corpora. In this paper, we introduce novel transcribed corpora for Yemeni Arabic, Jordanian Arabic, and multi-dialectal Arabic. We also design several baseline sequence-to-sequence deep neural models for end-to-end speech recognition of Arabic dialects. Moreover, Mozilla's DeepSpeech2 model was trained from scratch on our corpora. The Bidirectional Long Short-Term Memory (Bi-LSTM) with attention model achieved encouraging results on the Yemeni speech corpus, with 59% Word Error Rate (WER) and 51% Character Error Rate (CER). On the Jordanian speech corpus, the Bi-LSTM with attention achieved 83% WER and 70% CER. By comparison, on the multi-dialectal Yem-Jod-Arab speech corpus, the model achieved 53% WER and 39% CER. The DeepSpeech2 model surpassed the baseline models, with 31% WER and 24% CER on the Yemeni corpus and 68% WER and 40% CER on the Jordanian corpus. Lastly, DeepSpeech2 gave the best results on the multi-dialectal Arabic corpus, with 30% WER and 20% CER.
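The WER and CER figures reported above are both edit-distance metrics: WER is the word-level Levenshtein distance between the reference transcript and the model's hypothesis, normalized by the reference length, and CER is the same computation at the character level. A minimal sketch of how these metrics are typically computed (illustrative only, not the authors' evaluation code):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (single-row DP)."""
    n = len(hyp)
    dp = list(range(n + 1))  # distances from the empty reference prefix
    for i in range(1, len(ref) + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,        # deletion
                        dp[j - 1] + 1,    # insertion
                        prev + (ref[i - 1] != hyp[j - 1]))  # substitution
            prev = cur
    return dp[n]

def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance / reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference, hypothesis):
    """Character Error Rate: character-level edit distance / reference length."""
    return edit_distance(reference, hypothesis) / len(reference)
```

A WER above 100% is possible when the hypothesis contains many insertions, which is why values such as 83% WER can still coexist with a lower CER: character-level alignment credits partially correct words that word-level alignment counts as full errors.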
Pages: 10617-10633 (17 pages)
Related Papers (50 total)
  • [31] End-to-End Myanmar Speech Recognition with Human-Machine Cooperation
    Wang, Faliang
    Yang, Yiling
    Yang, Jian
    2022 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP 2022), 2022, : 156 - 161
  • [32] Exploring end-to-end framework towards Khasi speech recognition system
    Syiem, Bronson
    Singh, L. Joyprakash
    INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2021, 24 (02) : 419 - 424
  • [33] AN END-TO-END APPROACH TO JOINT SOCIAL SIGNAL DETECTION AND AUTOMATIC SPEECH RECOGNITION
    Inaguma, Hirofumi
    Mimura, Masato
    Inoue, Koji
    Yoshii, Kazuyoshi
    Kawahara, Tatsuya
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 6214 - 6218
  • [34] SFA: Searching faster architectures for end-to-end automatic speech recognition models
    Liu, Yukun
    Li, Ta
    Zhang, Pengyuan
    Yan, Yonghong
    COMPUTER SPEECH AND LANGUAGE, 2023, 81
  • [35] Development of CRF and CTC Based End-To-End Kazakh Speech Recognition System
    Oralbekova, Dina
    Mamyrbayev, Orken
    Othman, Mohamed
    Alimhan, Keylan
    Zhumazhanov, Bagashar
    Nuranbayeva, Bulbul
    INTELLIGENT INFORMATION AND DATABASE SYSTEMS, ACIIDS 2022, PT I, 2022, 13757 : 519 - 531
  • [36] Attention-Based End-to-End Named Entity Recognition from Speech
    Porjazovski, Dejan
    Leinonen, Juho
    Kurimo, Mikko
    TEXT, SPEECH, AND DIALOGUE, TSD 2021, 2021, 12848 : 469 - 480
  • [37] Hardware Accelerator for Transformer based End-to-End Automatic Speech Recognition System
    Yamini, Shaarada D.
    Mirishkar, Ganesh S.
    Vuppala, Anil Kumar
    Purini, Suresh
    2023 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS, IPDPSW, 2023, : 93 - 100
  • [38] End-To-End deep neural models for Automatic Speech Recognition for Polish Language
    Pondel-Sycz, Karolina
    Pietrzak, Agnieszka Paula
    Szymla, Julia
    INTERNATIONAL JOURNAL OF ELECTRONICS AND TELECOMMUNICATIONS, 2024, 70 (02) : 315 - 321
  • [39] Improved training strategies for end-to-end speech recognition in digital voice assistants
    Tulsiani, Hitesh
    Sapru, Ashtosh
    Arsikere, Harish
    Punjabi, Surabhi
    Garimella, Sri
    INTERSPEECH 2020, 2020, : 2792 - 2796
  • [40] LWMD: A Comprehensive Compression Platform for End-to-End Automatic Speech Recognition Models
    Liu, Yukun
    Li, Ta
    Zhang, Pengyuan
    Yan, Yonghong
    APPLIED SCIENCES-BASEL, 2023, 13 (03)