End-to-End Speech Recognition For Arabic Dialects

Cited by: 8
Authors
Nasr, Seham [1 ]
Duwairi, Rehab [2 ]
Quwaider, Muhannad [1 ]
Affiliations
[1] Jordan Univ Sci & Technol, Dept Comp Engn, Al Ramtha 22110, Irbid, Jordan
[2] Jordan Univ Sci & Technol, Dept Comp Informat Syst, Al Ramtha 22110, Irbid, Jordan
Keywords
Automatic speech recognition; Arabic dialectal ASR; End-to-end Arabic ASR; Yemeni ASR; Jordanian ASR; CONVOLUTIONAL NEURAL-NETWORKS; UNDER-RESOURCED LANGUAGES; SYSTEM;
DOI
10.1007/s13369-023-07670-7
Chinese Library Classification: O [Mathematical Sciences and Chemistry]; P [Astronomy, Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Subject Classification Codes: 07; 0710; 09
Abstract
Automatic speech recognition, or speech-to-text, is a human-machine interaction task that, despite its difficulty, is attracting many researchers and companies such as Google, Amazon, and Facebook. End-to-end speech recognition is still in its infancy for low-resource languages such as Arabic and its dialects, owing to the lack of transcribed corpora. In this paper, we introduce novel transcribed corpora for Yamani Arabic, Jordanian Arabic, and multi-dialectal Arabic. We also design several baseline sequence-to-sequence deep neural models for end-to-end speech recognition of Arabic dialects. Moreover, Mozilla's DeepSpeech2 model was trained from scratch on our corpora. The Bidirectional Long Short-Term Memory (Bi-LSTM) with attention model achieved encouraging results on the Yamani speech corpus, with 59% Word Error Rate (WER) and 51% Character Error Rate (CER). On the Jordanian speech corpus, the Bi-LSTM with attention achieved 83% WER and 70% CER; on the multi-dialectal Yem-Jod-Arab speech corpus, it achieved 53% WER and 39% CER. The DeepSpeech2 model surpassed the baseline models, with 31% WER and 24% CER on the Yamani corpus and 68% WER and 40% CER on the Jordanian corpus. Lastly, DeepSpeech2 gave better results on the multi-dialectal Arabic corpus, with 30% WER and 20% CER.
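The abstract reports performance as Word Error Rate (WER) and Character Error Rate (CER). As a quick, hedged illustration (not the authors' evaluation code), the Python sketch below computes both metrics with a standard Levenshtein edit distance; the example strings in the usage comment are hypothetical.

    # Minimal sketch of WER/CER computation via Levenshtein edit distance.
    # Illustrative only; not the evaluation code used in the paper.
    def edit_distance(ref, hyp):
        """Levenshtein distance between two token sequences."""
        d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            d[i][0] = i
        for j in range(len(hyp) + 1):
            d[0][j] = j
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                cost = 0 if ref[i - 1] == hyp[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,         # deletion
                              d[i][j - 1] + 1,         # insertion
                              d[i - 1][j - 1] + cost)  # substitution
        return d[len(ref)][len(hyp)]

    def wer(reference, hypothesis):
        """Word Error Rate: word-level edits divided by reference word count."""
        ref_words, hyp_words = reference.split(), hypothesis.split()
        return edit_distance(ref_words, hyp_words) / max(len(ref_words), 1)

    def cer(reference, hypothesis):
        """Character Error Rate: character-level edits divided by reference length."""
        return edit_distance(list(reference), list(hypothesis)) / max(len(reference), 1)

    # Hypothetical usage:
    # print(wer("hello world", "hello word"), cer("hello world", "hello word"))

The Bi-LSTM-with-attention baseline named in the abstract can be pictured as a bidirectional LSTM encoder over acoustic frames feeding an attention-based character decoder. The PyTorch sketch below is an assumed, simplified rendering of such an architecture; the layer sizes, 80-dimensional log-Mel inputs, and dot-product attention are illustrative choices, not the paper's exact configuration.

    import torch
    import torch.nn as nn

    class BiLSTMAttentionASR(nn.Module):
        """Sketch: Bi-LSTM encoder + attention decoder over characters."""
        def __init__(self, n_mels=80, hidden=256, vocab_size=40):
            super().__init__()
            self.encoder = nn.LSTM(n_mels, hidden, num_layers=2,
                                   bidirectional=True, batch_first=True)
            self.embed = nn.Embedding(vocab_size, hidden)
            self.decoder = nn.LSTMCell(hidden, 2 * hidden)
            self.attn_proj = nn.Linear(2 * hidden, 2 * hidden)
            self.out = nn.Linear(4 * hidden, vocab_size)

        def forward(self, feats, targets):
            # feats: (B, T, n_mels) acoustic frames; targets: (B, U) character ids
            enc, _ = self.encoder(feats)                         # (B, T, 2H)
            B, U = targets.shape
            h = enc.new_zeros(B, enc.size(-1))
            c = enc.new_zeros(B, enc.size(-1))
            logits = []
            for u in range(U):  # teacher forcing over target characters
                h, c = self.decoder(self.embed(targets[:, u]), (h, c))
                # dot-product attention between decoder state and encoder frames
                scores = torch.bmm(enc, self.attn_proj(h).unsqueeze(-1))   # (B, T, 1)
                context = (torch.softmax(scores, dim=1) * enc).sum(dim=1)  # (B, 2H)
                logits.append(self.out(torch.cat([h, context], dim=-1)))
            return torch.stack(logits, dim=1)                    # (B, U, vocab_size)

Training such a model would apply cross-entropy between the returned logits and the next-character targets; at inference the decoder would run autoregressively with greedy or beam search.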
Pages: 10617-10633
Number of pages: 17
Related Papers
50 records in total
  • [21] Do End-to-End Speech Recognition Models Care About Context?
    Borgholt, Lasse
    Havtorn, Jakob D.
    Agic, Zeljko
    Sogaard, Anders
    Maaloe, Lars
    Igel, Christian
    INTERSPEECH 2020, 2020, : 4352 - 4356
  • [22] Gaussian Prediction based Attention for Online End-to-End Speech Recognition
    Hou, Junfeng
    Zhang, Shiliang
    Dai, Lirong
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 3692 - 3696
  • [23] Quaternion Convolutional Neural Networks for End-to-End Automatic Speech Recognition
    Parcollet, Titouan
    Zhang, Ying
    Morchid, Mohamed
    Trabelsi, Chiheb
    Linares, Georges
    De Mori, Renato
    Bengio, Yoshua
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 22 - 26
  • [24] Integrating Lattice-Free MMI Into End-to-End Speech Recognition
    Tian, Jinchuan
    Yu, Jianwei
    Weng, Chao
    Zou, Yuexian
    Yu, Dong
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2023, 31 : 25 - 38
  • [25] Unidirectional Neural Network Architectures for End-to-End Automatic Speech Recognition
    Moritz, Niko
    Hori, Takaaki
    Le Roux, Jonathan
    INTERSPEECH 2019, 2019, : 76 - 80
  • [26] JOINT PHONEME-GRAPHEME MODEL FOR END-TO-END SPEECH RECOGNITION
    Kubo, Yotaro
    Bacchiani, Michiel
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6119 - 6123
  • [27] ATTENTION-BASED END-TO-END SPEECH RECOGNITION ON VOICE SEARCH
    Shan, Changhao
    Zhang, Junbo
    Wang, Yujun
    Xie, Lei
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 4764 - 4768
  • [28] End-to-end speech recognition using lattice-free MMI
    Hadian, Hossein
    Sameti, Hossein
    Povey, Daniel
    Khudanpur, Sanjeev
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 12 - 16
  • [29] A COMPARABLE STUDY OF MODELING UNITS FOR END-TO-END MANDARIN SPEECH RECOGNITION
    Zou, Wei
    Jiang, Dongwei
    Zhao, Shuaijiang
    Yang, Guilin
    Li, Xiangang
    2018 11TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2018, : 369 - 373
  • [30] A Transformer-Based End-to-End Automatic Speech Recognition Algorithm
    Dong, Fang
    Qian, Yiyang
    Wang, Tianlei
    Liu, Peng
    Cao, Jiuwen
    IEEE SIGNAL PROCESSING LETTERS, 2023, 30 : 1592 - 1596