AUTOMATED MULTI-DIALECT SPEECH RECOGNITION USING STACKED ATTENTION-BASED DEEP LEARNING WITH NATURAL LANGUAGE PROCESSING MODEL

Cited: 0
Authors
AL Mazroa, Alanoud [1 ]
Miled, Achraf ben [2 ]
Asiri, Mashael M. [3 ]
Alzahrani, Yazeed [4 ]
Sayed, Ahmed [5 ]
Nafie, Faisal Mohammed [6 ]
Affiliations
[1] Princess Nourah bint Abdulrahman Univ, Coll Comp & Informat Sci, Dept Informat Syst, POB 84428, Riyadh 11671, Saudi Arabia
[2] Northern Border Univ, Coll Sci, Dept Comp Sci, Ar Ar 73213, Saudi Arabia
[3] Mahayil King Khalid Univ, Appl Coll, Dept Comp Sci, Abha 62521, Saudi Arabia
[4] Prince Sattam Bin Abdulaziz Univ, Coll Engn Wadi Addawasir, Dept Comp Engn, Al Kharj 16278, Saudi Arabia
[5] Future Univ Egypt, Res Ctr, New Cairo 11835, Egypt
[6] Majmaah Univ, Community Coll, Dept Nat & Appl Sci, Al Majmaah 11952, Saudi Arabia
Keywords
Automatic Speech Recognition; Artificial Intelligence; Complex Systems; Fractals Harris Hawks Optimization; Deep Learning; Dialect Identification; Neural Networks
DOI
10.1142/S0218348X25400304
Chinese Library Classification (CLC)
O1 [Mathematics];
Discipline Code
0701 ; 070101 ;
Abstract
Dialects are language variations that arise from differences in social groups or geographical regions. Dialect speech recognition aims to accurately transcribe spoken language that exhibits regional variation in vocabulary, syntax, and pronunciation, so models must be trained on diverse dialects to handle such linguistic differences effectively. Recent advances in automatic speech recognition (ASR) and complex-systems methods build on recurrent neural networks (RNNs), deep neural networks (DNNs), and convolutional neural networks (CNNs). Nevertheless, multi-dialect speech recognition remains a challenge, notwithstanding the progress of deep learning (DL) in speech recognition for many computing applications in environmental modeling and smart cities. Although dialect-specific acoustic models are known to perform well, they are not easy to maintain when the number of dialects across languages is large and dialect-specific data are limited. This paper offers an Automated Multi-Dialect Speech Recognition using the Stacked Attention-based Deep Learning (MDSR-SADL) technique for environmental modeling and smart cities. The MDSR-SADL technique primarily applies a DL model to identify various dialects. In the MDSR-SADL technique, a stacked long short-term memory with attention-based autoencoder (SLSTM-AAE) model is used, which integrates stacked modeling with LSTM and an autoencoder (AE). In addition, the attention model enables dialect identification by supplying dialect-specific cues to the speech recognizer. The MDSR-SADL model uses the Fractals Harris Hawks Optimization (FHHO) algorithm for hyperparameter selection. A series of simulations was performed to demonstrate the improved performance of the MDSR-SADL model. The experimental investigation of the MDSR-SADL technique exhibits superior accuracy values of 99.52% and 99.55% over other techniques on the Tibetan and Chinese datasets, respectively.
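The abstract does not give the SLSTM-AAE equations, so the following is only a minimal illustration of the attention step it describes: dot-product attention pooling that weights per-frame LSTM hidden states into a single utterance vector, of the kind a downstream dialect classifier could consume. All names here (`attention_pool`, `hidden_states`, `query`) are hypothetical, not taken from the paper.

```python
import math

def attention_pool(hidden_states, query):
    """Pool a sequence of hidden-state vectors into one context vector.

    hidden_states: list of per-frame vectors (e.g. from a stacked LSTM).
    query: a learned query vector of the same dimensionality.
    Returns (context_vector, attention_weights).
    """
    # Dot-product alignment score for each frame.
    scores = [sum(h * q for h, q in zip(hs, query)) for hs in hidden_states]
    # Numerically stable softmax over the scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    # Context vector: attention-weighted sum of the hidden states.
    dim = len(hidden_states[0])
    context = [sum(w * hs[d] for w, hs in zip(weights, hidden_states))
               for d in range(dim)]
    return context, weights
```

Frames whose hidden states align with the query (e.g. frames carrying strong dialect cues) receive larger weights, so the pooled vector emphasizes them.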
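The fractal variant (FHHO) used for hyperparameter selection is not detailed in the abstract. As a sketch only, here is the core loop of standard Harris Hawks Optimization (exploration plus soft/hard besiege, omitting the rapid-dive phases) applied to a toy objective; the same loop could score hyperparameter candidates by validation loss. Function and parameter names are hypothetical.

```python
import random

def hho_minimize(objective, bounds, n_hawks=20, n_iter=100, seed=42):
    """Simplified Harris Hawks Optimization.

    bounds: list of (low, high) pairs, one per search dimension.
    Returns (best_position, best_fitness).
    """
    rng = random.Random(seed)
    dim = len(bounds)

    def clip(x):
        return [min(max(v, lo), hi) for v, (lo, hi) in zip(x, bounds)]

    # Random initial hawk population; the best hawk is the "rabbit" (prey).
    hawks = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_hawks)]
    fitness = [objective(h) for h in hawks]
    best_i = min(range(n_hawks), key=lambda i: fitness[i])
    rabbit, rabbit_f = hawks[best_i][:], fitness[best_i]

    for t in range(n_iter):
        e1 = 2.0 * (1.0 - t / n_iter)  # escaping energy decays over iterations
        for i in range(n_hawks):
            e = e1 * (2.0 * rng.random() - 1.0)
            x = hawks[i]
            if abs(e) >= 1.0:
                # Exploration: perch relative to a randomly chosen hawk.
                xr = hawks[rng.randrange(n_hawks)]
                r1, r2 = rng.random(), rng.random()
                x = [xr[d] - r1 * abs(xr[d] - 2.0 * r2 * x[d]) for d in range(dim)]
            elif abs(e) >= 0.5:
                # Soft besiege: circle the rabbit with a random jump strength.
                j = 2.0 * (1.0 - rng.random())
                x = [(rabbit[d] - x[d]) - e * abs(j * rabbit[d] - x[d])
                     for d in range(dim)]
            else:
                # Hard besiege: close in directly on the rabbit.
                x = [rabbit[d] - e * abs(rabbit[d] - x[d]) for d in range(dim)]
            x = clip(x)
            f = objective(x)
            if f < fitness[i]:          # greedy acceptance of improvements
                hawks[i], fitness[i] = x, f
                if f < rabbit_f:
                    rabbit, rabbit_f = x[:], f
    return rabbit, rabbit_f
```

For hyperparameter selection, each dimension would encode one hyperparameter (e.g. log learning rate, hidden units) and `objective` would train or proxy-evaluate the SLSTM-AAE and return a validation loss.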
Pages: 13