AUTOMATED MULTI-DIALECT SPEECH RECOGNITION USING STACKED ATTENTION-BASED DEEP LEARNING WITH NATURAL LANGUAGE PROCESSING MODEL

Cited by: 0
Authors
Al Mazroa, Alanoud [1]
Miled, Achraf Ben [2]
Asiri, Mashael M. [3]
Alzahrani, Yazeed [4]
Sayed, Ahmed [5]
Nafie, Faisal Mohammed [6]
Affiliations
[1] Princess Nourah bint Abdulrahman Univ, Coll Comp & Informat Sci, Dept Informat Syst, POB 84428, Riyadh 11671, Saudi Arabia
[2] Northern Border Univ, Coll Sci, Dept Comp Sci, Ar Ar 73213, Saudi Arabia
[3] Mahayil King Khalid Univ, Appl Coll, Dept Comp Sci, Abha 62521, Saudi Arabia
[4] Prince Sattam Bin Abdulaziz Univ, Coll Engn Wadi Addawasir, Dept Comp Engn, Al Kharj 16278, Saudi Arabia
[5] Future Univ Egypt, Res Ctr, New Cairo 11835, Egypt
[6] Majmaah Univ, Community Coll, Dept Nat & Appl Sci, Al Majmaah 11952, Saudi Arabia
Keywords
Automatic Speech Recognition; Artificial Intelligence; Complex Systems; Fractals Harris Hawks Optimization; Deep Learning; Dialect Identification; Neural Networks
DOI
10.1142/S0218348X25400304
Chinese Library Classification
O1 [Mathematics]
Discipline classification code
0701; 070101
Abstract
Dialects are language variations that arise from differences between social groups or geographical regions. Dialect speech recognition aims to accurately transcribe spoken language that exhibits regional variation in vocabulary, syntax, and pronunciation. Models must be trained on a range of dialects to handle these linguistic differences effectively. Recent advances in automatic speech recognition (ASR) and complex systems methods have come from recurrent neural networks (RNN), deep neural networks (DNN), and convolutional neural networks (CNN). Nevertheless, multi-dialect speech recognition remains a challenge, despite the progress of deep learning (DL) in speech recognition for many computing applications in environmental modeling and smart cities. Although dialect-specific acoustic models are known to perform well, they are not easy to maintain when the number of dialects across languages is large and dialect-specific data are limited. This paper presents an Automated Multi-Dialect Speech Recognition using Stacked Attention-based Deep Learning (MDSR-SADL) technique for environmental modeling and smart cities. The MDSR-SADL technique primarily applies a DL model to identify various dialects. It uses a stacked long short-term memory with attention-based autoencoder (SLSTM-AAE) model, which integrates stacked modeling with LSTM and an autoencoder (AE). In addition, the attention model supports dialect identification by supplying dialect details for speech identification. The MDSR-SADL model uses Fractals Harris Hawks Optimization (FHHO) for hyperparameter selection. A series of simulations was carried out to demonstrate the improved performance of the MDSR-SADL model. The experimental investigation shows that the MDSR-SADL technique achieves accuracy values of 99.52% and 99.55% on the Tibetan and Chinese datasets, outperforming other techniques.
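The abstract does not specify how the attention component of SLSTM-AAE is computed. As a rough illustration only, the sketch below shows generic dot-product attention pooling over a sequence of per-frame hidden vectors, such as an LSTM stack might produce. The function names, the `query` vector, and the toy values are all assumptions made for this example and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, query):
    """Generic dot-product attention pooling (illustrative, not the paper's method).

    hidden_states: list of per-frame vectors (hypothetical LSTM outputs).
    query: a vector of the same size (hypothetical; learned in a real model).
    Returns the attention weights and the weighted-sum context vector.
    """
    # Score each frame by its dot product with the query.
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    # Context vector: attention-weighted sum of the frame vectors.
    context = [
        sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)
    ]
    return weights, context


# Toy input: three frames with two features each (made-up values).
hidden_states = [[0.1, 0.3], [0.9, 0.2], [0.4, 0.8]]
query = [1.0, 0.0]
weights, context = attention_pool(hidden_states, query)
```

In this toy setup the second frame has the largest dot product with `query`, so it receives the largest weight. In a trained model the query (or a small scoring network) would be learned, so that frames carrying dialect cues contribute more to the pooled representation.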
Pages: 13