INTELLIGENT SPEECH RECOGNITION USING FRACTAL AMENDED GRASSHOPPER OPTIMIZATION ALGORITHM WITH DEEP LEARNING APPROACH

Cited by: 0
Authors
Al-Anazi, Reema G. [1]
Al-Dobaian, Abdullah Saad [2]
Hassan, Asma Abbas [3]
Almanea, Manar [4]
Alghamdi, Ayman Ahmad [5]
Asklany, Somia A. [6]
Al Sultan, Hanan [7]
Majdoubi, Jihen [8]
Affiliations
[1] Princess Nourah bint Abdulrahman Univ, Coll Humanities & Social Sci, Dept Arab Language & Literature, POB 84428, Riyadh 11671, Saudi Arabia
[2] King Saud Univ, Coll Language Sci, Dept English Language, POB 145111, Riyadh, Saudi Arabia
[3] King Khalid Univ, Appl Coll Mahayil, Comp Sci Dept, Muhayel Aseer 62529, Saudi Arabia
[4] Imam Mohammad Ibn Saud Islamic Univ, Coll Languages & Translat, Dept English, Riyadh 11432, Saudi Arabia
[5] Umm Al qura Univ, Arab Language Inst, Dept Arab Teaching, Mecca, Saudi Arabia
[6] Turaif Northern Border Univ, Fac Sci & Arts, Dept Comp Sci & Informat Technol, Ar Ar 91431, Saudi Arabia
[7] King Faisal Univ, Dept English, Coll Arts, Al Hufuf, Saudi Arabia
[8] Majmaah Univ, Coll Sci & Humanities Alghat, Dept Comp Sci, Al Majmaah 11952, Saudi Arabia
Keywords
Speech Recognition; Fractals Amended Grasshopper Optimization; Speech Signals; Deep Learning; Human-To-Machine Interaction; Brain-Like Computing Applications; NETWORK; MODEL;
DOI
10.1142/S0218348X25400298
CLC classification number
O1 [Mathematics];
Subject classification code
0701; 070101;
Abstract
Humans prefer to convey information through speech in a shared language. Speech recognition is the ability to identify the words a speaker utters. Recent work shows growing attention among researchers in this field, especially in brain-like computing applications, and emphasizes the real-world usability of speech for speaker recognition across different applications. Automatic speech recognition (ASR) is the process of identifying human speech and converting it into text. The field has gained much popularity in recent times and is a crucial area of research for human-to-machine interaction. Early methods relied on manual feature extraction and classical algorithms, including Hidden Markov Models (HMM), Gaussian Mixture Models (GMM), and Dynamic Time Warping (DTW). In recent years, neural networks, namely convolutional neural networks (CNN), recurrent neural networks (RNN), and Transformers, have been applied to ASR and have reached outstanding performance. This study introduces the Intelligent Speech Recognition using the Fractal Amended Grasshopper Optimization Algorithm with Deep Learning (ISR-AGODL) approach. The presented ISR-AGODL technique identifies and recognizes speech signals. In the ISR-AGODL technique, the speech signals are first transformed into spectrograms. Features are then derived using a deep convolutional neural network (DCNN) model. Next, the Fractal AGO technique is used to select the hyperparameters. Finally, the speech signals are recognized using the extreme gradient boosting (XGBoost) model. The simulation outcomes of the ISR-AGODL method are validated using a benchmark dataset. The experimental results show that the ISR-AGODL method achieves a superior accuracy of 96.34% compared with other models.
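The abstract names spectrogram conversion as the first stage of the pipeline but gives no parameters. The following is a minimal sketch of that one step only, assuming a Hann-windowed short-time Fourier transform computed with a naive DFT. The frame length, hop size, sample rate, and test tone are illustrative assumptions, not values from the paper, and the DCNN, Fractal AGO, and XGBoost stages are not shown.

```python
import cmath
import math


def hann(n):
    """Return a periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def spectrogram(signal, frame_len=64, hop=32):
    """Magnitude spectrogram via a naive short-time DFT.

    Returns a list of frames; each frame holds frame_len // 2 + 1 magnitudes.
    frame_len and hop are assumed defaults, not taken from the paper.
    """
    window = hann(frame_len)
    n_bins = frame_len // 2 + 1
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        # Apply the window to the current frame.
        chunk = [signal[start + i] * window[i] for i in range(frame_len)]
        mags = []
        for k in range(n_bins):
            # Direct DFT sum for frequency bin k (O(N^2), fine for a sketch).
            acc = sum(
                chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            mags.append(abs(acc))
        frames.append(mags)
    return frames


# Hypothetical test tone: a 1 kHz sine sampled at 8 kHz (assumption).
SAMPLE_RATE = 8000
FRAME_LEN = 64
tone = [math.sin(2.0 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(512)]
spec = spectrogram(tone, frame_len=FRAME_LEN, hop=32)

# Locate the strongest frequency bin in the first frame.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * SAMPLE_RATE / FRAME_LEN
```

For the test tone, the strongest bin in each frame corresponds to 1000 Hz. A practical system would more likely use an FFT-based routine and a mel or log scale before feeding a CNN; the abstract does not say which variant the authors used.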
Pages: 12