An optimized convolutional neural network for speech enhancement

Cited by: 0
Authors
Karthik A. [1, 2]
Mazher Iqbal J.L. [1]
Affiliations
[1] Department of ECE, Veltech Rangarajan Dr Sagunthala R&D Institute of Science and Technology, Chennai
[2] Department of ECE, Institute of Aeronautical Engineering, Hyderabad
Keywords
Character error rate; Convolutional neural network; Minimization; Optimization; Recognition; Speech enhancement
DOI
10.1007/s10772-023-10073-6
Abstract
Speech enhancement matters because many of today's applications rely on voice recognition to carry out operations. Commands are recognized reliably only when the voice is recognized correctly, so the speech signal must be enhanced and freed from background noise before recognition. An existing approach uses a recurrent convolutional encoder/decoder to denoise the speech signal. It relies on the signal-to-noise ratio (SNR) to enhance the signal and removes noise effectively, achieving a low character error rate. However, it does not describe the SNR range of the noise added to the signal. This work therefore proposes an optimized deep learning approach to speech enhancement. Deep learning is an AI technique that mimics the human brain's ability to analyze data and form patterns used in decision making. An optimized convolutional neural network is proposed for enhancing speech corrupted by noise at different SNR values. Particle swarm optimization tunes the hyper-parameters of the convolutional neural network, with the goal of minimizing the character error rate of the signal. The proposed method is implemented in MATLAB R2020b and evaluated by calculating the character error rate, PESQ, and STOI of the signal. The proposed and existing methods are then compared on these metrics at −5 dB, 0 dB, +5 dB, and +10 dB. © 2023, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
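The abstract names the character error rate (CER) as the quantity the tuning minimizes but does not define it. As a minimal sketch, assuming the common definition (Levenshtein edit distance between the reference and recognized transcripts, divided by the reference length), it could be computed as below. The paper's implementation is in MATLAB; this Python version and the function names `edit_distance` and `character_error_rate` are illustrative assumptions, not the authors' code.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum number of character substitutions,
    deletions, and insertions needed to turn ref into hyp."""
    # prev[j] holds the distance between the processed prefix of ref
    # and the first j characters of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty string
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r from ref
                curr[j - 1] + 1,     # insert h into ref
                prev[j - 1] + cost,  # substitute r with h (or match)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """CER under the assumed definition: edit distance / reference length."""
    if not ref:
        raise ValueError("reference transcript must be non-empty")
    return edit_distance(ref, hyp) / len(ref)
```

Under this definition a perfect transcript scores 0.0, and the value can exceed 1.0 when the recognizer inserts many extra characters.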
Pages: 1117-1129
Page count: 12
Related papers
34 in total
  • [1] Abdulbaqi J., Gu Y., Chen S., Marsic I., Residual recurrent neural network for speech enhancement, In ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6659-6663, (2020)
  • [2] Abdulbaqi J., Gu Y., Marsic I., RHR-Net: A residual hourglass recurrent neural network for speech enhancement, arXiv, (2019)
  • [3] Bahadur I., Kumar S., Agarwal P., Performance measurement of a hybrid speech enhancement technique, International Journal of Speech Technology, 24, pp. 665-677, (2021)
  • [4] Bhat G.S., Shankar N., Reddy C.K.A., Panahi I.M.S., A real-time convolutional neural network based speech enhancement for hearing impaired listeners using smartphone, IEEE Access, 7, pp. 78421-78433, (2019)
  • [5] Borgstrom B.J., Brandstein M.S., The speech enhancement via attention masking network (SEAMNET): An end-to-end system for joint suppression of noise and reverberation, IEEE/ACM Transactions on Audio, Speech, and Language Processing, 29, pp. 515-526, (2020)
  • [6] Chai L., Du J., Liu Q., Lee C., A cross-entropy-guided measure (CEGM) for assessing speech recognition performance and optimizing DNN-based speech enhancement, IEEE/ACM Transactions on Audio, Speech, and Language Processing, 29, pp. 106-117, (2021)
  • [7] Fu S.-W., Wang T.-W., Tsao Y., Lu X., Kawai H., End-to-end waveform utterance enhancement for direct evaluation metrics optimization by fully convolutional neural networks, IEEE/ACM Transactions on Audio, Speech, and Language Processing, 26, 9, pp. 1570-1584, (2018)
  • [8] Gnanamanickam J., Natarajan Y., SriPreethaa K.R., A hybrid speech enhancement algorithm for voice assistance application, Sensors (Basel, Switzerland), 21, 21, (2021)
  • [9] Using recurrences in time and frequency within U-Net architecture for speech enhancement, In ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), (2019)
  • [10] Gutierrez-Munoz M., Coto-Jimenez M., An experimental study on speech enhancement based on a combination of wavelets and deep learning, Computation, 10, (2022)