Speech Enhancement using Fully Convolutional UNET and Gated Convolutional Neural Network

Cited by: 0
Authors
Baloch, Danish [1]
Abdullah, Sidrah [2]
Qaiser, Asma [3]
Ahmed, Saad [3]
Nasim, Faiza [2]
Kanwal, Mehreen [4]
Affiliations
[1] DHA Suffa Univ, Dept Comp Sci, Karachi, Pakistan
[2] NED Univ Engn & Technol, Dept Comp Sci & Informat Technol, Karachi, Pakistan
[3] Iqra Univ, Dept Comp Sci, Karachi, Pakistan
[4] MS Fast Univ, Dept Comp Sci, Karachi, Pakistan
Keywords
Speech enhancement; speech denoising; deep neural network; raw waveform; fully convolutional neural network; gated linear unit; noise
DOI
10.14569/IJACSA.2023.0141184
Chinese Library Classification (CLC) number
TP301 [Theory and methods]
Discipline classification code
081202
Abstract
Speech enhancement aims to improve audio intelligibility by reducing the background noise that degrades the quality and intelligibility of speech. This paper presents a deep learning approach for suppressing background noise in a speaker's voice. Noise is a complex nonlinear function, so classical techniques such as spectral subtraction and Wiener filtering are poorly suited to removing non-stationary noise. The audio signal is processed as a raw waveform, giving an end-to-end speech enhancement approach. The proposed architecture is a 1-D fully convolutional encoder-decoder gated convolutional neural network (CNN). The model takes a simulated noisy signal and generates its clean representation. It is optimized in both the time and spectral domains, using an L1 loss to minimize the error between time-domain samples and between spectral magnitudes. The model is generative: although it is trained exclusively on English speech, it denoises English speakers and is also capable of denoising Urdu speech. Experimental results show that, when trained on samples of the Valentini dataset, it can generate a clean representation of the speech signal directly from a noisy signal. Performance is evaluated with objective measures such as PESQ (Perceptual Evaluation of Speech Quality) and STOI (Short-Time Objective Intelligibility). The system can be used with recorded videos and as a preprocessor for voice assistants such as Alexa and Siri, so that clear and clean instructions are sent to the device.
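The combined time-domain and spectral-magnitude L1 objective mentioned in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not give the STFT settings or how the two terms are weighted, so the function names (`stft_magnitude`, `time_spectral_l1`), the frame length `n_fft`, the hop size `hop`, and the weight `alpha` are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def stft_magnitude(x, n_fft=512, hop=128):
    """Magnitude spectrogram from Hann-windowed frames (assumed settings)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    # rfft keeps the n_fft // 2 + 1 non-negative frequency bins per frame
    return np.abs(np.fft.rfft(frames, axis=1))

def time_spectral_l1(estimate, clean, n_fft=512, hop=128, alpha=1.0):
    """L1 error on waveform samples plus L1 error on spectral magnitudes.

    alpha is a hypothetical weight balancing the two terms.
    """
    time_term = np.mean(np.abs(estimate - clean))
    spec_term = np.mean(
        np.abs(stft_magnitude(estimate, n_fft, hop)
               - stft_magnitude(clean, n_fft, hop))
    )
    return time_term + alpha * spec_term

# Usage: one second of a 440 Hz tone at 16 kHz, with and without added noise
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
noisy = clean + 0.1 * rng.standard_normal(16000)
print(time_spectral_l1(noisy, clean))
```

In a training setup, `estimate` would be the network output for the noisy input, and the loss would be computed with a differentiable framework rather than NumPy; the sketch only shows how the two L1 terms are formed.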
Pages: 831-836
Page count: 6