Speech Enhancement using Fully Convolutional UNET and Gated Convolutional Neural Network

Cited by: 0

Authors:

Baloch, Danish [1]

Abdullah, Sidrah [2]

Qaiser, Asma [3]

Ahmed, Saad [3]

Nasim, Faiza [2]

Kanwal, Mehreen [4]

Affiliations:

[1] DHA Suffa Univ, Dept Comp Sci, Karachi, Pakistan

[2] NED Univ Engn & Technol, Dept Comp Sci & Informat Technol, Karachi, Pakistan

[3] Iqra Univ, Dept Comp Sci, Karachi, Pakistan

[4] MS Fast Univ, Dept Comp Sci, Karachi, Pakistan

Source:

INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS | 2023 / Vol. 14 / No. 11

Keywords:

Speech enhancement; speech denoising; deep neural network; raw waveform; fully convolutional neural network; gated linear unit; NOISE;

DOI:

10.14569/IJACSA.2023.0141184

Chinese Library Classification (CLC) number:

TP301 [Theory, methods];

Discipline classification code:

081202;

Abstract:

Speech enhancement aims to improve audio intelligibility by reducing the background noise that often degrades the quality and intelligibility of speech. This paper presents a deep learning approach for suppressing background noise in a speaker's voice. Noise is a complex nonlinear function, so classical techniques such as spectral subtraction and Wiener filtering are not well suited to removing non-stationary noise. The audio signal is processed as a raw waveform, which makes the speech enhancement approach end-to-end. The proposed architecture is a 1-D fully convolutional encoder-decoder gated convolutional neural network (CNN). The model takes a simulated noisy signal and generates its clean representation. It is optimized in both the spectral and time domains, using an L1 loss to minimize the error in the time-domain signal and in the spectral magnitudes. The model is generative: although it is trained exclusively on English speech, it denoises English speakers and is also capable of denoising Urdu speech. Experimental results show that, when trained on samples from the Valentini dataset, it can generate a clean representation of the signal directly from a noisy signal. Performance is evaluated with objective measures, namely PESQ (Perceptual Evaluation of Speech Quality) and STOI (Short-Time Objective Intelligibility). The system can be used with recorded videos and as a preprocessor for voice assistants such as Alexa and Siri, sending clear and clean instructions to the device.
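
As a rough illustration of the training objective mentioned in the abstract (an L1 error on the time-domain signal combined with an L1 error on spectral magnitudes), the following is a minimal NumPy sketch. It is not the authors' implementation: the function names `stft_magnitude` and `time_spectral_l1`, the frame length, the hop size, the Hann window, the equal weighting of the two terms, and the 16 kHz test tone are all assumptions chosen for illustration.

```python
import numpy as np


def stft_magnitude(signal, frame_len=512, hop=128):
    """Magnitude spectrogram from Hann-windowed frames (no padding).

    frame_len and hop are illustrative values, not taken from the paper.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # rfft keeps only the non-negative frequency bins of a real signal.
    return np.abs(np.fft.rfft(frames, axis=1))


def time_spectral_l1(estimate, target, spectral_weight=1.0):
    """L1 error on the waveform plus L1 error on STFT magnitudes.

    spectral_weight is a hypothetical balancing factor.
    """
    time_term = np.mean(np.abs(estimate - target))
    spec_term = np.mean(
        np.abs(stft_magnitude(estimate) - stft_magnitude(target))
    )
    return time_term + spectral_weight * spec_term


# Toy example: a 440 Hz tone sampled at an assumed 16 kHz, plus white noise.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
clean = np.sin(2 * np.pi * 440.0 * t)
noisy = clean + 0.1 * rng.standard_normal(clean.shape)

loss_identical = time_spectral_l1(clean, clean)  # zero by construction
loss_noisy = time_spectral_l1(noisy, clean)      # strictly positive
```

In a training loop, `estimate` would be the network output for a noisy input and `target` the corresponding clean recording; a deep learning framework's differentiable STFT would replace the NumPy version so that gradients can flow through the spectral term.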


Pages: 831-836

Page count: 6
