Speech Segregation in Background Noise Based on Deep Learning

Cited by: 6
Authors
Awotunde, Joseph Bamidele [1 ]
Ogundokun, Roseline Oluwaseun [2 ]
Ayo, Femi Emmanuel [3 ]
Matiluko, Opeyemi Emmanuel [4 ]
Affiliations
[1] Univ Ilorin, Dept Comp Sci, Ilorin 240003, Nigeria
[2] Landmark Univ, Dept Comp Sci, Omu Aran 251101, Nigeria
[3] McPherson Univ, Dept Phys & Comp Sci, Seriki Sotayo 110001, Nigeria
[4] Landmark Univ, Ctr Syst & Informat Serv, Omu Aran 251101, Nigeria
Keywords
Signal-to-noise ratio; noise measurement; machine learning; acoustics; time-frequency analysis; speech enhancement; speech segregation; deep learning; convolutional neural network; interference; background noise
DOI
10.1109/ACCESS.2020.3024077
CLC number
TP [automation technology; computer technology]
Discipline code
0812
Abstract
Speech is the most important means by which many people communicate. Beyond its content, speech conveys information such as speaker identity, emotion, and attitude, which makes it the most convenient and natural means of communication. Speech segregation, or speech processing, is the task of sorting wanted speech out from background noise. Recently, supervised learning approaches have been formulated for the speech segregation problem, and the latest trend is to use deep learning systems to increase the computational speed and performance of speech processing tasks. This study therefore employed a convolutional neural network to segregate speech from background noise. The convolutional neural network was used to model the speaker's acoustic features and sequential details. An unadapted speaker model was first used to separate the two voice signals, which were then used to estimate the signal-to-noise ratio (SNR). The estimated SNR was in turn applied to adapt the speaker models and re-estimate the speech signals, iterating twice before convergence. The developed method was tested on the TIMIT dataset. The results showed the strength of the method for speech segregation in background noise, indicating that it enhanced separation performance and converged reasonably fast. The system is simple and, under some input SNR conditions, performs better than state-of-the-art speech processing methods.
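The iterative refinement the abstract describes (separate with the current speaker models, estimate the SNR, adapt the models, re-estimate, repeat) can be sketched roughly as follows. This is a toy illustration only: fixed spectral templates with soft masks stand in for the paper's convolutional network, and all function and variable names are hypothetical, not taken from the paper.

```python
import numpy as np

def estimate_snr(target, interference):
    """Estimate an SNR (in dB) from the energies of the two separated signals."""
    eps = 1e-12
    return 10.0 * np.log10((np.sum(target ** 2) + eps) /
                           (np.sum(interference ** 2) + eps))

def separate(mixture, model_a, model_b):
    """Soft-mask separation: split each time-frequency bin of the mixture
    spectrogram in proportion to two spectral templates (a stand-in for
    the CNN-based separator in the paper)."""
    eps = 1e-12
    mask_a = model_a / (model_a + model_b + eps)
    return mixture * mask_a, mixture * (1.0 - mask_a)

def iterative_segregation(mixture, model_a, model_b, iterations=2):
    """Separate, estimate SNR, adapt the speaker models from the estimates,
    and re-estimate; the abstract reports convergence within two passes."""
    for _ in range(iterations):
        est_a, est_b = separate(mixture, model_a, model_b)
        snr = estimate_snr(est_a, est_b)
        # Adapt the (initially unadapted) templates toward the new estimates.
        model_a = 0.5 * model_a + 0.5 * np.abs(est_a).mean(axis=1, keepdims=True)
        model_b = 0.5 * model_b + 0.5 * np.abs(est_b).mean(axis=1, keepdims=True)
    return est_a, est_b, snr
```

Because the two soft masks sum to one, the two estimates always reconstruct the mixture exactly; only their balance shifts as the models adapt across iterations.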
Pages: 169568-169575
Page count: 8