Speech Enhancement Algorithm Based on a Convolutional Neural Network Reconstruction of the Temporal Envelope of Speech in Noisy Environments

Cited by: 9
Authors
Soleymanpour, Rahim [1, 2]
Soleymanpour, Mohammad [3]
Brammer, Anthony J. [1]
Johnson, Michael T. [3]
Kim, Insoo [1, 2]
Affiliations
[1] Univ Connecticut, Dept Med, Sch Med, Farmington, CT 06030 USA
[2] Univ Connecticut, Dept Biomed Engn, Storrs, CT 06269 USA
[3] Univ Kentucky, Dept Elect & Comp Engn, Lexington, KY 40506 USA
Source
IEEE ACCESS | 2023 / Vol. 11
Keywords
Speech enhancement; Noise measurement; Signal processing algorithms; Convolutional neural networks; Psychoacoustic models; Time-frequency analysis; temporal envelope (TEV); convolution neural network (CNN); HEARING-IMPAIRED LISTENERS; INTELLIGIBILITY; FREQUENCY; REDUCTION
DOI
10.1109/ACCESS.2023.3236242
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Temporal modulation processing is a promising technique for improving the intelligibility and quality of speech in noise. We propose a speech enhancement algorithm that constructs the temporal envelope (TEV) in the time-frequency domain by means of an embedded convolutional neural network (CNN). To accomplish this, the input speech signals are divided into sixteen parallel frequency bands (subbands) with bandwidths approximating 1.5 times that of auditory filters. The corrupted TEVs in each subband are extracted and then fed to the 1-dimensional CNN (1-D CNN) model to restore the TEVs distorted by noise. The method is evaluated using 2,700 words from nine different talkers, which are mixed with speech-spectrum shaped random noise (SSN), and babble noise, at different signal-to-noise ratios. The Short-time Objective Intelligibility (STOI) and Perceptual Evaluation of Speech Quality (PESQ) metrics are used to evaluate the performance of the 1-D CNN algorithm. Results suggest that the 1-D CNN model improves STOI scores on average by 27% and 34% for SSN and babble noise, respectively, and PESQ scores on average by 19% and 18%, respectively, compared to unprocessed speech. The 1-D CNN model is also shown to outperform a conventional TEV-based speech enhancement algorithm.
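The front end described in the abstract (sixteen subbands about 1.5 times as wide as auditory filters, followed by envelope extraction) can be sketched roughly as below. This is only an illustration, not the authors' implementation: the 100 Hz to 7000 Hz range, the ERB-number spacing of the centre frequencies, the rectify-and-smooth envelope detector, its 30 Hz cutoff, and all function names are assumptions introduced here. The auditory-filter bandwidth uses the Glasberg and Moore ERB formula. The 1-D CNN that restores the envelopes is not shown.

```python
import math


def erb_hz(f_hz):
    """Equivalent rectangular bandwidth of the auditory filter at f_hz
    (Glasberg and Moore formula)."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)


def hz_to_erb_number(f_hz):
    """Map frequency in Hz to the ERB-number scale."""
    return 21.4 * math.log10(4.37 * f_hz / 1000.0 + 1.0)


def erb_number_to_hz(e):
    """Inverse of hz_to_erb_number."""
    return (10.0 ** (e / 21.4) - 1.0) * 1000.0 / 4.37


def subband_layout(n_bands=16, f_lo=100.0, f_hi=7000.0, width_factor=1.5):
    """Return (low_edge, centre, high_edge) in Hz for each subband.

    Centres are equally spaced on the ERB-number scale (an assumption);
    each band is width_factor times the ERB at its centre frequency.
    """
    e_lo, e_hi = hz_to_erb_number(f_lo), hz_to_erb_number(f_hi)
    step = (e_hi - e_lo) / (n_bands - 1)
    bands = []
    for k in range(n_bands):
        fc = erb_number_to_hz(e_lo + k * step)
        bw = width_factor * erb_hz(fc)
        bands.append((fc - bw / 2.0, fc, fc + bw / 2.0))
    return bands


def envelope(x, fs, cutoff_hz=30.0):
    """Crude temporal envelope: full-wave rectification followed by a
    one-pole low-pass filter (a stand-in for whatever detector is used)."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / fs)
    env, y = [], 0.0
    for sample in x:
        y += alpha * (abs(sample) - y)
        env.append(y)
    return env


if __name__ == "__main__":
    for lo, fc, hi in subband_layout():
        print(f"{lo:8.1f} {fc:8.1f} {hi:8.1f}")
```

In a full pipeline, each subband signal would first be obtained with a band-pass filter using the edges from `subband_layout`, and the noisy envelope from `envelope` would then be the input to the restoration model.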
Pages: 5328-5336
Number of pages: 9