A spatio-temporal speech enhancement scheme for robust speech recognition in noisy environments

Cited by: 27
Authors
Visser, E
Otsuka, M
Lee, TW
Affiliations
[1] Univ Calif San Diego, Inst Neural Computat, Dept 0523, La Jolla, CA 92093 USA
[2] DENSO Corp, Res Labs, Aichi 4700111, Japan
Keywords
speech enhancement; robust speech recognition; blind source separation; noisy environments;
DOI
10.1016/S0167-6393(03)00010-4
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
A new speech enhancement scheme is presented integrating spatial and temporal signal processing methods for robust speech recognition in noisy environments. The scheme first separates spatially localized point sources from noisy speech signals recorded by two microphones. Blind source separation algorithms assuming no a priori knowledge about the sources involved are applied in this spatial processing stage. Then denoising of distributed background noise is achieved in a combined spatial/temporal processing approach. The desired speaker signal is first processed along with an artificially constructed noise signal in a supplementary blind source separation step. It is further denoised by exploiting differences in temporal speech and noise statistics in a wavelet filterbank. The scheme's performance is illustrated by speech recognition experiments on real recordings in a noisy car environment. In comparison to a common multi-microphone technique like beamforming with spectral subtraction, the scheme is shown to enable more accurate speech recognition in the presence of a highly interfering point source and strong background noise. © 2003 Elsevier B.V. All rights reserved.
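As a rough illustration of the spatial processing stage, the sketch below applies a natural-gradient infomax rule (in the spirit of Bell and Sejnowski's information-maximization approach) to a synthetic two-microphone mixture. It is not the authors' algorithm: it assumes an instantaneous (non-convolutive) mixture, Laplacian-distributed sources standing in for speech, and a hypothetical function name `infomax_ica` with illustrative step size and iteration count. Real car recordings are reverberant and would need a convolutive formulation.

```python
import numpy as np

def infomax_ica(x, n_iter=500, lr=0.1):
    """Estimate an unmixing matrix for x of shape (n_channels, n_samples).

    Assumes an instantaneous mixture of super-Gaussian sources.
    """
    # Center and whiten the observations so the remaining problem
    # is (approximately) a rotation plus a scale.
    x = x - x.mean(axis=1, keepdims=True)
    eigval, eigvec = np.linalg.eigh(np.cov(x))
    whitener = eigvec @ np.diag(eigval ** -0.5) @ eigvec.T
    z = whitener @ x

    n, t = z.shape
    w = np.eye(n)
    for _ in range(n_iter):
        y = w @ z
        # Natural-gradient infomax update; tanh is a score function
        # suited to super-Gaussian (speech-like) sources.
        w += lr * (np.eye(n) - np.tanh(y) @ y.T / t) @ w
    return w @ whitener

# Synthetic demo: two Laplacian sources mixed by an assumed 2x2 matrix.
rng = np.random.default_rng(0)
sources = rng.laplace(size=(2, 20000))
mixing = np.array([[1.0, 0.6],
                   [0.4, 1.0]])
unmixing = infomax_ica(mixing @ sources)

# If separation worked, each row of |unmixing @ mixing| is dominated
# by a single entry (a scaled permutation of the sources).
global_response = np.abs(unmixing @ mixing)
```

Inspecting `global_response` shows how much of each source leaks into each output. Blind separation only recovers the sources up to scaling and ordering, so the dominant entry per row need not lie on the diagonal.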
Pages: 393-407
Page count: 15
Related papers
42 in total
[1]  
ADAMI A, 2002, P ICSLP, P21
[2]  
[Anonymous], P INTERSPEECH
[3]  
[Anonymous], P IEEE INT C AC SPEE
[4]   EFFECTIVENESS OF LINEAR PREDICTION CHARACTERISTICS OF SPEECH WAVE FOR AUTOMATIC SPEAKER IDENTIFICATION AND VERIFICATION [J].
ATAL, BS .
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1974, 55 (06) :1304-1312
[5]  
ATTIAS H, 2001, ADV NEURAL INFORMATI, V13
[6]   AN INFORMATION MAXIMIZATION APPROACH TO BLIND SEPARATION AND BLIND DECONVOLUTION [J].
BELL, AJ ;
SEJNOWSKI, TJ .
NEURAL COMPUTATION, 1995, 7 (06) :1129-1159
[7]   SUPPRESSION OF ACOUSTIC NOISE IN SPEECH USING SPECTRAL SUBTRACTION [J].
BOLL, SF .
IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1979, 27 (02) :113-120
[8]   A practical methodology for speech source localization with microphone arrays [J].
Brandstein, MS ;
Silverman, HF .
COMPUTER SPEECH AND LANGUAGE, 1997, 11 (02) :91-126
[9]  
BUCCIGROSSI RW, 1997, P IEEE INT C AC SPEE
[10]   BLIND BEAMFORMING FOR NON-GAUSSIAN SIGNALS [J].
CARDOSO, JF ;
SOULOUMIAC, A .
IEE PROCEEDINGS-F RADAR AND SIGNAL PROCESSING, 1993, 140 (06) :362-370