Auditory-Inspired Morphological Processing of Speech Spectrograms: Applications in Automatic Speech Recognition and Speech Enhancement

Authors
Joyner Cadore
Francisco J. Valverde-Albacete
Ascensión Gallardo-Antolín
Carmen Peláez-Moreno
Affiliation
Universidad Carlos III de Madrid
Source
Cognitive Computation, 2013, Vol. 5
Keywords
Spectral subtraction; Spectrogram; Morphological processing; Image filtering; Automatic speech recognition; Speech enhancement; Auditory-based features
DOI
Not available
Abstract
This paper presents new auditory-inspired speech processing methods that combine spectral subtraction with two-dimensional non-linear filtering techniques originally conceived for image processing. In particular, mathematical morphology operations such as erosion and dilation are applied to noisy speech spectrograms using specifically designed structuring elements inspired by the masking properties of the human auditory system. This is complemented by a preprocessing stage that includes conventional spectral subtraction and auditory filterbanks. The methods were tested on both speech enhancement and automatic speech recognition tasks. For speech enhancement, time-frequency anisotropic structuring elements applied to grey-scale spectrograms were found to provide better perceptual quality than isotropic ones: under a number of perceptual quality estimation measures and at several signal-to-noise ratios on the Aurora database, they proved more appropriate for retaining the structure of speech while removing background noise. For automatic speech recognition, combining spectral subtraction with auditory-inspired morphological filtering was found to improve recognition rates on a noise-contaminated version of the Isolet database.
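As a rough illustration of the processing chain described in the abstract, the following pure-Python sketch applies power spectral subtraction to a magnitude spectrogram and then a grey-scale morphological opening (erosion followed by dilation) with a flat rectangular structuring element. It is not the authors' implementation. The function names, the noise estimate taken from the leading frames, the parameters `alpha` and `beta`, the choice of an opening, and the structuring element sizes are all assumptions made for illustration; the auditory filterbank stage is omitted, and the paper's masking-derived structuring elements are not modelled.

```python
from typing import Callable, List, Sequence

# Assumed layout: spec[frequency_bin][time_frame], non-negative magnitudes.
Spectrogram = List[List[float]]


def spectral_subtraction(mag: Spectrogram, noise_frames: int = 5,
                         alpha: float = 2.0, beta: float = 0.01) -> Spectrogram:
    """Power spectral subtraction with an over-subtraction factor `alpha`
    and a spectral floor `beta` (hypothetical default values).

    The noise power in each bin is the mean power of the first
    `noise_frames` frames, assuming those frames contain no speech.
    """
    out: Spectrogram = []
    for row in mag:
        n = max(1, min(noise_frames, len(row)))
        noise_pow = sum(x * x for x in row[:n]) / n
        floor = beta * noise_pow
        # Subtract scaled noise power, clamp to the floor, convert back to magnitude.
        out.append([max(x * x - alpha * noise_pow, floor) ** 0.5 for x in row])
    return out


def _flat_morph(spec: Spectrogram, se_freq: int, se_time: int,
                reduce: Callable[[Sequence[float]], float]) -> Spectrogram:
    """Apply `reduce` (min or max) over a centred flat se_freq x se_time
    rectangle (odd sizes assumed); positions outside the image are ignored."""
    rows, cols = len(spec), len(spec[0])
    rf, rt = se_freq // 2, se_time // 2
    out: Spectrogram = []
    for f in range(rows):
        out_row = []
        for t in range(cols):
            window = [spec[i][j]
                      for i in range(max(0, f - rf), min(rows, f + rf + 1))
                      for j in range(max(0, t - rt), min(cols, t + rt + 1))]
            out_row.append(reduce(window))
        out.append(out_row)
    return out


def erode(spec: Spectrogram, se_freq: int, se_time: int) -> Spectrogram:
    return _flat_morph(spec, se_freq, se_time, min)


def dilate(spec: Spectrogram, se_freq: int, se_time: int) -> Spectrogram:
    return _flat_morph(spec, se_freq, se_time, max)


def opening(spec: Spectrogram, se_freq: int, se_time: int) -> Spectrogram:
    """Erosion then dilation: removes bright details smaller than the element."""
    return dilate(erode(spec, se_freq, se_time), se_freq, se_time)


def enhance(mag: Spectrogram, se_freq: int = 1, se_time: int = 3,
            **ss_kwargs: float) -> Spectrogram:
    """Hypothetical pipeline: spectral subtraction, then morphological opening."""
    return opening(spectral_subtraction(mag, **ss_kwargs), se_freq, se_time)


# Toy example: 5 frequency bins x 8 frames of silence, a sustained
# harmonic in bin 2 (frames 2-7) and an isolated spike at bin 0, frame 5.
toy = [[0.0] * 8 for _ in range(5)]
for t in range(2, 8):
    toy[2][t] = 1.0
toy[0][5] = 1.0

aniso = opening(toy, 1, 3)  # elongated along time (hypothetical size)
iso = opening(toy, 3, 3)    # isotropic 3 x 3
print(aniso[2])  # [0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]: harmonic kept
print(aniso[0])  # [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]: spike removed
print(all(v == 0.0 for row in iso for v in row))  # True: harmonic erased too
```

In this toy case, an element elongated along time keeps a one-bin-wide sustained harmonic while removing an isolated one-frame peak, whereas a 3 x 3 isotropic element erases the harmonic as well. This only shows why the shape of the structuring element matters; it does not reproduce the paper's experiments or results.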
Pages: 426-441