Auditory-Inspired Morphological Processing of Speech Spectrograms: Applications in Automatic Speech Recognition and Speech Enhancement

Cited by: 9

Authors:

Cadore, Joyner [1]

Valverde-Albacete, Francisco J. [1]

Gallardo-Antolin, Ascension [1]

Pelaez-Moreno, Carmen [1]

Affiliations:

[1] Univ Carlos III Madrid, Madrid 28911, Spain

Source:

COGNITIVE COMPUTATION | 2013 / Vol. 5 / Issue 04

Keywords:

Spectral subtraction; Spectrogram; Morphological processing; Image filtering; Automatic speech recognition; Speech enhancement; Auditory-based features; PERCEPTUAL EVALUATION; NEURAL TRANSDUCTION; QUALITY ASSESSMENT; ITU STANDARD; SIMULATION; FREQUENCY; MASKING; PESQ;

DOI:

10.1007/s12559-012-9196-6

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

New auditory-inspired speech processing methods are presented in this paper, combining spectral subtraction and two-dimensional non-linear filtering techniques originally conceived for image processing. In particular, mathematical morphology operations, such as erosion and dilation, are applied to noisy speech spectrograms using specifically designed structuring elements inspired by the masking properties of the human auditory system. This is complemented by a pre-processing stage that includes the conventional spectral subtraction procedure and auditory filterbanks. These methods were tested in both speech enhancement and automatic speech recognition tasks. For the former, time-frequency anisotropic structuring elements over grey-scale spectrograms were found to provide better perceptual quality than isotropic ones, proving more appropriate (under a number of perceptual quality estimation measures and several signal-to-noise ratios on the Aurora database) for retaining the structure of speech while removing background noise. For the latter, the combination of spectral subtraction and auditory-inspired morphological filtering was found to improve recognition rates on a noise-contaminated version of the Isolet database.
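The pipeline summarized in the abstract (spectral subtraction followed by grey-scale morphological filtering of the spectrogram) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names, the flat rectangular structuring elements, the toy 3 x 6 spectrogram and the per-band noise estimate are hypothetical and do not reproduce the masking-inspired structuring elements or the auditory filterbanks used in the paper.

```python
# Illustrative sketch only: flat rectangular structuring elements on a toy
# spectrogram stored as a list of rows (one row per frequency band).
# All names and values here are hypothetical, not taken from the paper.

def _morph(spec, n_freq, n_time, pick):
    """Apply `pick` (min or max) over an n_freq x n_time window centred on
    each cell. Sizes are assumed odd; windows are truncated at the borders."""
    rows, cols = len(spec), len(spec[0])
    hf, ht = n_freq // 2, n_time // 2
    out = []
    for f in range(rows):
        row = []
        for t in range(cols):
            window = [
                spec[i][j]
                for i in range(max(0, f - hf), min(rows, f + hf + 1))
                for j in range(max(0, t - ht), min(cols, t + ht + 1))
            ]
            row.append(pick(window))
        out.append(row)
    return out

def erode(spec, n_freq, n_time):
    """Grey-scale erosion: local minimum under a flat structuring element."""
    return _morph(spec, n_freq, n_time, min)

def dilate(spec, n_freq, n_time):
    """Grey-scale dilation: local maximum under a flat structuring element."""
    return _morph(spec, n_freq, n_time, max)

def opening(spec, n_freq, n_time):
    """Erosion followed by dilation; removes structures smaller than the element."""
    return dilate(erode(spec, n_freq, n_time), n_freq, n_time)

def spectral_subtraction(spec, noise, floor=0.0):
    """Subtract a per-band noise estimate and clamp the result at `floor`."""
    return [[max(v - noise[f], floor) for v in row] for f, row in enumerate(spec)]

# Toy input: a constant noise level of 1.0, a one-frame spike in band 0,
# and a component sustained over four frames in band 1.
noisy = [
    [1.0, 1.0, 5.0, 1.0, 1.0, 1.0],
    [1.0, 6.0, 6.0, 6.0, 6.0, 1.0],
    [1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
]
noise_estimate = [1.0, 1.0, 1.0]  # assumed known per-band noise level

cleaned = spectral_subtraction(noisy, noise_estimate)
aniso = opening(cleaned, n_freq=1, n_time=3)  # elongated along time
iso = opening(cleaned, n_freq=3, n_time=3)    # square element
```

In this toy case the element elongated along time removes the one-frame spike and keeps the sustained component, whereas the square element also removes the sustained component because it is only one band wide. This is merely an intuition for why the shape of the structuring element matters, not a reproduction of the reported results.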

Pages: 426-441

Number of pages: 16

53 entries in total

[1] [Anonymous], SIGNAL PROCESSING

[2] [Anonymous], 1988, Objective measures of speech quality

[3] [Anonymous], 2000, ASR2000 AUTOMATIC SP

[4] [Anonymous], PRINCIPLES PRACTICE

[5] BAKER, JK. DRAGON SYSTEM - OVERVIEW [J]. IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1975, AS23 (01): 24-29

[6] Beerends JG, 2002, J AUDIO ENG SOC, V50, P765

[7] Berouti M., 1979, ICASSP 79. 1979 IEEE International Conference on Acoustics, Speech and Signal Processing, P208

[8] Bourlard H, 1998, LECT NOTES ARTIF INT, V1387, P389, DOI 10.1007/BFb0054006

[9] Cole R, 2011, ISOLET SPOKEN LETT D

[10] DAVIS, SB; MERMELSTEIN, P. COMPARISON OF PARAMETRIC REPRESENTATIONS FOR MONOSYLLABIC WORD RECOGNITION IN CONTINUOUSLY SPOKEN SENTENCES [J]. IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1980, 28 (04): 357-366
