Morphological filtering of spectrograms for automatic speech recognition

Cited by: 0
Authors
Liu, WM [1]
Bastante, VJR [1]
Rodriguez, FR [1]
Evans, NWD [1]
Mason, JSD [1]
Affiliations
[1] Univ Coll Swansea, Sch Engn, Swansea, W Glam, Wales
Source
Proceedings of the Fourth IASTED International Conference on Visualization, Imaging, and Image Processing | 2004
Keywords
ASR (automatic speech recognition); segmentation; morphological filtering;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper examines the separation of speech signals from additive noise using a recently proposed signal-noise segmentation approach based on statistical properties of the spectrogram [1,2]. Competitive ASR results were reported in [3] despite using only crude spectrogram shape information, suggesting that the approach identifies regions of different signal dominance with high reliability and might be robust down to negative SNRs. This paper extends these early results in two directions. The first extension investigates the contribution of spectrogram shapes plus magnitudes versus shapes alone: the same ASR experiments as in [3] are repeated, but this time with magnitude information recovered in regions deemed to contain speech. Results show consistent improvement for all SNRs down to -5 dB. The second extension relates to computational efficiency: a modified one-pass version of the originally iterative process is proposed by empirically deducing an optimal final stopping condition for each SNR. This is found to reduce computational time significantly (by factors ranging from 7 to 18) whilst improving ASR accuracy.
Pages: 546-549
Number of pages: 4
Related papers
7 in total
  • [1] [Anonymous], P EURSIPCO ED UK
  • [2] SUPPRESSION OF ACOUSTIC NOISE IN SPEECH USING SPECTRAL SUBTRACTION
    BOLL, SF
    [J]. IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1979, 27 (02) : 113 - 120
  • [3] HIRSCH HG, 2000, ISCA ITRW ARS2000 AU
  • [4] Spectrogram segmentation by means of statistical features for non-stationary signal interpretation
    Hory, C
    Martin, N
    Chehikian, A
    [J]. IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2002, 50 (12) : 2915 - 2925
  • [5] HORY C, 2002, P EUSPICO, P427
  • [6] LEONARD RG, 1984, P ICASSP 84, V3, P11
  • [7] RODRIGUEZ FR, 2003, P EUR