Likelihood-Maximizing-Based Multiband Spectral Subtraction for Robust Speech Recognition

Cited by: 0
Authors
Bagher BabaAli
Hossein Sameti
Mehran Safayani
Institution
[1] Sharif University of Technology,Department of Computer Engineering
Keywords
Speech Recognition; Speech Signal; Recognition Accuracy; Automatic Speech Recognition; Speech Quality;
DOI: not available
Abstract
Automatic speech recognition performance degrades significantly when speech is affected by environmental noise. A major current challenge is achieving robustness in adverse noisy conditions so that automatic speech recognizers can be used in real-world situations. Spectral subtraction (SS) is a well-known and effective approach; it was originally designed to improve the quality of speech signals as judged by human listeners. SS techniques usually improve the quality and intelligibility of the speech signal, whereas speech recognition systems need compensation techniques that reduce the mismatch between noisy speech features and acoustic models trained on clean speech. Nevertheless, a correlation can be expected between speech quality improvement and increased recognition accuracy. This paper proposes a novel approach that treats SS and the speech recognizer not as two independent entities cascaded together, but as two interconnected components of a single system sharing the common goal of improved speech recognition accuracy. The architecture feeds information from the statistical models of the recognition engine back to the SS stage to tune its parameters. With this architecture, we overcome the drawbacks of previously proposed methods and achieve better recognition accuracy. Experimental evaluations show that the proposed method achieves significant improvements in recognition rates across a wide range of signal-to-noise ratios.
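To make the core operation concrete, the following is a minimal sketch of frame-wise multiband spectral subtraction. All names and parameter choices here are illustrative assumptions, not taken from the paper: the per-band over-subtraction factors `alphas` stand in for the SS parameters that the proposed method would tune via acoustic-model likelihood feedback (that feedback loop itself is only indicated in a comment), and `beta` is a conventional spectral floor used to limit musical noise.

```python
import numpy as np

def multiband_spectral_subtraction(noisy_power, noise_power, band_edges,
                                   alphas, beta=0.002):
    """Subtract a noise estimate from one frame's noisy power spectrum, band by band.

    noisy_power, noise_power : 1-D arrays of per-bin power for one frame.
    band_edges : list of (lo, hi) bin-index ranges defining the frequency bands.
    alphas : per-band over-subtraction factors. In a likelihood-maximizing
             scheme these would be chosen to maximize the recognizer's
             acoustic-model likelihood on the enhanced features (not shown).
    beta : spectral-floor fraction; keeps the output strictly positive and
           limits the "musical noise" that hard subtraction produces.
    """
    enhanced = np.empty_like(noisy_power)
    for (lo, hi), alpha in zip(band_edges, alphas):
        subtracted = noisy_power[lo:hi] - alpha * noise_power[lo:hi]
        floor = beta * noisy_power[lo:hi]
        # Clamp to the floor wherever over-subtraction went too deep.
        enhanced[lo:hi] = np.maximum(subtracted, floor)
    return enhanced

# Usage on a toy 8-bin frame split into two bands with different alphas:
noisy = np.full(8, 10.0)
noise = np.full(8, 2.0)
out = multiband_spectral_subtraction(noisy, noise,
                                     band_edges=[(0, 4), (4, 8)],
                                     alphas=[2.0, 4.0])
```

In a full system this function would run per STFT frame, with `noise_power` estimated from non-speech segments; the paper's contribution is that `alphas` are not fixed heuristically but driven by the recognition engine's statistical models.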
Related Papers (50 total)
  • [41] Weight-Space Viterbi Decoding Based Spectral Subtraction for Reverberant Speech Recognition
    Ban, Sung Min
    Kim, Hyung Soon
    IEEE SIGNAL PROCESSING LETTERS, 2015, 22 (09) : 1424 - 1428
  • [42] Target Speech GMM-based Spectral Compensation for Noise Robust Speech Recognition
    Shinozaki, Takahiro
    Furui, Sadaoki
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 1223 - 1226
  • [43] Spectral estimation and normalisation for robust speech recognition
    Claes, T
    Xie, F
    VanCompernolle, D
    ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, 1996, : 1997 - 2000
  • [44] SPECTRAL ESTIMATION FOR NOISE ROBUST SPEECH RECOGNITION
    ERELL, A
    WEINTRAUB, M
    SPEECH AND NATURAL LANGUAGE, 1989, : 319 - 324
  • [45] Nonlinear spectral transformations for robust speech recognition
    Ikbal, S
    Hermansky, H
    Bourlard, H
    ASRU'03: 2003 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING ASRU '03, 2003, : 393 - 398
  • [46] Improved Noise Robust Automatic Speech Recognition System with Spectral Subtraction and Minimum Statistics Algorithm implemented in FPGA
    Orillo, John William
    Yap, Roderick
    Sybingco, Edwin
    TENCON 2012 - 2012 IEEE REGION 10 CONFERENCE: SUSTAINABLE DEVELOPMENT THROUGH HUMANITARIAN TECHNOLOGY, 2012,
  • [47] Robust speech dereverberation using multichannel blind deconvolution with spectral subtraction
    Furuya, Ken'ichi
    Kataoka, Akitoshi
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2007, 15 (05): : 1579 - 1591
  • [48] Robust Speech Recognition Based on Dereverberation Parameter Optimization Using Acoustic Model Likelihood
    Gomez, Randy
    Kawahara, Tatsuya
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2010, 18 (07): : 1708 - 1716
  • [49] Maximum likelihood subband polynomial regression for robust speech recognition
    Lu, Yong
    Wu, Zhenyang
    APPLIED ACOUSTICS, 2013, 74 (05) : 640 - 646
  • [50] A Variational Approach to Robust Maximum Likelihood Estimation for Speech Recognition
    Omar, Mohamed Kamal
    INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 1049 - 1052