Signal subspace method with continuous noise estimation and its efficiency in robust automatic speech recognition

Cited by: 0
Authors
Jarc, Bojan [1]
Babic, Rudolf [1]
Affiliations
[1] Univ Maribor, Fak Elektrotehniko Racunalnistvo Informatiko, Smetanova Ul 17, Maribor 2000, Slovenia
Source
ELEKTROTEHNISKI VESTNIK-ELECTROTECHNICAL REVIEW | 2007 / Vol. 74 / Issue 04
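Keywords
speech recognition in a noisy environment; signal processing; signal subspace; voice-activity detection
DOI
Not available
Chinese Library Classification number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract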
In most automatic speech-recognition (ASR) systems, recognition performance decreases significantly when moving from the studio to a real-world environment. Noise and echo are the most common causes of this degradation. New trends in mobile communications demand efficient recognition and pre-processing methods that improve noise robustness. This paper presents a signal subspace-based method for noise reduction and evaluates its efficiency for ASR improvement in a real noisy environment. The signal subspace method was first presented in [1], presuming white noise as the interfering signal. According to [1], the clean signal is estimated by using the eigenvalues of the noisy-signal covariance matrix. Since calculating the eigenvalues with the Karhunen-Loève transformation (KLT) is computationally intensive, they can be approximated by using the fast discrete cosine transformation (FDCT) [4]; the results are called approximate eigenvalues. To make the method suitable for real-world environments, we propose a minima tracking-based approach for estimating the eigenvalues of the noise covariance matrix. Since noise and speech are presumed to be uncorrelated zero-mean signals, the covariance coefficients can be estimated with autocorrelation coefficients and the additive relation given in (13). According to (14), the additive relation is also preserved between the speech and noise approximate eigenvalues $\hat{\lambda}_s$ and $\hat{\lambda}_d$. This is the basis for using the minima tracking-based approach to estimate $\bar{\lambda}_d$ (see Eq. 15 and Fig. 1). To reduce overestimation of $\bar{\lambda}_d$ in areas of speech presence, we propose a signal-to-noise ratio (SNR)-dependent estimation of $\bar{\lambda}_s$ by (17). For clean speech estimation, a spectral domain-constraint (SDC) estimator is used by (10). The SDC estimator wrongly presumes that speech is always present in a noisy signal.
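The idea of approximate eigenvalues combined with minima tracking can be illustrated with a minimal sketch. It does not reproduce the paper's Eqs. (13)-(17), which are not given in this record. It assumes that the approximate eigenvalues are recursively smoothed squared DCT coefficients of each frame and that the noise estimate is a running minimum over recent frames. The function names, the smoothing factor `alpha`, and the `window` length are hypothetical choices for illustration only.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)
    return c

def approx_eigenvalues(frames, alpha=0.9):
    """Assumed estimator: recursively smoothed squared DCT coefficients.

    frames: array of shape (num_frames, frame_length).
    Returns one row of approximate eigenvalues per frame.
    """
    frames = np.asarray(frames, dtype=float)
    c = dct_matrix(frames.shape[1])
    power = (frames @ c.T) ** 2
    smoothed = np.empty_like(power)
    smoothed[0] = power[0]
    for t in range(1, len(power)):
        smoothed[t] = alpha * smoothed[t - 1] + (1 - alpha) * power[t]
    return smoothed

def track_minima(lam_noisy, window=50):
    """Assumed noise estimate: per-coefficient minimum over the last `window` frames."""
    lam_noisy = np.asarray(lam_noisy, dtype=float)
    est = np.empty_like(lam_noisy)
    for t in range(len(lam_noisy)):
        est[t] = lam_noisy[max(0, t - window + 1): t + 1].min(axis=0)
    return est
```

Because the running minimum includes the current frame, the noise estimate in this sketch never exceeds the noisy-signal estimate, so the implied speech estimate (their difference, under the additive relation) stays non-negative.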
Since speech is a correlated signal, we propose a voice-activity detection (VAD) method based on the level of autocorrelation (see Eq. 18-19). The presumption that noise is a more weakly correlated signal than speech allows us to use the minima tracking-based approach to determine the noise-correlation level (see Fig. 3 b). A novel VAD function based on the ratio of the speech and noise correlation levels is defined by (19). The clean speech is then estimated in the time domain using the SDC estimator and the VAD gain function by (20). The efficiency of the proposed method is confirmed with ASR results in the Aurora 2 and 3 experimental frameworks, which comprise noisy speech of connected digits with train and test schemes for ASR. The mel-cepstrum feature extraction algorithm is applied with 12 mel-cepstrum coefficients and the energy coefficient. The absolute recognition performance for the Aurora 2 and 3 ASR tasks is shown in Tables 1 and 2, respectively. The best overall word-recognition accuracies achieved are 83.90% and 78.29%, respectively. Relative to the baseline results, this corresponds to improvements of 35.49% and 10.86%.
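The correlation-based VAD described in the abstract can be sketched as follows. This is not the paper's Eq. (18)-(19), which are not reproduced in this record. It assumes that the correlation level of a frame is the mean absolute normalized autocorrelation over a few lags, that the noise correlation level is a running minimum of that quantity, and that the VAD function is their ratio. The names `correlation_level` and `vad_ratio` and the parameters `max_lag`, `window`, and `eps` are hypothetical.

```python
import numpy as np

def correlation_level(frame, max_lag=10):
    """Assumed measure: mean |normalized autocorrelation| over lags 1..max_lag."""
    x = np.asarray(frame, dtype=float)
    x = x - x.mean()
    energy = np.dot(x, x)
    if energy == 0.0:
        return 0.0  # a silent frame carries no correlation information
    r = [np.dot(x[:-lag], x[lag:]) / energy for lag in range(1, max_lag + 1)]
    return float(np.mean(np.abs(r)))

def vad_ratio(frames, window=50, max_lag=10, eps=1e-12):
    """Ratio of each frame's correlation level to a minima-tracked noise level."""
    levels = np.array([correlation_level(f, max_lag) for f in frames])
    # Noise correlation level: minimum over the last `window` frames (assumption).
    noise = np.array([levels[max(0, t - window + 1): t + 1].min()
                      for t in range(len(levels))])
    return levels / (noise + eps)
```

In this sketch, frames whose ratio stays close to 1 look like noise, while strongly correlated frames such as voiced speech yield larger ratios. How the ratio is mapped to a gain is left open, since the gain function of (20) is not given here.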
Pages: 229-235
Number of pages: 7
Related papers
50 in total
  • [1] Signal subspace method with continuous noise estimation and its efficiency in robust automatic speech recognition
    Jarc, Bojan
    Babič, Rudolf
    Elektrotehniski Vestnik/Electrotechnical Review, 2007, 74 (04): 229 - 235
  • [2] A Review of Signal Subspace Speech Enhancement and Its Application to Noise Robust Speech Recognition
    Kris Hermus
    Patrick Wambacq
    Hugo Van hamme
    EURASIP Journal on Advances in Signal Processing, 2007
  • [3] A review of signal subspace speech enhancement and its application to noise robust speech recognition
    Hermus, Kris
    Wambacq, Patrick
    Van hamme, Hugo
    EURASIP JOURNAL ON ADVANCES IN SIGNAL PROCESSING, 2007, 2007 (1)
  • [4] Assessment of signal subspace based speech enhancement for noise robust speech recognition
    Hermus, K
    Wambacq, P
    2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS: SPEECH PROCESSING, 2004, : 945 - 948
  • [6] Robust Automatic Speech Recognition System for the Recognition of Continuous Kannada Speech Sentences in the Presence of Noise
    Mahadevaswamy
    WIRELESS PERSONAL COMMUNICATIONS, 2023, 130 (03) : 2039 - 2058
  • [7] Channel identification and signal spectrum estimation for robust automatic speech recognition
    Zhao, YX
    IEEE SIGNAL PROCESSING LETTERS, 1998, 5 (12) : 305 - 308
  • [8] Noise Robust Speech Features for Automatic Continuous Speech Recognition using Running Spectrum Analysis
    Ohnuki, Kazunaga
    Takahashi, Wataru
    Yoshizawa, Shingo
    Miyanaga, Yoshikazu
    2008 INTERNATIONAL SYMPOSIUM ON COMMUNICATIONS AND INFORMATION TECHNOLOGIES, 2008, : 150 - 153
  • [9] An Application Specific Matrix Processor for Signal subspace based speech enhancement in noise robust speech recognition applications
    Natarajan, Karthikeyan
    Arun, S.
    Murugaraj, K.
    John, Mala
    ASICON 2007: 2007 7TH INTERNATIONAL CONFERENCE ON ASIC, VOLS 1 AND 2, PROCEEDINGS, 2007, : 766 - 769
  • [10] An energy-constrained signal subspace method for speech enhancement and recognition in colored noise
    Huang, J
    Zhao, YX
    PROCEEDINGS OF THE 1998 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-6, 1998, : 377 - 380