Signal subspace method with continuous noise estimation and its efficiency in robust automatic speech recognition

Cited by: 0

Authors:

Jarc, Bojan [1]

Babic, Rudolf [1]

Affiliations:

[1] Univ Maribor, Fak Elektrotehniko Racunalnistvo Informatiko, Smetanova Ul 17, Maribor 2000, Slovenia

Source:

ELEKTROTEHNISKI VESTNIK-ELECTROTECHNICAL REVIEW | 2007 / Vol. 74 / Issue 04

Keywords:

speech recognition in a noisy environment; signal processing; signal subspace; voice-activity detection;

DOI:

Not available

Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronics and communication technology];

Discipline classification code:

0808 ; 0809 ;

Abstract:

In most automatic speech-recognition (ASR) systems, recognition performance decreases significantly when moving from the studio to a real-world environment. Noise and echo are the most common causes of ASR performance degradation. New trends in mobile communications demand efficient recognition and pre-processing methods that improve noise robustness. This paper presents a signal subspace-based method for noise reduction and evaluates its efficiency for improving ASR in a real noisy environment. The signal subspace method was first presented in [1], which assumed white noise as the interfering signal. According to [1], the clean signal is estimated from the eigenvalues of the noisy-signal covariance matrix. Since calculating eigenvalues with the Karhunen-Loeve transformation (KLT) is computationally intensive, they can be approximated using the fast discrete cosine transformation (FDCT) [4]; the results are called approximate eigenvalues. To make the method suitable for real-world environments, we propose a minima tracking-based approach for estimating the eigenvalues of the noise covariance matrix. Since the noise and speech are presumed to be uncorrelated zero-mean signals, covariance coefficients can be estimated with autocorrelation coefficients and the additive relation given in (13). According to (14), the additive relation is also preserved between the speech and noise approximate eigenvalues $\hat{\lambda}_s$ and $\hat{\lambda}_d$. This is the basis for using the minima tracking-based approach to estimate $\bar{\lambda}_d$ (see Eq. 15 and Fig. 1). To reduce overestimation of $\bar{\lambda}_d$ in regions where speech is present, we propose a signal-to-noise ratio (SNR)-dependent estimation of $\bar{\lambda}_s$ by (17). For clean speech estimation, a spectral domain-constraint (SDC) estimator is used by (10). The SDC estimator wrongly presumes that speech is always present in the noisy signal.
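As a rough illustration of the noise-estimation idea described above, the following Python sketch computes DCT-based approximate eigenvalues for one frame, tracks their running minimum over time, and applies the additive relation to obtain a speech estimate. It is only a sketch under stated assumptions, not the authors' implementation: the function names, the recursive smoothing factor `alpha`, the sliding-window minimum, and the clipping at zero are all hypothetical choices, and Eqs. (13)-(17) of the paper are not reproduced here.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix; row k is the k-th cosine basis vector."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)
    return c

def approx_eigenvalues(frame, order):
    """Approximate eigenvalues as the diagonal of C R C^T.

    R is a Toeplitz matrix built from biased autocorrelation estimates
    of the frame (an assumed estimator, chosen so that R is positive
    semidefinite).
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    r = np.array([np.dot(frame[:n - k], frame[k:]) / n for k in range(order)])
    lags = np.abs(np.arange(order)[:, None] - np.arange(order)[None, :])
    cov = r[lags]  # Toeplitz matrix: entry (i, j) is r[|i - j|]
    c = dct_matrix(order)
    return np.diag(c @ cov @ c.T).copy()

def track_minima(lam, window, alpha=0.9):
    """Noise estimate per frame: minimum of the smoothed eigenvalue
    tracks over the last `window` frames. `lam` has shape (frames, order).
    """
    lam = np.asarray(lam, dtype=float)
    smoothed = np.empty_like(lam)
    noise = np.empty_like(lam)
    s = lam[0].copy()
    for t in range(lam.shape[0]):
        s = alpha * s + (1.0 - alpha) * lam[t]  # first-order recursive smoothing
        smoothed[t] = s
        lo = max(0, t - window + 1)
        noise[t] = smoothed[lo:t + 1].min(axis=0)
    return noise

def speech_eigenvalues(lam_noisy, lam_noise):
    """Additive relation: speech = noisy - noise, clipped at zero."""
    return np.maximum(np.asarray(lam_noisy) - np.asarray(lam_noise), 0.0)
```

In this sketch the window must be longer than the longest expected speech burst; otherwise the minimum is taken over speech-only frames and the noise level is overestimated, which is the effect the abstract's SNR-dependent estimation is meant to reduce.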
Since speech is a correlated signal, we propose a voice-activity detection (VAD) method based on the level of autocorrelation (see Eqs. 18-19). The presumption that noise is more weakly correlated than speech allows us to use the minima tracking-based approach to determine the noise-correlation level (see Fig. 3b). A novel VAD function based on the ratio of the speech and noise correlation levels is defined by (19). The clean speech is then estimated in the time domain using the SDC estimator and the VAD gain function by (20). The efficiency of the proposed method is confirmed by ASR results in the Aurora 2 and 3 experimental frameworks, which comprise noisy connected-digit speech with training and test schemes for ASR. The mel-cepstrum feature extraction algorithm is applied with 12 mel-cepstrum coefficients and the energy coefficient. The absolute recognition performance for the Aurora 2 and 3 ASR tasks is shown in Tables 1 and 2, respectively. The best overall word-recognition accuracies of 83.90% and 78.29%, respectively, are achieved. Relative to the baseline results, this corresponds to improvements of 35.49% and 10.86%, respectively.
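The correlation-based VAD idea can be sketched in a few lines of Python. The abstract does not give Eqs. (18)-(19), so everything specific here is an assumption made for illustration: the correlation level is taken as the mean absolute normalized autocorrelation over lags 1 to `max_lag`, the noise level is a sliding-window minimum of that quantity, and the gain is one minus the noise-to-frame level ratio, clipped to [0, 1]. The function names are hypothetical.

```python
import numpy as np

def correlation_level(frame, max_lag):
    """Mean absolute autocorrelation over lags 1..max_lag, normalized
    by the zero-lag value (assumed measure of how correlated a frame is)."""
    frame = np.asarray(frame, dtype=float)
    frame = frame - frame.mean()
    r0 = np.dot(frame, frame)
    if r0 == 0.0:
        return 0.0
    n = len(frame)
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(1, max_lag + 1)])
    return float(np.mean(np.abs(r)) / r0)

def vad_gain(levels, window):
    """Gain per frame from the ratio of the tracked noise-correlation
    level (minimum over the last `window` frames) to the frame's level."""
    levels = np.asarray(levels, dtype=float)
    gains = np.empty_like(levels)
    for t in range(len(levels)):
        lo = max(0, t - window + 1)
        noise_level = levels[lo:t + 1].min()
        gains[t] = 0.0 if levels[t] <= 0.0 else 1.0 - noise_level / levels[t]
    return np.clip(gains, 0.0, 1.0)
```

With this choice, a frame whose correlation level equals the tracked noise level gets a gain of 0, and strongly correlated (speech-like) frames approach a gain of 1; the paper's actual VAD function may differ in form.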


Pages: 229-235

Number of pages: 7

50 items in total

[1] Signal subspace method with continuous noise estimation and its efficiency in robust automatic speech recognition
Jarc, Bojan
Babič, Rudolf
Elektrotehniski Vestnik/Electrotechnical Review, 2007, 74 (04): : 229 - 235
[2] A Review of Signal Subspace Speech Enhancement and Its Application to Noise Robust Speech Recognition
Kris Hermus
Patrick Wambacq
Hugo Van hamme
EURASIP Journal on Advances in Signal Processing, 2007
[3] A review of signal subspace speech enhancement and its application to noise robust speech recognition
Hermus, Kris
Wambacq, Patrick
Van hamme, Hugo
EURASIP JOURNAL ON ADVANCES IN SIGNAL PROCESSING, 2007, 2007 (1)
[4] Assessment of signal subspace based speech enhancement for noise robust speech recognition
Hermus, K
Wambacq, P
2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS: SPEECH PROCESSING, 2004, : 945 - 948
[5] Robust Automatic Speech Recognition System for the Recognition of Continuous Kannada Speech Sentences in the Presence of Noise
Wireless Personal Communications, 2023, 130 : 2039 - 2058
[6] Robust Automatic Speech Recognition System for the Recognition of Continuous Kannada Speech Sentences in the Presence of Noise
Mahadevaswamy
WIRELESS PERSONAL COMMUNICATIONS, 2023, 130 (03) : 2039 - 2058
[7] Channel identification and signal spectrum estimation for robust automatic speech recognition
Zhao, YX
IEEE SIGNAL PROCESSING LETTERS, 1998, 5 (12) : 305 - 308
[8] Noise Robust Speech Features for Automatic Continuous Speech Recognition using Running Spectrum Analysis
Ohnuki, Kazunaga
Takahashi, Wataru
Yoshizawa, Shingo
Miyanaga, Yoshikazu
2008 INTERNATIONAL SYMPOSIUM ON COMMUNICATIONS AND INFORMATION TECHNOLOGIES, 2008, : 150 - 153
[9] An Application Specific Matrix Processor for Signal subspace based speech enhancement in noise robust speech recognition applications
Natarajan, Karthikeyan
Arun, S.
Murugaraj, K.
John, Mala
ASICON 2007: 2007 7TH INTERNATIONAL CONFERENCE ON ASIC, VOLS 1 AND 2, PROCEEDINGS, 2007, : 766 - 769
[10] An energy-constrained signal subspace method for speech enhancement and recognition in colored noise
Huang, J
Zhao, YX
PROCEEDINGS OF THE 1998 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-6, 1998, : 377 - 380
