Stabilised weighted linear prediction

Cited by: 75

Authors:

Magi, Carlo [1]

Pohjalainen, Jouni [1]

Backstrom, Tom [1]

Alku, Paavo [1]

Affiliations:

[1] Aalto Univ, Lab Acoust & Audio Signal Proc, FI-02015 Helsinki, Finland

Source:

SPEECH COMMUNICATION | 2009 / Vol. 51 / No. 5

Funding:

Academy of Finland;

Keywords:

Linear prediction; All-pole modelling; Spectral estimation; SPEECH; RECOGNITION; EXTRACTION; SPECTRUM;

DOI:

10.1016/j.specom.2008.12.005

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

Weighted linear prediction (WLP) is a method to compute all-pole models of speech by applying temporal weighting of the square of the residual signal. By using short-time energy (STE) as a weighting function, this algorithm was originally proposed as an improved linear predictive (LP) method based on emphasising those samples that fit the underlying speech production model well. The original formulation of WLP, however, did not guarantee stability of all-pole models. Therefore, the current work revisits the concept of WLP by introducing a modified short-time energy function leading always to stable all-pole models. This new method, stabilised weighted linear prediction (SWLP), is shown to yield all-pole models whose general performance can be adjusted by properly choosing the length of the STE window, a parameter denoted by M. The study compares the performances of SWLP, minimum variance distortionless response (MVDR), and conventional LP in spectral modelling of speech corrupted by additive noise. The comparisons were performed by computing, for each method, the logarithmic spectral differences between the all-pole spectra extracted from clean and noisy speech in different segmental signal-to-noise ratio (SNR) categories. The results showed that the proposed SWLP algorithm was the most robust method against zero-mean Gaussian noise and the robustness was largest for SWLP with a small M-value. These findings were corroborated by a small listening test in which the majority of the listeners assessed the quality of impulse-train-excited SWLP filters, extracted from noisy speech, to be perceptually closer to original clean speech than the corresponding all-pole responses computed by MVDR. Finally, SWLP was compared to other short-time spectral estimation methods (FFT, LP, MVDR) in isolated word recognition experiments. Recognition accuracy obtained by SWLP, in comparison to other short-time spectral estimation methods, improved already at moderate segmental SNR values for sounds corrupted by zero-mean Gaussian noise. For realistic factory noise of low-pass characteristics, the SWLP method improved the recognition results at segmental SNR levels below 0 dB. (C) 2009 Published by Elsevier B.V.
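To make the weighting idea in the abstract concrete, the following is a minimal NumPy sketch of plain WLP with an STE weight. It is not the stabilised SWLP variant, whose modified weighting is not given in this record. The function names (`ste_weights`, `wlp`, `is_stable`), the zero-padded autocorrelation-style summation range, and the choice of the M preceding samples as the STE window are all assumptions made for illustration.

```python
import numpy as np


def ste_weights(x, M):
    """Short-time energy of the M samples preceding each sample (assumed form).

    w[n] = sum_{i=1..M} x[n-i]^2, with x[n] taken as 0 for n < 0.
    """
    x = np.asarray(x, dtype=float)
    e = np.concatenate((np.zeros(M), x ** 2))
    c = np.cumsum(np.concatenate(([0.0], e)))
    return c[M:M + len(x)] - c[:len(x)]


def wlp(x, p, M):
    """Order-p weighted linear prediction (illustrative sketch).

    Minimises sum_n w[n] * (x[n] - sum_k a_k x[n-k])^2 and returns the
    inverse-filter coefficients [1, -a_1, ..., -a_p].
    """
    x = np.asarray(x, dtype=float)
    N = len(x)
    xp = np.concatenate((x, np.zeros(p)))  # zero-pad so n runs over 0..N+p-1
    w = ste_weights(xp, M)
    # Column k of D holds the delayed signal x[n-k], k = 0..p.
    D = np.stack(
        [np.concatenate((np.zeros(k), xp[:N + p - k])) for k in range(p + 1)],
        axis=1,
    )
    past = D[:, 1:]
    R = past.T @ (w[:, None] * past)  # weighted correlation matrix
    r = past.T @ (w * D[:, 0])        # weighted cross-correlation vector
    a = np.linalg.solve(R, r)         # weighted normal equations
    return np.concatenate(([1.0], -a))


def is_stable(a):
    """True if every pole of 1/A(z) lies strictly inside the unit circle."""
    return bool(np.all(np.abs(np.roots(a)) < 1.0))
```

Because plain WLP does not guarantee stability, a check such as `is_stable(wlp(frame, p=10, M=12))` (with `frame`, `p`, and `M` chosen by the user) is a reasonable safeguard before using the model as a synthesis filter.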


Pages: 401-411

Page count: 11

