Capturing local variability for speaker normalization in speech recognition

Cited by: 6

Authors:

Miguel, Antonio [1]

Lleida, Eduardo [1]

Rose, Richard [2]

Buera, Luis [1]

Saz, Oscar [1]

Ortega, Alfonso [1]

Affiliations:

[1] Univ Zaragoza, Dept Elect Engn & Commun, E-50009 Zaragoza, Spain

[2] McGill Univ, Dept Elect & Comp Engn, Montreal, PQ H3A 2T5, Canada

Source:

IEEE Transactions on Audio, Speech, and Language Processing | 2008, Vol. 16, No. 3

Funding:

Natural Sciences and Engineering Research Council of Canada

Keywords:

automatic speech recognition (ASR); local warping; maximum likelihood; speaker normalization; vocal tract normalization;

DOI:

10.1109/TASL.2007.914114

Chinese Library Classification number:

O42 [Acoustics]

Discipline classification codes:

070206 ; 082403 ;

Abstract:

The new model reduces the impact of local spectral and temporal variability by estimating a finite set of spectral and temporal warping factors which are applied to speech at the frame level. Optimum warping factors are obtained while decoding in a locally constrained search. The model involves augmenting the states of a standard hidden Markov model (HMM), providing an additional degree of freedom. It is argued in this paper that this represents an efficient and effective method for compensating local variability in speech which may have potential application to a broader array of speech transformations. The technique is presented in the context of existing methods for frequency warping-based speaker normalization for ASR. The new model is evaluated in clean and noisy task domains using subsets of the Aurora 2, the Spanish Speech-Dat-Car, and the TIDIGITS corpora. In addition, some experiments are performed on a Spanish language corpus collected from a population of speakers with a range of speech disorders. It has been found that, under clean or not severely degraded conditions, the new model provides improvements over the standard HMM baseline. It is argued that the framework of local warping is an effective general approach to providing more flexible models of speaker variability.
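To make the augmented-state idea in the abstract concrete, here is a minimal sketch of a Viterbi search in which each HMM state is paired with an index into a finite set of warping factors, and the warp index may change by only a limited amount between consecutive frames. This is an illustration under stated assumptions, not the authors' implementation: the function name `augmented_viterbi`, the layout of `log_b`, the `max_step` constraint, and the uniform prior over the initial warp index are all hypothetical choices made for the example.

```python
import math

def augmented_viterbi(log_b, log_A, log_pi, max_step=1):
    """Viterbi search over augmented states (q, w). Hypothetical sketch.

    log_b[t][q][w]: log-likelihood of frame t in HMM state q when the frame
                    is warped with the w-th factor of a finite warp set.
    log_A[p][q]:    log transition probability of the base HMM.
    log_pi[q]:      log initial state probability.
    max_step:       largest allowed change of warp index between frames
                    (assumed form of the local constraint).
    Returns (best_log_score, state_path, warp_path).
    """
    T, Q, W = len(log_b), len(log_pi), len(log_b[0][0])
    # Initial scores: every warp index is equally admissible at t = 0.
    delta = [[log_pi[q] + log_b[0][q][w] for w in range(W)] for q in range(Q)]
    back = []
    for t in range(1, T):
        new = [[-math.inf] * W for _ in range(Q)]
        ptr = [[(0, 0)] * W for _ in range(Q)]
        for q in range(Q):
            for w in range(W):
                # Only warp indices within max_step of w may precede w.
                lo, hi = max(0, w - max_step), min(W - 1, w + max_step)
                best, arg = -math.inf, (0, lo)
                for p in range(Q):
                    for v in range(lo, hi + 1):
                        s = delta[p][v] + log_A[p][q]
                        if s > best:
                            best, arg = s, (p, v)
                new[q][w] = best + log_b[t][q][w]
                ptr[q][w] = arg
        delta = new
        back.append(ptr)
    # Pick the best final augmented state, then trace the path backwards.
    q, w = max(((q, w) for q in range(Q) for w in range(W)),
               key=lambda s: delta[s[0]][s[1]])
    score = delta[q][w]
    states, warps = [q], [w]
    for ptr in reversed(back):
        q, w = ptr[q][w]
        states.append(q)
        warps.append(w)
    return score, states[::-1], warps[::-1]
```

In this sketch the search space grows from Q states to Q × W augmented states, and the `max_step` bound keeps the per-frame cost proportional to the width of the allowed warp window rather than to W.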


Pages: 578-593

Page count: 16

References: 40 in total

