Improved automatic speech recognition through speaker normalization

Cited by: 33
Authors
Giuliani, D
Gerosa, M
Brugnara, F
Affiliations
[1] ITC Irst, Ctr Ric Sci & Tecnol, I-38050 Trento, Italy
[2] Univ Trent, Int Grad Sch, I-38050 Trento, Italy
DOI
10.1016/j.csl.2005.05.002
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, speaker adaptive acoustic modeling is investigated by using a novel method for speaker normalization and a well-known vocal tract length normalization method. With the novel normalization method, acoustic observations of training and testing speakers are mapped into a normalized acoustic space through speaker-specific transformations with the aim of reducing inter-speaker acoustic variability. For each speaker, an affine transformation is estimated with the goal of reducing the mismatch between the acoustic data of the speaker and a set of target hidden Markov models. This transformation is estimated through constrained maximum likelihood linear regression and then applied to map the acoustic observations of the speaker into the normalized acoustic space. Recognition experiments made use of two corpora, the first consisting of adults' speech and the second of children's speech. Performing training and recognition with normalized data resulted in a consistent reduction of the word error rate with respect to the baseline systems trained on unnormalized data. In addition, the novel method always performed better than the reference vocal tract length normalization method adopted in this work. When unsupervised static speaker adaptation was applied in combination with each of the two speaker normalization methods, different behavior was observed on the two corpora: in one case performance became very similar, while in the other case the difference remained significant. © 2005 Elsevier Ltd. All rights reserved.
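The mapping step described in the abstract, where a speaker-specific affine transformation is applied to that speaker's acoustic observations, can be illustrated with a minimal sketch. It covers only the application of an already estimated transform; the estimation through constrained maximum likelihood linear regression is not shown. The function names, the 2-dimensional toy features, and the values of `A` and `b` are all hypothetical and are not taken from the paper.

```python
def apply_affine(frame, A, b):
    """Map one feature vector x to A x + b (A: list of rows, b: offset vector)."""
    return [
        sum(a_ij * x_j for a_ij, x_j in zip(row, frame)) + b_i
        for row, b_i in zip(A, b)
    ]


def normalize_speaker(frames, A, b):
    """Apply the same speaker-specific affine transform to every frame."""
    return [apply_affine(frame, A, b) for frame in frames]


# Hypothetical transform and toy 2-dimensional observations for one speaker.
A = [[2.0, 0.0],
     [0.0, 0.5]]
b = [1.0, -1.0]
frames = [[1.0, 2.0],
          [0.0, 0.0]]

normalized = normalize_speaker(frames, A, b)
print(normalized)
```

In a setup like the one the abstract describes, each training and testing speaker would have its own `A` and `b`, and the normalized frames would replace the original observations for both training and recognition.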
Pages: 107-123
Page count: 17