Compact Acoustic Models for Embedded Speech Recognition

Authors
Christophe Lévy
Georges Linarès
Jean-François Bonastre
Source
EURASIP Journal on Audio, Speech, and Music Processing, vol. 2009
Keywords
Speech Recognition; Gaussian Component; Acoustic Model; Relative Gain; Subspace Cluster
Abstract
Speech recognition applications are known to require significant resources. Embedded speech recognition, however, allows only a few KB of memory, a few MIPS, and a small amount of training data. To fit these constraints, an approach based on a semicontinuous HMM system using state-independent acoustic modelling is proposed. A transformation is computed and applied to the global model to obtain each HMM state-dependent probability density function, so that only the transformation parameters need to be stored. The approach is evaluated on two tasks: digit recognition and voice-command recognition. A fast adaptation technique for acoustic models is also proposed. To significantly reduce computational cost, adaptation is performed only on the global model (using adaptation techniques related to those of speaker recognition), with no need for state-dependent data. The whole approach yields a relative gain of more than 20% compared to a basic HMM-based system that fits the same constraints.
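The storage argument in the abstract can be illustrated with a minimal sketch. It assumes the simplest possible state-dependent transformation, namely a per-state weight vector over one shared set of diagonal-covariance Gaussians; the abstract does not specify the transformation, so this choice, the function names, and the toy sizes (100 states, 64 Gaussians, 39-dimensional features) are all illustrative assumptions rather than the authors' configuration.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )

def state_log_likelihood(x, means, variances, state_weights):
    """log p(x | state) using Gaussians shared by all states.

    Only `state_weights` is state-specific (assumed transformation).
    """
    terms = [
        math.log(w) + log_gauss_diag(x, m, v)
        for w, m, v in zip(state_weights, means, variances)
        if w > 0.0
    ]
    peak = max(terms)  # log-sum-exp for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

def parameter_counts(n_states, n_gauss, dim):
    """Stored floats: one GMM per state vs. one shared GMM plus per-state weights."""
    per_gmm = n_gauss * (2 * dim + 1)  # means, variances, one weight per Gaussian
    independent = n_states * per_gmm
    shared = per_gmm + n_states * n_gauss
    return independent, shared

# Toy global model: two shared Gaussians in two dimensions.
means = [[0.0, 0.0], [3.0, 3.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
weights_a = [0.9, 0.1]  # hypothetical state A favours the first Gaussian
weights_b = [0.1, 0.9]  # hypothetical state B favours the second Gaussian

x = [2.8, 3.1]
ll_a = state_log_likelihood(x, means, variances, weights_a)
ll_b = state_log_likelihood(x, means, variances, weights_b)
independent, shared = parameter_counts(n_states=100, n_gauss=64, dim=39)
```

With these assumed sizes, the shared model stores 11,456 values against 505,600 for independent per-state mixtures, which is the kind of saving that motivates keeping only transformation parameters per state.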