Compact Acoustic Models for Embedded Speech Recognition

Cited by: 3

Authors:

Levy, Christophe

Linares, Georges

Bonastre, Jean-Francois

Affiliation:

[1] 84911 Avignon Cedex 9

Source:

EURASIP Journal on Audio, Speech, and Music Processing | 2009

Keywords:

Speech Recognition; Gaussian Component; Acoustic Model; Relative Gain; Subspace Cluster;

DOI:

10.1155/2009/806186

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

Speech recognition applications are known to require a significant amount of resources. Embedded speech recognition, however, allows only a few KB of memory, a few MIPS, and a small amount of training data. To fit the resource constraints of embedded applications, an approach based on a semicontinuous HMM system using state-independent acoustic modelling is proposed. A transformation is computed and applied to the global model to obtain each HMM state-dependent probability density function, so that only the transformation parameters need to be stored. This approach is evaluated on two tasks: digit recognition and voice-command recognition. A fast adaptation technique for acoustic models is also proposed. To significantly reduce computational costs, the adaptation is performed only on the global model (using related speaker recognition adaptation techniques), with no need for state-dependent data. The whole approach results in a relative gain of more than 20% compared to a basic HMM-based system fitting the constraints. Copyright (C) 2009 Christophe Levy et al.

Pages: 12

References: 32 in total
