Skew Gaussian mixture models for speaker recognition

Times cited: 13

Authors:

Matza, Avi [1]

Bistritz, Yuval [1]

Affiliation:

[1] Tel Aviv Univ, Sch Elect Engn, IL-69978 Tel Aviv, Israel

Source:

IET SIGNAL PROCESSING | 2014 / Vol. 8 / No. 08

Keywords:

Gaussian processes; mixture models; speaker recognition; vectors; expectation-maximisation algorithm; GMM; speech recognition; skew empirical distribution; expectation maximisation algorithm; EM algorithm; two-piece skew Gaussian mixture model; Mel frequency cepstral coefficient; MFCC; line spectral frequency; LSF; immittance spectral frequency; ISF; speech transmission standard; feature vectors; DISTRIBUTIONS

DOI:

10.1049/iet-spr.2013.0270

Chinese Library Classification (CLC) numbers:

TM [Electrical engineering]; TN [Electronics and communication technology]

Subject classification codes:

0808; 0809

Abstract:

Gaussian mixture models (GMMs) are widely used in speech and speaker recognition. This study explores the idea that a mixture of skew Gaussians might better capture feature vectors whose empirical distributions tend to be skewed. It begins by deriving an expectation maximisation (EM) algorithm for training a mixture of two-piece skew Gaussians, which turns out to be not much more complicated than the usual EM algorithm used to train symmetric GMMs. Next, the algorithm is used to compare skew and symmetric GMMs in some simple speaker recognition experiments that use Mel frequency cepstral coefficients (MFCC) and line spectral frequencies (LSF) as the feature vectors. MFCC are among the most popular feature vectors in speech and speaker recognition applications. LSF were chosen because they exhibit significantly more skewed distributions than MFCC and because they are widely used [together with the related immittance spectral frequencies (ISF)] in speech transmission standards. In the reported experiments, models with skew Gaussians performed better than models with symmetric Gaussians, and skew GMMs with LSF compared favourably with both skew and symmetric GMMs that used MFCC.
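To make the EM idea concrete, here is a minimal sketch, not the authors' algorithm. It assumes one-dimensional features and assumes the "two-piece skew Gaussian" is the joined half-Gaussian (two-piece normal) density: a shared mode mu, a left scale s1 used for x < mu, a right scale s2 used for x >= mu, and a normalising constant 2 / (sqrt(2*pi) * (s1 + s2)). For a fixed mode, maximising the responsibility-weighted likelihood gives closed-form scales: with weighted total N and weighted squared deviations S1 (left of the mode) and S2 (right of the mode), s1 = cbrt(S1) * sqrt((cbrt(S1) + cbrt(S2)) / N), and s2 likewise with cbrt(S2) in front. Substituting back shows that the best mode minimises cbrt(S1) + cbrt(S2). The grid search over candidate modes, the scale floor `min_scale`, and the function names `two_piece_logpdf`, `m_step_component` and `fit_skew_gmm` are all hypothetical choices for this illustration; the paper's actual update for the location parameter, and how it handles multivariate feature vectors, may differ.

```python
import numpy as np

LOG_2_OVER_SQRT_2PI = np.log(2.0 / np.sqrt(2.0 * np.pi))


def two_piece_logpdf(x, mu, s1, s2):
    """Log-density of a two-piece Gaussian: scale s1 for x < mu, scale s2 for x >= mu."""
    s = np.where(x < mu, s1, s2)
    return LOG_2_OVER_SQRT_2PI - np.log(s1 + s2) - 0.5 * ((x - mu) / s) ** 2


def m_step_component(x, w, candidates, min_scale):
    """Weighted ML update of one component (hypothetical helper).

    For a fixed mode the scales are closed-form; the mode is picked from
    `candidates` by minimising cbrt(S1) + cbrt(S2), the profile likelihood.
    """
    n = max(w.sum(), 1e-12)
    best = None
    for mu in candidates:
        d2 = w * (x - mu) ** 2
        left = x < mu
        a, b = np.cbrt(d2[left].sum()), np.cbrt(d2[~left].sum())
        if best is None or a + b < best[0]:
            best = (a + b, mu, a, b)
    _, mu, a, b = best
    k = np.sqrt((a + b) / n)
    # Floor the scales so a component never collapses to zero width.
    return mu, max(k * a, min_scale), max(k * b, min_scale), n


def fit_skew_gmm(x, n_comp=2, n_iter=40, n_grid=64, min_scale=1e-3):
    """EM for a 1-D mixture of two-piece Gaussians (illustrative sketch)."""
    x = np.asarray(x, dtype=float)
    mu = np.quantile(x, (np.arange(n_comp) + 0.5) / n_comp)  # spread initial modes
    s1 = np.full(n_comp, x.std())
    s2 = s1.copy()
    pi = np.full(n_comp, 1.0 / n_comp)
    grid = np.quantile(x, np.linspace(0.01, 0.99, n_grid))  # candidate modes
    history = []
    for _ in range(n_iter):
        # E-step: log(pi_k) + log f_k(x_i), shape (n_comp, N)
        logp = np.stack([np.log(pi[k]) + two_piece_logpdf(x, mu[k], s1[k], s2[k])
                         for k in range(n_comp)])
        log_norm = np.logaddexp.reduce(logp, axis=0)
        history.append(log_norm.sum())  # total log-likelihood before this M-step
        resp = np.exp(logp - log_norm)
        # M-step: the current mode stays among the candidates
        for k in range(n_comp):
            mu[k], s1[k], s2[k], nk = m_step_component(
                x, resp[k], np.append(grid, mu[k]), min_scale)
            pi[k] = nk / len(x)
    return pi, mu, s1, s2, history
```

The E-step is the same as for a symmetric GMM; only the component density and the M-step change, which is consistent with the abstract's remark that the skew EM algorithm is not much more complicated. For speaker identification one would, under the usual GMM recipe (assumed here), fit one such model per enrolled speaker and assign a test utterance to the model with the highest total log-likelihood.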


Pages: 860-867

Number of pages: 8

Number of references: 12
