SPEAKER IDENTIFICATION BY AGGREGATING GAUSSIAN MIXTURE MODELS (GMMs) BASED ON UNCORRELATED MFCC-DERIVED FEATURES

Cited by: 4

Authors:

Pal, Amita ^{[1]}

Bose, Smarajit ^{[1]}

Basak, Gopal K. ^{[2]}

Mukhopadhyay, Amitava ^{[3]}

Affiliations:

[1] Indian Stat Inst, Appl Stat Div, Kolkata, India

[2] Indian Stat Inst, Theoret Stat & Math Div, Kolkata, India

[3] Interra Informat Technol, Kolkata, India

Source:

INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE | 2014 / Vol. 28 / No. 04

Keywords:

Mel frequency cepstral coefficients; Gaussian mixture models; principal component transformation; ensemble classification; classification accuracy; NTIMIT; RECOGNITION; CLASSIFIERS; PCA;

DOI:

10.1142/S0218001414560060

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

For solving speaker identification problems, the approach proposed by Reynolds [IEEE Signal Process. Lett. 2 (1995) 46-48], using Gaussian Mixture Models (GMMs) based on Mel Frequency Cepstral Coefficients (MFCCs) as features, is one of the most effective available in the literature. The use of GMMs for modeling speaker identity is motivated by the interpretation that the Gaussian components represent some general speaker-dependent spectral shapes, and also by the capability of Gaussian mixtures to model arbitrary densities. In this work, we have initially illustrated, with the help of a new bilingual speech corpus, how the well-known principal component transformation, in conjunction with the principle of classifier combination, can be used to enhance the performance of MFCC-GMM speaker recognition systems significantly. Subsequently, we have emphatically and rigorously established the same using the benchmark speech corpus NTIMIT. A significant outcome of this work is that the proposed approach has the potential to enhance the performance of any speaker recognition system based on correlated features.
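
The idea in the abstract (decorrelate the features with a principal component transformation, model each speaker with GMMs, then combine classifiers) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and is not taken from the paper: synthetic correlated 3-D vectors stand in for MFCC frames, the transformation is fitted per speaker, the GMMs use diagonal covariances fitted by plain EM, and the two classifiers (raw features and transformed features) are combined by summing their average log-likelihoods. All function and variable names are hypothetical.

```python
import numpy as np

def fit_pca(frames):
    """Return the mean and eigenvector matrix of the principal component transformation."""
    mean = frames.mean(axis=0)
    cov = np.cov(frames - mean, rowvar=False)
    _, vecs = np.linalg.eigh(cov)            # eigenvalues come back in ascending order
    return mean, vecs[:, ::-1]               # columns ordered by decreasing variance

def component_log_probs(x, weights, means, variances):
    """Log of weight * diagonal Gaussian density, shape (n_frames, n_components)."""
    diff = x[:, None, :] - means[None, :, :]
    log_gauss = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                        + np.sum(np.log(2 * np.pi * variances), axis=1))
    return np.log(weights) + log_gauss

def fit_diag_gmm(x, k, iters=50, seed=0):
    """Fit a k-component diagonal-covariance GMM with plain EM."""
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    means = x[rng.choice(n, size=k, replace=False)]
    variances = np.tile(x.var(axis=0) + 1e-6, (k, 1))
    weights = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: responsibilities of each component for each frame
        log_p = component_log_probs(x, weights, means, variances)
        resp = np.exp(log_p - np.logaddexp.reduce(log_p, axis=1, keepdims=True))
        # M-step: re-estimate weights, means and (floored) variances
        nk = resp.sum(axis=0) + 1e-10
        weights = nk / n
        means = (resp.T @ x) / nk[:, None]
        variances = np.maximum((resp.T @ x ** 2) / nk[:, None] - means ** 2, 1e-6)
    return weights, means, variances

def avg_log_likelihood(x, gmm):
    """Average per-frame log-likelihood of x under a fitted GMM."""
    return np.logaddexp.reduce(component_log_probs(x, *gmm), axis=1).mean()

def train_speaker(frames, k=2):
    """Train one GMM on the raw features and one on the decorrelated features."""
    mean, vecs = fit_pca(frames)
    return {"raw": fit_diag_gmm(frames, k),
            "pca": (mean, vecs, fit_diag_gmm((frames - mean) @ vecs, k))}

def score(frames, model):
    """Assumed combination rule: sum of the two average log-likelihoods."""
    mean, vecs, gmm_pca = model["pca"]
    return (avg_log_likelihood(frames, model["raw"])
            + avg_log_likelihood((frames - mean) @ vecs, gmm_pca))

def identify(frames, models):
    """Return the name of the speaker whose combined score is highest."""
    return max(models, key=lambda name: score(frames, models[name]))

def synthetic_frames(rng, mean, n=400):
    """Correlated 3-D vectors standing in for MFCC frames (not real speech data)."""
    mix = np.array([[1.0, 0.8, 0.0], [0.0, 1.0, 0.6], [0.0, 0.0, 1.0]])
    return rng.normal(size=(n, 3)) @ mix + mean

rng = np.random.default_rng(0)
train = {"spk_a": synthetic_frames(rng, np.array([0.0, 0.0, 0.0])),
         "spk_b": synthetic_frames(rng, np.array([3.0, 3.0, 3.0]))}
models = {name: train_speaker(frames) for name, frames in train.items()}
test_utterance = synthetic_frames(rng, np.array([3.0, 3.0, 3.0]), n=100)
print(identify(test_utterance, models))
```

Because the transformation is an orthogonal rotation, it leaves densities unscaled, so scores from different speakers' transformed-feature models remain directly comparable. The rotation matters here only because diagonal-covariance components cannot represent correlation between features.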


Pages: 25

References (30 in total):

[1] Speaker identification by combining multiple classifiers using Dempster-Shafer theory of evidence [J].