Joint estimation of feature transformation parameters and Gaussian mixture model for speaker identification

Cited by: 12
Authors
Yuo, KH [1]
Wang, HC [1]
Affiliation
[1] Natl Tsing Hua Univ, Dept Elect Engn, Hsinchu 30043, Taiwan
Keywords
Karhunen-Loeve transform; transformation embedded GMM; generalized covariance matrices;
DOI
10.1016/S0167-6393(99)00017-5
Chinese Library Classification number
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
The Karhunen-Loeve transform is a well-known technique for orthonormally mapping features into an uncorrelated space. The Gaussian mixture model (GMM) with diagonal covariance matrices is a popular technique for modeling speech feature distributions. These two techniques can be combined to improve the performance of speaker or speech recognition systems. The drawback of this combination is that the two sets of parameters are not optimized jointly. This paper presents a new model structure that integrates an orthonormal transformation and a diagonal-covariance Gaussian mixture into a unified framework. All parameters of this model are obtained simultaneously by maximum likelihood estimation. This idea is further extended to obtain a new GMM with generalized covariance matrices (GC-GMM). The traditional GMM with diagonal or full covariance matrices is a special case of the GC-GMM. The proposed method is demonstrated on a 100-person connected-digit database for text-independent speaker identification. In comparison with the traditional GMM, the computational complexity and the number of parameters can be greatly reduced with no degradation in system performance. (C) 1999 Elsevier Science B.V. All rights reserved.
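For orientation, the conventional two-stage pipeline that the abstract contrasts against can be sketched as follows: a Karhunen-Loeve transform is fitted first, and a diagonal-covariance GMM is then scored on the decorrelated features. This is only a minimal illustration under stated assumptions, not the paper's joint estimation procedure; the function names (`fit_klt`, `apply_klt`, `diag_gmm_loglik`), the synthetic data, and the single-component model are all hypothetical.

```python
import numpy as np

def fit_klt(X):
    """Fit a KLT: return the data mean and an orthonormal basis whose
    columns are eigenvectors of the sample covariance, sorted by
    decreasing eigenvalue."""
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # symmetric input -> orthonormal eigenvectors
    order = np.argsort(eigvals)[::-1]
    return mean, eigvecs[:, order]

def apply_klt(X, mean, basis):
    """Project centered features onto the KLT basis (decorrelates them)."""
    return (X - mean) @ basis

def diag_gmm_loglik(Y, weights, means, variances):
    """Average per-frame log-likelihood under a diagonal-covariance GMM.
    Y: (n, d); weights: (k,); means, variances: (k, d)."""
    diff = Y[:, None, :] - means[None, :, :]              # (n, k, d)
    log_comp = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                       + np.sum(np.log(2.0 * np.pi * variances), axis=1))
    log_joint = np.log(weights) + log_comp                # (n, k)
    peak = log_joint.max(axis=1, keepdims=True)           # log-sum-exp for stability
    per_frame = peak[:, 0] + np.log(np.exp(log_joint - peak).sum(axis=1))
    return float(per_frame.mean())

# Synthetic correlated "features" (an assumption; real systems would use cepstra or similar).
rng = np.random.default_rng(0)
mixing = np.array([[1.0, 0.5, 0.0],
                   [0.0, 1.0, 0.3],
                   [0.0, 0.0, 1.0]])
X = rng.normal(size=(500, 3)) @ mixing

mean, basis = fit_klt(X)
Y = apply_klt(X, mean, basis)
cov_Y = np.cov(Y, rowvar=False)  # diagonal up to rounding error

# Stage two: a one-component diagonal GMM on the transformed features.
weights = np.array([1.0])
means = np.zeros((1, 3))
variances = np.diag(cov_Y)[None, :]
avg_loglik = diag_gmm_loglik(Y, weights, means, variances)
```

In this sketch the transform is fixed before the mixture is fitted, so neither stage can adapt to the other; that separation is the limitation the abstract says the unified model is meant to remove.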
Pages: 227-241
Number of pages: 15
Related papers
50 in total
  • [1] JOINT MAP ADAPTATION OF FEATURE TRANSFORMATION AND GAUSSIAN MIXTURE MODEL FOR SPEAKER RECOGNITION
    Zhu, Donglai
    Ma, Bin
    Li, Haizhou
    2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-8, PROCEEDINGS, 2009, : 4045 - 4048
  • [2] Optimization of Gaussian mixture model parameters for speaker identification
    Hong, QY
    Kwong, S
    Wang, HL
    GENETIC AND EVOLUTIONARY COMPUTATION GECCO 2004, PT 2, PROCEEDINGS, 2004, 3103 : 1310 - 1311
  • [3] Speaker identification research based on gaussian mixture model
    Chunguang, Han
    Hua, Li
    Jia, Ding
    2007 International Symposium on Computer Science & Technology, Proceedings, 2007, : 702 - 705
  • [4] A genetic algorithm based method for optimisation of Gaussian mixture model parameters for speaker identification
    Mashao, DJ
    Tsai, CT
    6TH WORLD MULTICONFERENCE ON SYSTEMICS, CYBERNETICS AND INFORMATICS, VOL III, PROCEEDINGS: IMAGE, ACOUSTIC, SPEECH AND SIGNAL PROCESSING I, 2002, : 254 - 258
  • [5] Individual dimension Gaussian mixture model for speaker identification
    Wang, C
    Hou, LM
    Fang, Y
    ADVANCES IN BIOMETRIC PERSON AUTHENTICATION, PROCEEDINGS, 2005, 3781 : 172 - 179
  • [6] Speaker verification based on adapted Gaussian mixture model feature mapping
    Department of Electronic Science and Technology, University of Science and Technology of China, Hefei 230027, China
    Moshi Shibie yu Rengong Zhineng, 2009, 3 (417-421):
  • [7] Text Independent Speaker Identification Using Gaussian Mixture Model
    Ting, Chee-Ming
    Salleh, Sh-Hussain
    Tan, Tian-Swee
    Ariff, A. K.
    ICIAS 2007: INTERNATIONAL CONFERENCE ON INTELLIGENT & ADVANCED SYSTEMS, VOLS 1-3, PROCEEDINGS, 2007, : 194 - 198
  • [8] Parameter settings for speaker identification using Gaussian mixture model
    Eskidere, Oe
    Ertas, F.
    2007 IEEE 15TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS, VOLS 1-3, 2007, : 495 - +
  • [9] Speaker identification based on classify feature sub-space Gaussian mixture model and neural net fusion
    Huang, Wei
    Dai, Bei-Qian
    Li, Hui
    Dianzi Yu Xinxi Xuebao/Journal of Electronics and Information Technology, 2004, 26 (10): 1607 - 1612
  • [10] Fused Mel Feature sets based Text-Independent Speaker Identification using Gaussian Mixture Model
    Kumari, R. Shantha Selva
    Nidhyananthan, S. Selva
    Anand, G.
    INTERNATIONAL CONFERENCE ON COMMUNICATION TECHNOLOGY AND SYSTEM DESIGN 2011, 2012, 30 : 319 - 326