Text-independent speaker identification using Gaussian mixture models based on multi-space probability distribution

Cited by: 0
Authors
Miyajima, C [1]
Hattori, Y
Tokuda, K
Masuko, T
Kobayashi, T
Kitamura, T
Affiliations
[1] Nagoya Inst Technol, Dept Comp Sci, Nagoya, Aichi 4668555, Japan
[2] Tokyo Inst Technol, Interdisciplinary Grad Sch Sci & Engn, Dept Informat Proc, Yokohama, Kanagawa 2268502, Japan
Source
IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS | 2001 / Vol. E84D / No. 07
Keywords
speaker identification; pitch; multi-space probability distribution; Gaussian mixture model; minimum classification error
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
This paper presents a new approach to modeling speech spectra and pitch for text-independent speaker identification using Gaussian mixture models based on multi-space probability distribution (MSD-GMM). The MSD-GMM allows continuous pitch values of voiced frames and discrete symbols for unvoiced frames to be modeled in a unified framework. Spectral and pitch features are jointly modeled by a two-stream MSD-GMM. We derive maximum likelihood (ML) estimation formulae and a minimum classification error (MCE) training procedure for the MSD-GMM parameters. The MSD-GMM speaker models are evaluated on text-independent speaker identification tasks. The experimental results show that the MSD-GMM can efficiently model the spectral and pitch features of each speaker and outperforms conventional speaker models. The results also demonstrate the utility of MCE training of the MSD-GMM parameters and its robustness to inter-session variability.
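The idea of scoring voiced and unvoiced frames in one framework can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single pitch stream only, one-dimensional Gaussians for the voiced space, a zero-dimensional unvoiced space whose likelihood is just its weight, and pitch given in Hz with `None` marking an unvoiced frame. The function names, the dictionary layout, and the two toy speaker models are hypothetical.

```python
import math

def msd_pitch_likelihood(f0, model):
    """Likelihood of one pitch observation under a toy multi-space model.

    Assumption: `f0` is None for an unvoiced frame, else a pitch value in Hz.
    The voiced weights and the unvoiced weight are assumed to sum to 1.
    """
    if f0 is None:
        # Zero-dimensional space: the observation carries no value,
        # so only the space weight contributes.
        return model["unvoiced_weight"]
    # One-dimensional space: weighted sum of Gaussian densities.
    return sum(
        w * math.exp(-0.5 * (f0 - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)
        for w, mu, var in zip(model["weights"], model["means"], model["variances"])
    )

def utterance_log_likelihood(f0_track, model):
    """Sum of per-frame log-likelihoods, treating frames as independent."""
    return sum(math.log(msd_pitch_likelihood(f0, model)) for f0 in f0_track)

def identify(f0_track, speaker_models):
    """Return the speaker whose model gives the highest log-likelihood."""
    return max(
        speaker_models,
        key=lambda name: utterance_log_likelihood(f0_track, speaker_models[name]),
    )

# Hypothetical speaker models, made up for illustration only.
speakers = {
    "low_voice": {"weights": [0.6], "means": [110.0], "variances": [100.0],
                  "unvoiced_weight": 0.4},
    "high_voice": {"weights": [0.6], "means": [220.0], "variances": [100.0],
                   "unvoiced_weight": 0.4},
}

# A short pitch track mixing unvoiced (None) and voiced frames.
track = [None, 108.0, 112.0, None, 115.0]
print(identify(track, speakers))  # prints "low_voice"
```

The abstract's two-stream model also includes a spectral stream, and its parameters are estimated by ML and MCE training; both are omitted from this sketch.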
Pages: 847 - 855
Page count: 9
Related papers
50 in total
  • [31] Text-independent speaker identification system based on the histogram of DCT-cepstrum coefficients
    Al-Rawahy, S.
    Hossen, A.
    Heute, U.
    INTERNATIONAL JOURNAL OF KNOWLEDGE-BASED AND INTELLIGENT ENGINEERING SYSTEMS, 2012, 16 (03) : 141 - 161
  • [32] Text-independent speaker identification using soft channel selection in home robot environments
    Ji, Mikyong
    Kim, Sungtak
    Kim, Hoirin
    Yoon, Ho-Sub
    IEEE TRANSACTIONS ON CONSUMER ELECTRONICS, 2008, 54 (01) : 140 - 144
  • [33] A high-performance text-independent speaker identification of Arabic speakers using a CHMM-based approach
    Tolba, Hesham
    ALEXANDRIA ENGINEERING JOURNAL, 2011, 50 (01) : 43 - 47
  • [34] Closed-Set Text-Independent Speaker Identification System Using Multiple ANN Classifiers
    Dutta, Munmi
    Patgiri, Chayashree
    Sarma, Mousmita
    Sarma, Kandarpa Kumar
    PROCEEDINGS OF THE 3RD INTERNATIONAL CONFERENCE ON FRONTIERS OF INTELLIGENT COMPUTING: THEORY AND APPLICATIONS (FICTA) 2014, VOL 1, 2015, 327 : 377 - 385
  • [35] Text-independent speaker identification system using discrete wavelet transform with linear prediction coding
    Othman Alrusaini
    Khaled Daqrouq
    Journal of Umm Al-Qura University for Engineering and Architecture, 2024, 15 (2) : 112 - 119
  • [36] A robust DNN model for text-independent speaker identification using non-speaker embeddings in diverse data conditions
    Shome, Nirupam
    Saritha, Banala
    Kashyap, Richik
    Laskar, Rabul Hussain
    NEURAL COMPUTING & APPLICATIONS, 2023, 35 (26) : 18933 - 18947
  • [37] A robust DNN model for text-independent speaker identification using non-speaker embeddings in diverse data conditions
    Nirupam Shome
    Banala Saritha
    Richik Kashyap
    Rabul Hussain Laskar
    Neural Computing and Applications, 2023, 35 : 18933 - 18947
  • [38] Privacy-Preserving Speaker Verification and Identification Using Gaussian Mixture Models
    Pathak, Manas A.
    Raj, Bhiksha
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2013, 21 (02) : 397 - 406
  • [39] Speaker identification using a novel combination of sparse representation and Gaussian mixture models
    Ma Yunjie
    AUTOMATIC CONTROL AND MECHATRONIC ENGINEERING III, 2014, 615 : 265 - 269
  • [40] Text-Independent Speaker Identification Using Vocal Tract Length Normalization for Building Universal Background Model
    Sarkar, A. K.
    Umesh, S.
    Rath, S. P.
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009 : 2311 - 2314