Text-independent speaker identification using Gaussian mixture models based on multi-space probability distribution

Cited by: 0
Authors
Miyajima, C [1]
Hattori, Y
Tokuda, K
Masuko, T
Kobayashi, T
Kitamura, T
Affiliations
[1] Nagoya Inst Technol, Dept Comp Sci, Nagoya, Aichi 4668555, Japan
[2] Tokyo Inst Technol, Interdisciplinary Grad Sch Sci & Engn, Dept Informat Proc, Yokohama, Kanagawa 2268502, Japan
Source
IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS | 2001 / Vol. E84D / No. 07
Keywords
speaker identification; pitch; multi-space probability distribution; Gaussian mixture model; minimum classification error;
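DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract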
This paper presents a new approach to modeling speech spectra and pitch for text-independent speaker identification using Gaussian mixture models based on multi-space probability distribution (MSD-GMM). The MSD-GMM allows us to model continuous pitch values of voiced frames and discrete symbols for unvoiced frames in a unified framework. Spectral and pitch features are jointly modeled by a two-stream MSD-GMM. We derive maximum likelihood (ML) estimation formulae and a minimum classification error (MCE) training procedure for the MSD-GMM parameters. The MSD-GMM speaker models are evaluated on text-independent speaker identification tasks. The experimental results show that the MSD-GMM can efficiently model the spectral and pitch features of each speaker and outperforms conventional speaker models. The results also demonstrate the utility of MCE training of the MSD-GMM parameters and robustness against inter-session variability.
Pages: 847-855
Page count: 9