An investigation of dependencies between frequency components and speaker characteristics for text-independent speaker identification

Cited by: 67
Authors
Lu, Xugang [1]
Dang, Jianwu [1]
Affiliations
[1] Japan Advanced Institute of Science and Technology, Nomi, Ishikawa 923-1292, Japan
Keywords
speaker identification; physiological features; speech production; Fisher's F-ratio; mutual information; frequency warping
DOI
10.1016/j.specom.2007.10.005
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
The features used for speech recognition are expected to emphasize linguistic information while suppressing individual differences. For speaker recognition, in contrast, features should preserve individual information while attenuating linguistic information. In most studies, however, identical acoustic features are used for the distinct tasks of speaker recognition and speech recognition. In this paper, we first investigated the relationships between frequency components and the vocal tract from the standpoint of speech production. We found that individual information is encoded non-uniformly across the frequency bands of speech sound. We then used the statistical Fisher's F-ratio and the information-theoretic mutual information to measure the dependencies between frequency components and individual characteristics on a speaker recognition database (NTT-VR). This analysis not only confirmed the non-uniform distribution of individual information across frequency bands that was found from the speech production point of view, but also quantified the dependencies. Based on the quantification results, we proposed a new physiological feature for text-independent speaker identification that uses a non-uniform subband processing strategy to emphasize the physiological information involved in speech production. The new feature was combined with GMM speaker models and applied to the NTT-VR speaker recognition database. Speaker identification using the proposed feature reduced the identification error rate by 20.1% compared with the MFCC feature. The experimental results confirmed that emphasizing features from highly individual-dependent frequency bands is effective for improving speaker recognition performance. © 2007 Elsevier B.V. All rights reserved.
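As a rough illustration of the F-ratio measurement mentioned in the abstract, the sketch below computes a textbook Fisher's F-ratio per frequency band: the variance of the speaker means divided by the average within-speaker variance. This record does not give the paper's exact formulation, so the function name `f_ratio_per_band`, the input layout, and the toy numbers are all assumptions made for illustration, not the authors' implementation.

```python
from statistics import fmean, pvariance


def f_ratio_per_band(energies_by_speaker):
    """Textbook Fisher's F-ratio for each frequency band (illustrative only).

    energies_by_speaker maps a speaker label to a list of frames; each frame
    is a list of band energies (for example, log subband energies).
    This layout is an assumption, not taken from the paper.
    """
    speakers = list(energies_by_speaker)
    n_bands = len(energies_by_speaker[speakers[0]][0])
    ratios = []
    for band in range(n_bands):
        # Collect this band's values separately for every speaker.
        per_speaker = [
            [frame[band] for frame in energies_by_speaker[s]] for s in speakers
        ]
        # Between-speaker variance: spread of the per-speaker means.
        between = pvariance([fmean(values) for values in per_speaker])
        # Within-speaker variance: average spread inside each speaker.
        within = fmean([pvariance(values) for values in per_speaker])
        ratios.append(between / within if within > 0 else float("inf"))
    return ratios


# Hypothetical toy data: two speakers, two frames each, two bands.
toy = {
    "speaker_a": [[1.0, 2.0], [3.0, 4.0]],
    "speaker_b": [[5.0, 2.0], [7.0, 4.0]],
}
print(f_ratio_per_band(toy))  # [4.0, 0.0]
```

In the toy data the first band separates the two speakers (high ratio) while the second band is identical for both (ratio of zero), which is the kind of band-by-band contrast the abstract describes as non-uniform.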
Pages: 312-322
Number of pages: 11