Discrimination power of vocal source and vocal tract related features for speaker segmentation

Cited by: 20
Authors
Chan, Wai Nang [1]
Zheng, Nengheng [1]
Lee, Tan [1]
Affiliation
[1] Department of Electronic Engineering, The Chinese University of Hong Kong, Hong Kong, People's Republic of China
Source
IEEE Transactions on Audio, Speech, and Language Processing | 2007 / Vol. 15 / No. 6
Keywords
speaker discrimination power; speaker segmentation; vocal source features; vocal tract features
DOI
10.1109/TASL.2007.900103
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
This paper presents an analysis of the speaker discrimination power of vocal source related features, in comparison with conventional vocal tract related features. The vocal source features, named wavelet octave coefficients of residues (WOCOR), are extracted by a pitch-synchronous wavelet transform of the linear predictive (LP) residual signal. A series of controlled experiments shows that WOCOR is less sensitive to spoken content than conventional MFCC features and is therefore more discriminative when the amount of training data is limited. These advantages of WOCOR are exploited in speaker segmentation of telephone conversations, where statistical speaker models must be built from short speech segments. Experimental results show that the proposed use of WOCOR leads to a noticeable reduction in segmentation errors.
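The abstract describes the WOCOR front end only at a high level: LP inverse filtering to obtain the residual, then a pitch-synchronous wavelet transform whose coefficients are summarised per octave. The following is a minimal sketch of that flow under assumptions that are not taken from the paper: all function names are hypothetical, pitch-pulse positions are assumed to be supplied by a separate pitch tracker, LP analysis is done once over the whole signal rather than frame by frame, a Haar wavelet is used, each analysis window spans two pitch periods with a Hamming taper, each octave is reduced to a single L2 norm, and the resulting vector is normalised to unit length. The LP order of 12 and the 4 octaves are illustrative values only.

```python
import math

# Hypothetical sketch of a WOCOR-like extractor. The function names, the Haar
# wavelet, the two-period Hamming window, one norm per octave and the unit-length
# normalisation are assumptions, not the paper's exact recipe.


def lp_coefficients(frame, order):
    """Autocorrelation-method LP coefficients a_1..a_p via Levinson-Durbin,
    using the convention x[n] ~ sum_k a_k * x[n - k]."""
    n = len(frame)
    r = [sum(frame[i] * frame[i + k] for i in range(n - k)) for k in range(order + 1)]
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # degenerate (e.g. silent) input: keep remaining coefficients at 0
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:]


def lp_residual(signal, coeffs):
    """Inverse filtering: e[n] = x[n] - sum_k a_k x[n - k] (zeros before start)."""
    p = len(coeffs)
    return [
        signal[n] - sum(coeffs[k - 1] * signal[n - k] for k in range(1, p + 1) if n - k >= 0)
        for n in range(len(signal))
    ]


def haar_octave_norms(segment, num_octaves):
    """L2 norm of the Haar detail coefficients in each octave (finest first)."""
    size = 1
    while size < len(segment):
        size *= 2
    approx = list(segment) + [0.0] * (size - len(segment))  # zero-pad to 2^m
    s = math.sqrt(2.0)
    norms = []
    for _ in range(num_octaves):
        if len(approx) < 2:
            norms.append(0.0)  # segment too short for this octave
            continue
        half = len(approx) // 2
        detail = [(approx[2 * i] - approx[2 * i + 1]) / s for i in range(half)]
        approx = [(approx[2 * i] + approx[2 * i + 1]) / s for i in range(half)]
        norms.append(math.sqrt(sum(d * d for d in detail)))
    return norms


def wocor_like_features(signal, pulses, order=12, num_octaves=4):
    """One feature vector per window spanning two pitch periods (pulse i to i+2).
    `pulses` are pitch-pulse sample indices, assumed to come from a pitch tracker."""
    # LP is fitted once over the whole signal for brevity; a real front end
    # would use short frames.
    residual = lp_residual(signal, lp_coefficients(signal, order))
    feats = []
    for start, end in zip(pulses, pulses[2:]):
        seg = residual[start:end]
        m = len(seg)
        if m < 2:
            continue
        window = [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (m - 1)) for i in range(m)]
        norms = haar_octave_norms([v * w for v, w in zip(seg, window)], num_octaves)
        total = math.sqrt(sum(v * v for v in norms)) or 1.0  # avoid divide-by-zero
        feats.append([v / total for v in norms])
    return feats


if __name__ == "__main__":
    # Synthetic "voiced" signal: pulse train (period 80) through a two-pole resonator.
    n, period = 800, 80
    x = []
    for i in range(n):
        v = 1.0 if i % period == 0 else 0.0
        if i >= 1:
            v += 1.3 * x[i - 1]
        if i >= 2:
            v -= 0.8 * x[i - 2]
        x.append(v)
    feats = wocor_like_features(x, list(range(0, n, period)))
    print(len(feats), [round(v, 3) for v in feats[0]])
```

Normalising each vector removes overall loudness so that only the distribution of residual energy across octaves remains, which is in the spirit of a feature meant to reflect the excitation source rather than the spoken content; whether the paper normalises in exactly this way is not stated in the abstract.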
Pages: 1884-1892
Page count: 9