Combination of autocorrelation-based features and projection measure technique for speaker identification

Cited by: 11
Authors
Yuo, KH [1]
Hwang, TH [1]
Wang, HC [1]
Affiliation
[1] Natl Tsing Hua Univ, Dept Elect Engn, Hsinchu 300, Taiwan
Source
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING | 2005, Vol. 13, No. 4
Keywords
channel-normalization; relative autocorrelation sequence; projection measure; speaker identification;
DOI
10.1109/TSA.2005.848893
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
This paper presents a robust approach to speaker identification when the speech signal is corrupted by additive noise and channel distortion. Robust features are derived by assuming that the corrupting noise is stationary and the channel effect is fixed during an utterance. A two-step temporal filtering procedure on the autocorrelation sequence is proposed to minimize the effect of additive and convolutional noise. The first step applies temporal filtering in the autocorrelation domain to remove the additive noise; the second step performs mean subtraction on the filtered autocorrelation sequence in the logarithmic spectrum domain to remove the channel effect. No prior knowledge of the noise characteristics is necessary, and the additive noise may be colored. The proposed robust feature is then combined with the projection measure technique to gain further improvement in recognition accuracy. Experimental results show that the proposed method can significantly improve speaker identification performance in noisy environments.
Pages: 565-574
Page count: 10
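
The two-step procedure described in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: the function names (`frame_autocorr`, `temporal_filter`, `log_spectrum_mean_sub`, `robust_features`), the frame length, hop, lag count, FFT size, and the choice of a delta (regression) filter are all assumptions made for illustration, since the abstract does not specify them. The idea being shown is that stationary additive noise, if uncorrelated with the speech, adds the same term to every frame's autocorrelation, so a filter along the frame axis with zero DC gain removes it; a fixed channel then appears as a constant offset in the log spectrum, which mean subtraction over the utterance removes.

```python
import numpy as np


def frame_autocorr(x, frame_len=256, hop=128, max_lag=32):
    """One-sided autocorrelation r_m(k), k = 0..max_lag, for each frame m."""
    n_frames = 1 + (len(x) - frame_len) // hop
    r = np.empty((n_frames, max_lag + 1))
    for m in range(n_frames):
        f = x[m * hop : m * hop + frame_len]
        full = np.correlate(f, f, mode="full")  # lags -(N-1) .. (N-1)
        # Keep lags 0..max_lag; lag 0 sits at index frame_len - 1.
        r[m] = full[frame_len - 1 : frame_len + max_lag] / frame_len
    return r


def temporal_filter(r, half_width=2):
    """Step 1 (assumed filter): delta filter along the frame axis.

    The taps sum to zero, so any component that is constant over
    frames (e.g. the autocorrelation of stationary noise) maps to zero.
    """
    taps = np.arange(-half_width, half_width + 1, dtype=float)
    taps /= np.sum(taps ** 2)
    # Repeat the edge frames so the output has one row per input frame.
    padded = np.pad(r, ((half_width, half_width), (0, 0)), mode="edge")
    out = np.zeros_like(r)
    for i, t in enumerate(taps):
        out += t * padded[i : i + r.shape[0]]
    return out


def log_spectrum_mean_sub(seq, n_fft=128, floor=1e-10):
    """Step 2: log magnitude spectrum per frame, minus its mean over frames."""
    mag = np.abs(np.fft.rfft(seq, n=n_fft, axis=1))
    log_mag = np.log(np.maximum(mag, floor))  # floor avoids log(0)
    return log_mag - log_mag.mean(axis=0, keepdims=True)


def robust_features(x):
    """Chain the two steps on a 1-D signal x (hypothetical helper)."""
    return log_spectrum_mean_sub(temporal_filter(frame_autocorr(x)))
```

Calling `robust_features(x)` on a one-dimensional NumPy array of samples returns one feature row per frame. The combination with the projection measure mentioned in the abstract is not covered by this sketch.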