ROBUST SPEAKER IDENTIFICATION USING AN AUDITORY-BASED FEATURE

Cited by: 28
Authors
Li, Qi [1]
Huang, Yan [1]
Affiliation
[1] Li Creat Technol LcT Inc, Florham Pk, NJ 07932 USA
Source
2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING | 2010
Keywords
Speech feature extraction; auditory-based feature; robust speaker recognition; speaker identification; cochlea; FILTER SHAPES; NOISE;
DOI
10.1109/ICASSP.2010.5495589
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
An auditory-based feature extraction algorithm is presented. The feature is based on a recently published time-frequency transform plus a set of modules that simulate the signal processing functions of the cochlea. The feature is applied to a speaker identification task to address the acoustic mismatch between training and testing conditions. The performance of acoustic models trained on clean speech usually drops significantly when they are tested on noisy speech. The proposed feature shows strong robustness in this mismatched situation. In our speaker identification experiments, both MFCC and the proposed feature achieve near-perfect performance under clean testing conditions, but when the SNR of the input signal drops to 6 dB, the average accuracy of the MFCC feature is only 41.2%, while the proposed feature still achieves an average accuracy of 88.3%.
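To make the 6 dB SNR test condition concrete, the sketch below shows one common way of building a noisy test signal at a target SNR: scale the noise so that the ratio of speech power to noise power matches the target, then add it to the clean signal. This is an illustration only. The abstract does not describe the noise types or the mixing procedure used in the paper, so the function names (`mix_at_snr`, `measure_snr_db`), the sine-tone stand-in for speech, and the Gaussian noise are all assumptions.

```python
import math
import random


def power(samples):
    """Mean squared amplitude of a sequence of samples."""
    return sum(v * v for v in samples) / len(samples)


def mix_at_snr(speech, noise, target_snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals
    `target_snr_db`, then add it to `speech`.

    Returns the noisy signal and the scaled noise.
    (Hypothetical helper, not taken from the paper.)
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    # SNR_dB = 10 * log10(P_speech / P_noise)  =>  P_noise = P_speech / 10^(SNR_dB / 10)
    target_noise_power = power(speech) / (10.0 ** (target_snr_db / 10.0))
    gain = math.sqrt(target_noise_power / power(noise))
    scaled = [gain * n for n in noise]
    noisy = [s + n for s, n in zip(speech, scaled)]
    return noisy, scaled


def measure_snr_db(speech, noise):
    """SNR in dB between a clean signal and an additive noise signal."""
    return 10.0 * math.log10(power(speech) / power(noise))


rng = random.Random(0)
# Stand-in signals (assumptions): a 200 Hz tone sampled at 8 kHz in place
# of clean speech, and white Gaussian noise as the interfering signal.
speech = [math.sin(2 * math.pi * 200 * t / 8000) for t in range(8000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(8000)]

noisy, scaled_noise = mix_at_snr(speech, noise, 6.0)
measured = measure_snr_db(speech, scaled_noise)
```

In a mismatched evaluation of the kind the abstract describes, models would be trained on the clean signals and tested on signals like `noisy`.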
Pages: 4514-4517
Number of pages: 4