Robust Feature Extraction for Speaker Recognition Based on Constrained Nonnegative Tensor Factorization

Cited by: 0
Authors
Qiang Wu
Li-Qing Zhang
Guang-Chuan Shi
Affiliations
[1] Shanghai Jiao Tong University, Department of Computer Science and Engineering
Source
Journal of Computer Science and Technology | 2010, Vol. 25
Keywords
pattern recognition; speaker recognition; nonnegative tensor factorization; feature extraction; auditory perception
DOI: not available
Abstract
How to extract robust features is an important research topic in the machine learning community. In this paper, we investigate robust feature extraction for speech signals based on tensor structure and develop a new method called constrained Nonnegative Tensor Factorization (cNTF). A novel feature extraction framework based on the cortical representation in the primary auditory cortex (A1) is proposed for robust speaker recognition. Motivated by the neural firing rate model in A1, the speech signal is first represented as a general higher-order tensor. cNTF is used to learn basis functions from multiple interrelated feature subspaces and to find a robust sparse representation for the speech signal. Computer simulations are given to evaluate the performance of our method, and comparisons with existing speaker recognition methods are also provided. The experimental results demonstrate that the proposed method achieves higher recognition accuracy in noisy environments.
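The paper's cNTF adds constraints (such as sparsity) on top of a basic nonnegative tensor factorization; those constraints are not reproduced here. As a minimal sketch of the underlying factorization step only, the following hypothetical code (function names `ntf_cp` and `khatri_rao` are our own, not from the paper) fits a plain nonnegative CP decomposition of a 3-way tensor with multiplicative updates, so that each mode's factor matrix plays the role of a learned basis:

```python
import numpy as np

def khatri_rao(U, V):
    """Column-wise Khatri-Rao product: (m*n, R) from (m, R) and (n, R)."""
    m, r = U.shape
    n, _ = V.shape
    return np.einsum('mr,nr->mnr', U, V).reshape(m * n, r)

def ntf_cp(X, rank, n_iter=200, eps=1e-9, seed=0):
    """Nonnegative CP decomposition of a 3-way tensor X (I x J x K).

    Returns nonnegative factors A (I x R), B (J x R), C (K x R) such that
    X[i, j, k] ~= sum_r A[i, r] * B[j, r] * C[k, r].
    Uses Lee-Seung-style multiplicative updates, which keep all factor
    entries nonnegative by construction.
    """
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A = rng.random((I, rank))
    B = rng.random((J, rank))
    C = rng.random((K, rank))
    for _ in range(n_iter):
        # Mode-0: X reshaped to (I, J*K) matches A @ khatri_rao(B, C).T
        KR = khatri_rao(B, C)
        X0 = X.reshape(I, J * K)
        A *= (X0 @ KR) / (A @ (KR.T @ KR) + eps)
        # Mode-1: axes moved so rows index J
        KR = khatri_rao(A, C)
        X1 = np.moveaxis(X, 1, 0).reshape(J, I * K)
        B *= (X1 @ KR) / (B @ (KR.T @ KR) + eps)
        # Mode-2: axes moved so rows index K
        KR = khatri_rao(A, B)
        X2 = np.moveaxis(X, 2, 0).reshape(K, I * J)
        C *= (X2 @ KR) / (C @ (KR.T @ KR) + eps)
    return A, B, C
```

In the paper's setting, the tensor axes would correspond to interrelated feature subspaces of the A1-inspired speech representation (e.g. time, frequency, and scale), and the learned basis functions yield the sparse features used for recognition; the sketch above omits all of that domain structure and shows only the factorization machinery.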
Pages: 783-792
Page count: 9