Noise subspace fuzzy C-means clustering for robust speech recognition

Cited by: 0
Authors
Gorriz, J. M. [1]
Ramirez, J.
Segura, J. C.
Puntonet, C. G.
Gonzalez, J. J.
Affiliations
[1] Univ Granada, Dpt Signal Theory Networking & Commun, E-18071 Granada, Spain
[2] Univ Granada, Dpt Comp Architecture & Technol, E-18071 Granada, Spain
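Source
COMPUTATIONAL SCIENCE AND ITS APPLICATIONS - ICCSA 2006, PT 5 | 2006, Vol. 3984
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract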
In this paper, a fuzzy C-means (FCM) based approach to speech/non-speech discrimination is developed to build an effective voice activity detection (VAD) algorithm. The proposed VAD method is based on a soft-decision clustering approach built over a ratio of subband energies, and it improves recognition performance in noisy environments. The accuracy of the FCM-VAD algorithm stems from two elements: a decision function defined over a multiple-observation (MO) window of averaged subband energy ratios, and the modeling of the noise subspace as fuzzy prototypes. In addition, the clustering approach is time efficient, which is fundamental in real-time VAD applications such as speech recognition. An exhaustive analysis on the Spanish SpeechDat-Car databases is conducted to assess the performance of the proposed method and to compare it with existing standard VAD methods. The results show improvements in detection accuracy over standard VADs and a representative set of recently reported VAD algorithms.
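The abstract gives no equations, so the following is only a minimal sketch of generic fuzzy C-means clustering, not the paper's noise-subspace FCM-VAD. The function name `fuzzy_c_means`, the fuzzifier `m = 2`, the synthetic one-dimensional "energy ratio" values, and the 0.5 membership threshold are all assumptions made for illustration.

```python
import random


def fuzzy_c_means(points, c=2, m=2.0, iters=100, tol=1e-6, seed=0):
    """Generic fuzzy C-means on a list of equal-length feature vectors.

    Returns (centers, memberships), where memberships[i][k] is the degree
    to which points[i] belongs to cluster k (each row sums to 1).
    """
    rng = random.Random(seed)
    dim = len(points[0])
    # Assumption: initialise the prototypes from randomly chosen data points.
    centers = [list(p) for p in rng.sample(points, c)]
    u = []
    for _ in range(iters):
        # Membership update: u_ik is proportional to d_ik^(-2/(m-1)).
        u = []
        for p in points:
            dists = [
                max(sum((a - b) ** 2 for a, b in zip(p, ctr)) ** 0.5, 1e-12)
                for ctr in centers
            ]
            inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
            total = sum(inv)
            u.append([v / total for v in inv])
        # Prototype update: mean of the points weighted by u_ik^m.
        new_centers = []
        for k in range(c):
            w = [row[k] ** m for row in u]
            wsum = sum(w)
            new_centers.append(
                [sum(wi * p[j] for wi, p in zip(w, points)) / wsum
                 for j in range(dim)]
            )
        shift = max(
            abs(a - b)
            for nc, oc in zip(new_centers, centers)
            for a, b in zip(nc, oc)
        )
        centers = new_centers
        if shift < tol:
            break
    return centers, u


# Hypothetical per-frame feature values: four noise-like frames near 0 and
# four speech-like frames near 10 (synthetic, not taken from the paper).
frames = [[0.1], [0.3], [-0.2], [0.0], [9.8], [10.2], [10.1], [9.9]]
centers, memberships = fuzzy_c_means(frames, c=2)

# Assumption: treat the cluster with the larger prototype as "speech" and
# flag a frame as speech when its membership in that cluster exceeds 0.5.
speech_idx = max(range(len(centers)), key=lambda k: centers[k][0])
labels = [row[speech_idx] > 0.5 for row in memberships]
```

The soft memberships are what make this a soft-decision scheme: each frame keeps a degree of belonging to every prototype, and the hard speech/non-speech label is only taken at the end.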
Pages: 772-779
Page count: 8