On Real Time Q-Log-based Feature Normalization for Distant Speech Recognition

Cited by: 0
Authors
Zilvan, Vicky [1]
Ni'mah, Iftitahu [1]
Yuliani, Asri R. [1]
Pardede, Hilman F. [1]
Affiliation
[1] Indonesian Inst Sci LIPI, Res Ctr Informat, Bandung, Indonesia
Source
PROCEEDINGS OF 2016 INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY SYSTEMS AND INNOVATION (ICITSI) | 2016
Keywords
distant speech recognition; real time feature normalization; q-logarithm; non-extensive statistics; reverberation
DOI
Not available
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Computing the long-term mean in feature normalization methods requires information from future frames, which makes these methods inapplicable to real-time implementations. q-log spectral mean normalization (q-LSMN) was previously proposed as a feature normalization method and was shown to be more effective than conventional normalization methods. However, q-LSMN has not yet been implemented in real time. In this paper, we propose a real-time implementation of q-LSMN. In this method, the mean is updated recursively from previous frames only, so no future-frame information is needed. Experiments on the Aurora-5 database showed that, while real-time q-LSMN performed slightly worse than non-real-time q-LSMN, as expected, it improved recognition accuracy by up to 54.22% compared with non-real-time conventional normalization methods such as cepstral mean normalization (CMN) and log spectral mean normalization (LSMN).
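The idea of a recursively updated mean can be illustrated with a minimal sketch. It is not the paper's implementation: the exponential forgetting factor `alpha`, the default `q`, the plain subtraction in the q-log domain, and the function names are all assumptions made for illustration. Only the q-logarithm definition, ln_q(x) = (x^(1-q) - 1) / (1 - q), is the standard one from non-extensive statistics.

```python
import math


def q_log(x, q):
    """Tsallis q-logarithm; reduces to the natural log when q == 1."""
    if x <= 0:
        raise ValueError("q_log is defined here for positive inputs only")
    if abs(q - 1.0) < 1e-12:
        return math.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)


def realtime_q_log_mean_norm(frames, q=0.9, alpha=0.95):
    """Normalize frames one at a time, using no future frames.

    frames: iterable of lists of positive spectral values (one list per frame).
    alpha:  hypothetical forgetting factor for the running mean (assumption).
    """
    mean = None
    normalized = []
    for frame in frames:
        feats = [q_log(v, q) for v in frame]
        if mean is None:
            # First frame: initialize the running mean with the frame itself.
            mean = list(feats)
        else:
            # Recursive update from the previous mean and the current frame.
            mean = [alpha * m + (1.0 - alpha) * f for m, f in zip(mean, feats)]
        # Assumed normalization: subtract the running mean in the q-log domain.
        normalized.append([f - m for f, m in zip(feats, mean)])
    return normalized
```

Because the mean is initialized from the first frame, that frame is normalized to zeros in this sketch. A real system would probably seed the mean from a prior estimate instead.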
Pages: 5