ROTATIONAL RESET STRATEGY FOR ONLINE SEMI-SUPERVISED NMF-BASED SPEECH ENHANCEMENT FOR LONG RECORDINGS

Cited by: 0
Authors
Zhou, Jun [1]
Chen, Shuo [2]
Duan, Zhiyao [2]
Affiliations
[1] Southwest Univ, Dept Comp Sci, Chongqing 400715, Peoples R China
[2] Univ Rochester, Dept Elect & Comp Engn, Rochester, NY 14627 USA
Source
2015 IEEE WORKSHOP ON APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS (WASPAA) | 2015
Keywords
Speech enhancement; non-stationary noise; non-negative matrix factorization; source separation;
DOI
Not available
Chinese Library Classification number
O42 [Acoustics];
Discipline classification codes
070206; 082403;
Abstract
Non-negative matrix factorization (NMF) has been successfully applied to speech enhancement in non-stationary noisy environments. Recently proposed online semi-supervised NMF algorithms are of particular interest because they retain two desirable properties of classical speech enhancement approaches: they operate online and are semi-supervised. These algorithms, however, have only been evaluated on noisy mixtures shorter than 30 seconds. In this paper we find that these algorithms work well when run for less than 1 minute, but that degradation of the enhanced speech signal starts to appear after 2 minutes. Our analysis attributes this to an inappropriate dictionary update rule, which gradually loses its ability to update the speech dictionary. We then propose a simple rotational reset strategy to solve the problem: instead of continuously updating the entire speech dictionary, we periodically and rotationally select elements and reset their values to random numbers. Experiments show that this strategy resolves the degradation problem and that the improved algorithm significantly outperforms classical speech enhancement algorithms even when run for 10 minutes.
Pages: 5
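The rotational reset idea stated in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract only says that elements are selected periodically and rotationally and reset to random numbers. The function name `rotational_reset`, the parameters `period` and `group_size`, the treatment of "elements" as whole dictionary atoms, the uniform random draw, and the sum-to-one normalization are all assumptions made for illustration. The surrounding online NMF update is omitted.

```python
import random


def rotational_reset(atoms, frame_index, period, group_size, next_atom, rng):
    """Reset `group_size` speech-dictionary atoms in round-robin order.

    atoms: list of atoms, each a list of non-negative floats (modified in place).
    A reset happens only when `frame_index` is a positive multiple of `period`
    (hypothetical schedule). Returns the updated rotation pointer.
    """
    if frame_index == 0 or frame_index % period != 0:
        return next_atom  # not a reset frame: leave the dictionary untouched
    for offset in range(group_size):
        k = (next_atom + offset) % len(atoms)
        # Assumed reset rule: fresh positive random values, normalized to sum to 1.
        values = [rng.random() + 1e-12 for _ in atoms[k]]
        total = sum(values)
        atoms[k] = [v / total for v in values]
    # Advance the pointer so the next reset touches the following atoms.
    return (next_atom + group_size) % len(atoms)


# Usage: 6 atoms of dimension 4, resetting 2 atoms every 4 frames.
rng = random.Random(0)
atoms = [[1.0, 0.0, 0.0, 0.0] for _ in range(6)]
pointer = 0
for frame in range(1, 9):
    # (the regular online dictionary update would run here)
    pointer = rotational_reset(atoms, frame, period=4, group_size=2,
                               next_atom=pointer, rng=rng)
```

In this run, atoms 0 and 1 are reset at frame 4 and atoms 2 and 3 at frame 8, while atoms 4 and 5 are still waiting for their turn. Cycling through the dictionary in this way means every atom is eventually refreshed without discarding the whole dictionary at once.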