A Statistically Principled and Computationally Efficient Approach to Speech Enhancement using Variational Autoencoders

Cited by: 13
Authors
Pariente, Manuel [1]
Deleforge, Antoine [1]
Vincent, Emmanuel [1]
Affiliations
[1] Univ Lorraine, INRIA, CNRS, LORIA, F-54000 Nancy, France
Source
INTERSPEECH 2019 | 2019
Keywords
Speech enhancement; variational autoencoders; variational Bayes; non-negative matrix factorization
DOI
10.21437/Interspeech.2019-1398
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology];
Discipline classification code
100104; 100213;
Abstract
Recent studies have explored the use of deep generative models of speech spectra based on variational autoencoders (VAEs), combined with unsupervised noise models, to perform speech enhancement. These studies developed iterative algorithms involving either Gibbs sampling or gradient descent at each step, making them computationally expensive. This paper proposes a variational inference method to iteratively estimate the power spectrogram of the clean speech. Our main contribution is the analytical derivation of the variational steps in which the encoder of the pre-learned VAE can be used to estimate the variational approximation of the true posterior distribution, using the very same assumption made to train VAEs. Experiments show that the proposed method produces results on par with the aforementioned iterative methods using sampling, while decreasing the computational cost by a factor of 36 to reach a given performance.
Pages: 3158-3162
Page count: 5
Related papers
30 in total
[1]  
[Anonymous], 2005, Monte Carlo statistical methods. Springer texts in statistics
[2]  
[Anonymous], 2010, MACHINE AUDITION PRI
[3]  
Bando Y, 2018, 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), P716, DOI 10.1109/ICASSP.2018.8461530
[4]  
Bishop C. M., 2006, PATTERN RECOGNITION, DOI 10.1117/1.2819119
[5]   Nonnegative Matrix Factorization with the Itakura-Saito Divergence: With Application to Music Analysis [J].
Fevotte, Cedric ;
Bertin, Nancy ;
Durrieu, Jean-Louis .
NEURAL COMPUTATION, 2009, 21 (03) :793-830
[6]  
Garofolo J.S., 1993, Timit acoustic phonetic continuous speech corpus
[7]  
Hershey JR, 2016, INT CONF ACOUST SPEE, P31, DOI 10.1109/ICASSP.2016.7471631
[8]  
Kameoka H., 2018, ARXIV PREPRINT ARXIV
[9]  
Kingma D., 2014, arXiv:1412.6980
[10]  
Kingma DP, 2014, ADV NEUR IN, V27