SEMI-SUPERVISED MULTICHANNEL SPEECH ENHANCEMENT WITH VARIATIONAL AUTOENCODERS AND NON-NEGATIVE MATRIX FACTORIZATION

Cited by: 0
Authors
Leglaive, Simon [1]
Girin, Laurent [1, 2]
Horaud, Radu [1]
Affiliations
[1] Inria Grenoble Rhone Alpes, Montbonnot St Martin, France
[2] Univ Grenoble Alpes, Grenoble INP, GIPSA Lab, Grenoble, France
Source
2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2019
Keywords
Multichannel speech enhancement; local Gaussian modeling; variational autoencoders; non-negative matrix factorization; Monte Carlo expectation-maximization; AUDIO SOURCE SEPARATION; INFORMATION;
DOI
Not available
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
In this paper, we address speaker-independent multichannel speech enhancement in unknown noisy environments. Our work is based on a well-established multichannel local Gaussian modeling framework. We propose to use a neural network for modeling the speech spectro-temporal content. The parameters of this supervised model are learned using the framework of variational autoencoders. The noisy recording environment is assumed to be unknown, so the noise spectro-temporal modeling remains unsupervised and is based on non-negative matrix factorization (NMF). We develop a Monte Carlo expectation-maximization algorithm, and we experimentally show that the proposed approach outperforms its NMF-based counterpart, where speech is modeled using supervised NMF.
Pages: 101-105
Page count: 5