SEMI-SUPERVISED MULTICHANNEL SPEECH ENHANCEMENT WITH VARIATIONAL AUTOENCODERS AND NON-NEGATIVE MATRIX FACTORIZATION

Cited by: 0

Authors:

Leglaive, Simon [1]

Girin, Laurent [1,2]

Horaud, Radu [1]

Affiliations:

[1] Inria Grenoble Rhone Alpes, Montbonnot St Martin, France

[2] Univ Grenoble Alpes, Grenoble INP, GIPSA Lab, Grenoble, France

Source:

2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) | 2019

Keywords:

Multichannel speech enhancement; local Gaussian modeling; variational autoencoders; non-negative matrix factorization; Monte Carlo expectation-maximization; audio source separation; information

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Subject classification codes:

070206; 082403

Abstract:

In this paper, we address speaker-independent multichannel speech enhancement in unknown noisy environments. Our work is based on a well-established multichannel local Gaussian modeling framework. We propose to use a neural network to model the spectro-temporal content of speech. The parameters of this supervised model are learned within the framework of variational autoencoders. The noisy recording environment is assumed to be unknown, so the spectro-temporal model of the noise remains unsupervised and is based on non-negative matrix factorization (NMF). We develop a Monte Carlo expectation-maximization algorithm and show experimentally that the proposed approach outperforms its NMF-based counterpart, in which speech is modeled using supervised NMF.
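The abstract says the unknown noise is modeled with NMF inside a local Gaussian framework. As a rough illustration of that building block only, here is a minimal single-channel sketch, assuming NumPy, that fits a power spectrogram V ≈ WH under the Itakura-Saito (IS) divergence; minimizing this divergence is equivalent to maximum-likelihood estimation when each time-frequency coefficient is modeled as zero-mean complex Gaussian with variance (WH)_fn (see reference [8]). It is not the authors' multichannel Monte Carlo EM update, and the VAE speech model is omitted entirely. The function names `is_nmf` and `is_divergence` are hypothetical. The updates use the square-root (majorization-minimization) form, which keeps the IS divergence non-increasing.

```python
import numpy as np


def is_divergence(V, V_hat):
    """Itakura-Saito divergence D_IS(V | V_hat), summed over all entries."""
    ratio = V / V_hat
    return float(np.sum(ratio - np.log(ratio) - 1.0))


def is_nmf(V, rank, n_iter=200, seed=0, eps=1e-12):
    """Hypothetical helper: fit V ~= W @ H under the IS divergence.

    V    : strictly positive power spectrogram, shape (F, N)
    rank : number of NMF components K
    Returns W with shape (F, K) and H with shape (K, N).
    """
    rng = np.random.default_rng(seed)
    F, N = V.shape
    # Random positive initialization (eps avoids exact zeros).
    W = rng.random((F, rank)) + eps
    H = rng.random((rank, N)) + eps
    for _ in range(n_iter):
        # Update activations H with W fixed (exponent 1/2 = MM update for IS).
        V_hat = W @ H
        H *= ((W.T @ (V * V_hat ** -2)) / (W.T @ V_hat ** -1)) ** 0.5
        # Update spectral templates W with H fixed.
        V_hat = W @ H
        W *= (((V * V_hat ** -2) @ H.T) / (V_hat ** -1 @ H.T)) ** 0.5
    return W, H
```

Calling `is_nmf(V, rank=4)` on a strictly positive spectrogram returns factors whose product approximates V. The 0.5 exponent trades some speed for the monotonicity guarantee; the plain multiplicative updates without the exponent are also common in practice and often converge faster, but without that guarantee.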


Pages: 101-105

Number of pages: 5

References (32 in total):

[1] [Anonymous]. Monte Carlo Statistical Methods. Springer Texts in Statistics, 2005.

[2] [Anonymous]. Linguistic Data Consortium, 1993. DOI 10.35111/17GK-BN40.

[3] Arberet, S. 2010 10th International Conference on Information Sciences, Signal Processing and their Applications (ISSPA 2010), 2010, p. 1. DOI 10.1109/ISSPA.2010.5605570.

[4] Bando, Y. 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018, p. 716. DOI 10.1109/ICASSP.2018.8461530.

[5] Boyd, Stephen P. Convex Optimization, 2014.

[6] Chan, K.S.; Ledolter, J. Monte-Carlo EM estimation for time-series models involving counts. Journal of the American Statistical Association, 1995, 90(429): 242-252.

[7] Duong, Ngoc Q. K.; Vincent, Emmanuel; Gribonval, Remi. Under-determined reverberant audio source separation using a full-rank spatial covariance model. IEEE Transactions on Audio, Speech, and Language Processing, 2010, 18(7): 1830-1840.

[8] Fevotte, Cedric; Bertin, Nancy; Durrieu, Jean-Louis. Nonnegative matrix factorization with the Itakura-Saito divergence: With application to music analysis. Neural Computation, 2009, 21(3): 793-830.

[9] Glorot, X. Proceedings of the 13th International Conference on Artificial Intelligence and Statistics (AISTATS), 2010, p. 249.

[10] Hansen, F.; Pedersen, G.K. Jensen's operator inequality. Bulletin of the London Mathematical Society, 2003, 35: 553-564.
