Training and compensation of class-conditioned NMF bases for speech enhancement

Cited by: 6
Authors
Chung, Hanwook [1 ]
Badeau, Roland [2 ]
Plourde, Eric [3 ]
Champagne, Benoit [1 ]
Affiliations
[1] McGill Univ, Dept Elect & Comp Engn, Montreal, PQ, Canada
[2] Univ Paris Saclay, Telecom ParisTech, LTCI, F-75013 Paris, France
[3] Sherbrooke Univ, Dept Elect & Comp Engn, Sherbrooke, PQ, Canada
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC);
Keywords
Single-channel speech enhancement; Non-negative matrix factorization; Probabilistic generative model; Classification; Variational Bayesian expectation-maximization; NONNEGATIVE MATRIX FACTORIZATION; SOURCE SEPARATION; NOISE; MIXTURES; ALGORITHMS; SPARSE; MODEL;
DOI
10.1016/j.neucom.2018.01.013
CLC classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
In this paper, we introduce a training and compensation algorithm for class-conditioned basis vectors in the non-negative matrix factorization (NMF) model for single-channel speech enhancement. The main goal is to estimate the basis vectors of different signal sources in a way that prevents them from representing each other, in order to reduce the residual noise components that have features similar to the speech signal. During the proposed training stage, the basis matrices for the clean speech and noises are estimated jointly by constraining them to belong to different classes. To this end, we employ the probabilistic generative model (PGM) of classification, specified by class-conditional densities, as an a priori distribution for the basis vectors. The update rules of the NMF and the PGM parameters of classification are jointly obtained by using the variational Bayesian expectation-maximization (VBEM) algorithm, which guarantees convergence to a stationary point. Another goal of the proposed algorithm is to handle a mismatch between the characteristics of the training and test data. This is accomplished during the proposed enhancement stage, where we implement a basis compensation scheme. Specifically, we use extra free basis vectors to capture the features that are not included in the training data. Objective experimental results for different combinations of speaker and noise types show that the proposed algorithm can provide better speech enhancement performance than the benchmark algorithms under various conditions. (C) 2018 Elsevier B.V. All rights reserved.
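The enhancement-stage idea summarized in the abstract — keeping pre-trained speech and noise bases fixed while a few extra free basis vectors absorb features absent from the training data — can be illustrated with a minimal sketch. This is not the authors' VBEM algorithm; it is a generic KL-divergence multiplicative-update NMF in Python/NumPy, and the function names (`nmf_mu`, `enhance`) and parameters (`n_free`) are illustrative assumptions, not from the paper.

```python
import numpy as np

def nmf_mu(V, W, H, fixed_cols=0, n_iter=100, eps=1e-9):
    """KL-divergence multiplicative updates for V ~= W @ H.

    The first `fixed_cols` columns of W are held fixed (pre-trained
    bases); any remaining "free" columns are updated along with H.
    W and H are updated in place and also returned.
    """
    for _ in range(n_iter):
        WH = W @ H + eps
        # Standard multiplicative update for the activations H.
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)
        if fixed_cols < W.shape[1]:
            WH = W @ H + eps
            num = (V / WH) @ H.T
            den = H.sum(axis=1)[None, :] + eps
            # Update only the free basis columns; trained ones stay fixed.
            W[:, fixed_cols:] *= (num / den)[:, fixed_cols:]
    return W, H

def enhance(V_mix, W_speech, W_noise, n_free=4, n_iter=100, seed=0):
    """Estimate a speech magnitude spectrogram from a mixture V_mix.

    Trained speech/noise bases are concatenated with `n_free` random
    free bases; only the free bases and the activations adapt to the
    mixture, mimicking a simple basis-compensation scheme.
    """
    rng = np.random.default_rng(seed)
    F, T = V_mix.shape
    W_free = rng.random((F, n_free)) + 0.1
    W = np.concatenate([W_speech, W_noise, W_free], axis=1)
    H = rng.random((W.shape[1], T)) + 0.1
    n_fixed = W_speech.shape[1] + W_noise.shape[1]
    W, H = nmf_mu(V_mix, W, H, fixed_cols=n_fixed, n_iter=n_iter)
    # Wiener-like mask: fraction of each bin explained by speech bases.
    Ks = W_speech.shape[1]
    mask = (W[:, :Ks] @ H[:Ks, :]) / (W @ H + 1e-9)
    return mask * V_mix
```

In this sketch the free bases compete with the trained ones for the unexplained energy in the mixture, which is the intuition behind compensating for a training/test mismatch; the paper's actual method additionally constrains the trained speech and noise bases to distinct classes via the PGM prior.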
Pages: 107-118 (12 pages)