DISCRIMINATIVE NON-NEGATIVE MATRIX FACTORIZATION WITH MAJORIZATION-MINIMIZATION

Cited by: 0

Authors:

Li, Li [1]

Kameoka, Hirokazu [2]

Makino, Shoji [1]

Affiliations:

[1] Univ Tsukuba, Tsukuba, Ibaraki, Japan

[2] NTT Corp, NTT Commun Sci Labs, Tokyo, Japan

Source:

2017 HANDS-FREE SPEECH COMMUNICATIONS AND MICROPHONE ARRAYS (HSCMA 2017) | 2017

Keywords:

Discriminative non-negative matrix factorization; majorization-minimization; single channel; speech enhancement

DOI:

Not available

Chinese Library Classification:

TM [Electrical engineering]; TN [Electronics and communication technology]

Subject classification codes:

0808; 0809

Abstract:

Non-negative matrix factorization (NMF) is a powerful approach to single-channel audio source separation. In a supervised setting, NMF is first applied to train the basis spectra of each sound source. At test time, NMF is applied to the spectrogram of a mixture signal using the pretrained spectra. The source signals can then be separated out using a Wiener filter. A typical way to train the basis spectra of each source is to minimize the objective function of NMF. However, the basis spectra obtained in this way do not ensure that the separated signal will be optimal at test time, because the objective function used for training is inconsistent with the one used for separation (Wiener filtering). To address this, a framework called discriminative NMF (DNMF) has recently been proposed. In that work, a multiplicative update algorithm was proposed for basis training; however, its convergence is not guaranteed. To overcome this drawback, this paper proposes using a majorization-minimization principle to develop a convergence-guaranteed algorithm for DNMF. Experimental results showed that the proposed algorithm outperformed both standard NMF and DNMF with a multiplicative update algorithm in terms of the signal-to-distortion and signal-to-interference ratios.
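The supervised pipeline described in the abstract (train basis spectra per source, fit activations on the mixture with the bases fixed, then apply a Wiener-type filter) can be sketched roughly as follows. This is only an illustration of the standard NMF baseline, not the DNMF or majorization-minimization algorithm the paper proposes. It assumes magnitude spectrograms, the KL-divergence objective with the Lee and Seung multiplicative updates, and a soft ratio mask standing in for the Wiener filter; the names `kl_nmf`, `separate`, and `EPS` are hypothetical.

```python
import numpy as np

EPS = 1e-12  # assumed small constant that guards against division by zero


def kl_nmf(V, n_basis, n_iter=200, W=None, seed=0):
    """KL-divergence NMF, V ~= W @ H, via multiplicative updates.

    If W is given, it is held fixed and only the activations H are
    updated (n_basis is then ignored). This is the test-time setting.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    fixed_W = W is not None
    if not fixed_W:
        W = rng.random((n_freq, n_basis)) + EPS
    H = rng.random((W.shape[1], n_frames)) + EPS
    ones = np.ones_like(V)
    for _ in range(n_iter):
        # Update activations with the bases held constant.
        H *= (W.T @ (V / (W @ H + EPS))) / (W.T @ ones + EPS)
        if not fixed_W:
            # Update bases with the activations held constant.
            W *= ((V / (W @ H + EPS)) @ H.T) / (ones @ H.T + EPS)
    return W, H


def separate(V_mix, W_speech, W_noise, n_iter=200):
    """Fit activations on the mixture, then apply a soft ratio mask."""
    W = np.hstack([W_speech, W_noise])
    _, H = kl_nmf(V_mix, W.shape[1], n_iter=n_iter, W=W)
    k = W_speech.shape[1]
    S_hat = W_speech @ H[:k]   # speech model spectrogram
    N_hat = W_noise @ H[k:]    # noise model spectrogram
    mask = S_hat / (S_hat + N_hat + EPS)
    return mask * V_mix, (1.0 - mask) * V_mix
```

In this sketch the bases are trained by minimizing the NMF reconstruction objective on each source alone, while the final estimate comes from the mask. That gap between the training objective and the separation step is the inconsistency the abstract says discriminative NMF is meant to address.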


Pages: 141-145

Page count: 5

