Transductive Convolutive Nonnegative Matrix Factorization for Speech Separation

Cited by: 0

Authors:

Mai, Yaodan

Lan, Long

Guan, Naiyang

Zhang, Xiang

Luo, Zhigang [1]

Affiliation:

[1] Natl Univ Def Technol, Coll Comp, Changsha 410073, Hunan, Peoples R China

Source:

PROCEEDINGS OF 2015 4TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND NETWORK TECHNOLOGY (ICCSNT 2015) | 2015

Funding:

Specialized Research Fund for the Doctoral Program of Higher Education;

Keywords:

Nonnegative matrix factorization; single-channel speech separation; transductive convolutive dictionary learning; speech separation;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP39 [Computer applications];

Discipline classification codes:

081203; 0835;

Abstract:

Nonnegative matrix factorization (NMF) is an effective approach to speech separation that extracts discriminative components for each speaker. However, traditional NMF models only the additive combination of components and ignores the dependencies within speech signals. Convolutive NMF (CNMF) captures these dependencies with overlapping components and achieves better separation performance. Both NMF and CNMF learn speaker dictionaries without using the mixture, so they cannot gather enough information to learn the dictionaries accurately, even when the test speech is available. To address this problem, transductive NMF (TNMF) was proposed; it uses the speech of each speaker together with the mixture to learn more meaningful speaker features, and it significantly improves speech separation. In short, CNMF models the dependencies within speech signals but ignores the benefit of the mixture for dictionary learning, whereas TNMF learns the dictionaries transductively but does not model the dependencies within speech. This paper proposes transductive convolutive NMF (TCNMF) to overcome the deficiencies of both CNMF and TNMF. Experimental results show that our method significantly improves on the aforementioned NMF-based methods.
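
The difference between the additive NMF model and the convolutive model described in the abstract can be illustrated with a minimal NumPy sketch. It assumes the commonly used CNMF formulation, in which a nonnegative spectrogram V is approximated by a sum of basis slices W[t] applied to time-shifted copies of the activations H. The abstract does not state the TCNMF objective, so this shows only how overlapping (time-shifted) components extend plain NMF, not the authors' algorithm. The names shift_right and cnmf_reconstruct, the array shapes, and the random data are illustrative assumptions.

```python
import numpy as np

def shift_right(H, t):
    """Shift the columns of H right by t frames, zero-filling on the left."""
    if t == 0:
        return H.copy()
    out = np.zeros_like(H)
    out[:, t:] = H[:, :-t]
    return out

def cnmf_reconstruct(W, H):
    """Convolutive model: V_hat = sum over t of W[t] @ shift_right(H, t).

    W has shape (T, n_freq, n_components): one basis slice per time lag.
    H has shape (n_components, n_frames): nonnegative activations.
    """
    return sum(W[t] @ shift_right(H, t) for t in range(W.shape[0]))

# Illustrative random nonnegative factors (assumed sizes, not from the paper).
rng = np.random.default_rng(0)
n_freq, n_components, n_frames, T = 6, 3, 10, 4
W = rng.random((T, n_freq, n_components))
H = rng.random((n_components, n_frames))

V_hat = cnmf_reconstruct(W, H)      # convolutive reconstruction, T lags
V_nmf = cnmf_reconstruct(W[:1], H)  # a single lag reduces to plain NMF
```

With a single basis slice (T = 1) the sum collapses to W[0] @ H, which is the standard additive NMF model; each additional slice lets a component span several consecutive frames.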

Citation:

Pages: 1400-1404

Page count: 5

