Transductive Convolutive Nonnegative Matrix Factorization for Speech Separation

Times cited: 0
Authors
Mai, Yaodan
Lan, Long
Guan, Naiyang
Zhang, Xiang
Luo, Zhigang [1]
Affiliation
[1] Natl Univ Def Technol, Coll Comp, Changsha 410073, Hunan, Peoples R China
Source
PROCEEDINGS OF 2015 4TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND NETWORK TECHNOLOGY (ICCSNT 2015) | 2015
Funding
Specialized Research Fund for the Doctoral Program of Higher Education (China)
Keywords
Nonnegative matrix factorization; single-channel speech separation; transductive convolutive dictionary learning; speech separation
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline codes
081203; 0835
Abstract
Nonnegative matrix factorization (NMF) is an effective speech separation approach that extracts discriminative components of different speakers. However, traditional NMF models only the additive combination of components and ignores the dependencies within speech. Convolutive NMF (CNMF) captures these dependencies through overlapping components and achieves better separation performance. Both NMF and CNMF learn speaker dictionaries without access to the mixture, so they cannot exploit the information in the test mixture even when it is available, and the learned dictionaries are less accurate as a result. To address this problem, transductive NMF (TNMF) was proposed; it uses each speaker's speech together with the mixture to learn more meaningful speaker features, which significantly boosts separation performance. In short, CNMF models the dependencies of speech signals but ignores the benefit of using mixtures when learning dictionaries, whereas TNMF emphasizes transductive dictionary learning but does not model the dependencies of speech. This paper proposes transductive convolutive NMF (TCNMF) to overcome the deficiencies of both CNMF and TNMF. Experimental results show that the proposed method significantly outperforms the aforementioned NMF-based methods.
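The abstract contrasts plain NMF, which only adds components together, with convolutive NMF, whose components overlap in time. The sketch below is a minimal pure-Python version of the standard convolutive NMF model, assuming a generalized KL-divergence cost and multiplicative updates. It is not the paper's TCNMF. A nonnegative spectrogram V (F x N) is approximated by Lambda = sum over t = 0..T-1 of W[t] @ shift(H, t). Each W[t] is an F x K dictionary slice at lag t, and shift moves the columns of the activation matrix H right by t frames with zero fill. With T = 1, the model reduces to plain NMF. All function names are hypothetical.

```python
import math
import random

# Hypothetical sketch: KL-divergence convolutive NMF (not the paper's TCNMF).

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Dense product of list-of-lists matrices A (m x p) and B (p x n)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def shift(M, t):
    """Shift columns of M by t frames (right if t > 0, left if t < 0), zero-filling.
    Assumes |t| <= number of columns."""
    out = []
    for row in M:
        if t >= 0:
            out.append([0.0] * t + row[: len(row) - t])
        else:
            out.append(row[-t:] + [0.0] * (-t))
    return out

def reconstruct(W, H):
    """Lambda = sum_t W[t] @ shift(H, t): components that span T frames overlap."""
    F, N = len(W[0]), len(H[0])
    lam = [[0.0] * N for _ in range(F)]
    for t, Wt in enumerate(W):
        P = matmul(Wt, shift(H, t))
        for f in range(F):
            for n in range(N):
                lam[f][n] += P[f][n]
    return lam

def kl_divergence(V, lam, eps=1e-12):
    """Generalized KL divergence D(V || lam), summed over all entries."""
    return sum(v * math.log((v + eps) / (l + eps)) - v + l
               for vr, lr in zip(V, lam) for v, l in zip(vr, lr))

def ratio(V, lam, eps):
    return [[v / (l + eps) for v, l in zip(vr, lr)] for vr, lr in zip(V, lam)]

def cnmf(V, K, T, iters=100, seed=0, eps=1e-12):
    """Fit V ~ sum_t W[t] @ shift(H, t) with KL multiplicative updates.
    Returns (W, H, history); history is the KL divergence after each iteration."""
    rng = random.Random(seed)
    F, N = len(V), len(V[0])
    W = [[[rng.random() + 0.1 for _ in range(K)] for _ in range(F)] for _ in range(T)]
    H = [[rng.random() + 0.1 for _ in range(N)] for _ in range(K)]
    ones = [[1.0] * N for _ in range(F)]
    history = []
    for _ in range(iters):
        # Update all lag slices W[t] jointly while H is held fixed.
        R = ratio(V, reconstruct(W, H), eps)
        for t in range(T):
            Hs_T = transpose(shift(H, t))
            num, den = matmul(R, Hs_T), matmul(ones, Hs_T)
            W[t] = [[w * a / (b + eps) for w, a, b in zip(wr, ar, br)]
                    for wr, ar, br in zip(W[t], num, den)]
        # Update H: each frame collects gradient terms from every lag (left shifts).
        R = ratio(V, reconstruct(W, H), eps)
        num = [[0.0] * N for _ in range(K)]
        den = [[0.0] * N for _ in range(K)]
        for t in range(T):
            WtT = transpose(W[t])
            num_t, den_t = matmul(WtT, shift(R, -t)), matmul(WtT, shift(ones, -t))
            for k in range(K):
                for n in range(N):
                    num[k][n] += num_t[k][n]
                    den[k][n] += den_t[k][n]
        H = [[h * a / (b + eps) for h, a, b in zip(hr, ar, br)]
             for hr, ar, br in zip(H, num, den)]
        history.append(kl_divergence(V, reconstruct(W, H)))
    return W, H, history

if __name__ == "__main__":
    # Toy nonnegative "spectrogram" generated from a random 2-component, 3-lag model.
    rng = random.Random(1)
    W_true = [[[rng.random() for _ in range(2)] for _ in range(6)] for _ in range(3)]
    H_true = [[rng.random() for _ in range(20)] for _ in range(2)]
    V = [[x + 0.01 for x in row] for row in reconstruct(W_true, H_true)]
    W, H, history = cnmf(V, K=2, T=3, iters=100)
    print(f"KL after first iteration: {history[0]:.4f}, after last: {history[-1]:.4f}")
```

Both updates are majorize-minimize steps for a model that is linear in W (and in H), so the divergence should not increase from one iteration to the next. In the supervised NMF/CNMF baselines the abstract describes, slices like W would typically be learned per speaker from clean speech and then held fixed while only H is fitted to the mixture. TNMF and TCNMF also use the mixture during dictionary learning, which this sketch does not attempt.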
Pages: 1400-1404
Number of pages: 5