Deep Transductive Nonnegative Matrix Factorization for Speech Separation

Cited by: 4
Authors
Liu, Yalin [1]
Guan, Naiyang [1]
Liu, Jie [1]
Affiliation
[1] Natl Univ Def Technol, Inst Software, Sch Comp Sci, Changsha 410073, Hunan, Peoples R China
Source
2017 16TH IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA) | 2017
Keywords
nonnegative matrix factorization; deep learning; transductive learning; speech separation; RATIO;
DOI
10.1109/ICMLA.2017.0-151
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Non-negative matrix factorization (NMF) has attracted great attention in speech separation because it preserves the non-negativity of the magnitude spectrogram of speech signals. However, NMF sometimes performs poorly because it cannot extract the non-linear features of speech. In this paper, we propose a deep transductive NMF model (DTNMF), which incorporates a multi-layer structure into NMF and learns a shared dictionary from both the source signal of each speaker and the mixture signal to be separated. Because the multi-layer structure extracts non-linear features, DTNMF learns a more precise representation of the source signals, which significantly improves speech separation performance. Experimental results on the popular LibriSpeech dataset show that DTNMF outperforms representative NMF models in separating single-channel speech mixtures.
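The abstract does not give DTNMF's objective or update rules, so the following is only a minimal sketch of the generic multi-layer NMF structure it builds on, not the authors' method: a greedily stacked NMF in which the activations of one layer are factorized again, so that V ≈ W1 W2 ... WL HL. It assumes the standard Lee-Seung multiplicative updates for the squared Frobenius loss and omits both the transductive training on speaker sources plus the mixture and any non-linearity. All function names (`nmf`, `multilayer_nmf`, `reconstruct`, `frobenius`) are hypothetical, and plain Python lists are used so the example has no dependencies.

```python
import random


def matmul(a, b):
    """Plain-Python product of an (m x k) matrix a and a (k x n) matrix b."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    """Return the transpose of a list-of-lists matrix."""
    return [list(col) for col in zip(*a)]


def nmf(v, rank, iters=200, seed=0, eps=1e-9):
    """Factor nonnegative V (m x n) as W (m x rank) @ H (rank x n).

    Assumed solver: Lee-Seung multiplicative updates for the squared
    Frobenius loss; the updates keep every entry nonnegative.
    """
    rng = random.Random(seed)
    m, n = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(m)]
    h = [[rng.random() + eps for _ in range(n)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(m)]
    return w, h


def multilayer_nmf(v, ranks, iters=200):
    """Hypothetical greedy stacking: V ~ W1 H1, then H1 ~ W2 H2, and so on."""
    ws, h = [], v
    for layer, rank in enumerate(ranks):
        w, h = nmf(h, rank, iters=iters, seed=layer)
        ws.append(w)
    return ws, h


def reconstruct(ws, h):
    """Multiply the layer dictionaries back together: W1 @ W2 @ ... @ HL."""
    out = h
    for w in reversed(ws):
        out = matmul(w, out)
    return out


def frobenius(a, b):
    """Frobenius norm of (a - b)."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)) ** 0.5
```

For example, `multilayer_nmf(v, [3, 2])` on a 6 x 8 nonnegative matrix returns dictionaries of shape 6 x 3 and 3 x 2 and activations of shape 2 x 8. In a separation setting, V would be a magnitude spectrogram (frequency x time); note that a product of nonnegative matrices is still a linear model, so this sketch illustrates only the layered structure, not the non-linear feature extraction the abstract attributes to DTNMF.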
Pages: 249-254
Page count: 6