LOCAL GAUSSIAN MODEL WITH SOURCE-SET CONSTRAINTS IN AUDIO SOURCE SEPARATION

Authors
Ikeshita, Rintaro [1]
Togami, Masahito [1]
Kawaguchi, Yohei [1]
Fujita, Yusuke [1]
Nagamatsu, Kenji [1]
Affiliation
[1] Hitachi Ltd., Research & Development Group, Tokyo, Japan
Source
2017 IEEE 27th International Workshop on Machine Learning for Signal Processing | 2017
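Keywords
Blind audio source separation; local Gaussian model; time-frequency mask; diffusion noise; permutation alignment; nonnegative matrix factorization; mixtures
DOI
Not available
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Subject classification code
0808; 0809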
Abstract
To improve blind audio source separation of convolutive mixtures, we extend the local Gaussian model (LGM) with full-rank spatial covariance matrices proposed by Duong et al. The original model essentially assumes that all sources contribute to every time-frequency slot, an assumption that may fail to capture signals with many intermittent silent periods. We therefore explicitly introduce a constraint on the set of sources that contribute to each time-frequency slot. This approach can be regarded as a relaxation of the sparsity constraint underlying the conventional time-frequency mask. The proposed model is optimized jointly over the original LGM parameters, the relaxed time-frequency mask, and the permutation alignment, leading to a robust permutation-free algorithm. We also present a novel multichannel Wiener filter weighted by the relaxed time-frequency mask. Experiments on noisy speech signals show that the proposed model is more effective than the original LGM and performs comparably to its extension, multichannel nonnegative matrix factorization (NMF).
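The abstract does not give the model equations, so the following is only a hedged illustration of how a source-set constraint and a mask-weighted Wiener filter can fit together. It assumes the standard LGM of Duong et al., in which source image j at slot (f, n) is a zero-mean complex Gaussian with covariance v_j(f, n) R_j(f), plus a noise covariance assumed present in every slot. It further assumes (our assumption, not the authors' published formulation) that the "relaxed mask" is the posterior probability of each active source set S, and that the filter is the posterior mean, i.e. a posterior-weighted sum of per-set Wiener estimates. The function names `source_set_wiener`, `all_source_sets` and `complex_gaussian_logpdf` are hypothetical, and parameter estimation and permutation alignment are not covered.

```python
import itertools

import numpy as np

# Hypothetical sketch: one time-frequency slot, parameters assumed known.
# Not the paper's published algorithm.


def complex_gaussian_logpdf(x, cov):
    """Log-density of a zero-mean circular complex Gaussian N_c(x; 0, cov)."""
    m = x.shape[0]
    _, logdet = np.linalg.slogdet(cov)  # log|det| is real for Hermitian PD cov
    quad = np.real(np.conj(x) @ np.linalg.solve(cov, x))  # x^H cov^{-1} x
    return -m * np.log(np.pi) - logdet - quad


def all_source_sets(num_sources):
    """Every subset of {0, ..., J-1}, including the empty set (noise only)."""
    return [frozenset(c)
            for k in range(num_sources + 1)
            for c in itertools.combinations(range(num_sources), k)]


def source_set_wiener(x, v, R, noise_cov, prior):
    """Posterior-mean source images under an assumed source-set mixture.

    x: (M,) mixture STFT vector; v: (J,) variances v_j(f, n);
    R: (J, M, M) spatial covariances R_j(f); noise_cov: (M, M) noise
    covariance assumed active in every set; prior: {frozenset: probability}.
    Returns (J, M) source-image estimates and {frozenset: posterior}.
    """
    J, M = v.shape[0], x.shape[0]
    sets = list(prior)
    log_w = np.empty(len(sets))
    estimates = []
    for i, S in enumerate(sets):
        # Mixture covariance when exactly the sources in S are active.
        cov = noise_cov + sum((v[j] * R[j] for j in S), np.zeros((M, M), complex))
        log_w[i] = np.log(prior[S]) + complex_gaussian_logpdf(x, cov)
        y = np.linalg.solve(cov, x)  # cov^{-1} x
        est = np.zeros((J, M), dtype=complex)
        for j in S:
            est[j] = v[j] * R[j] @ y  # Wiener estimate of source image j given S
        estimates.append(est)
    # Normalize log-weights into posteriors over source sets (the assumed "relaxed mask").
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    c_hat = sum(wi * est for wi, est in zip(w, estimates))
    return c_hat, dict(zip(sets, w))


# Usage on random but valid parameters (two sources, two microphones).
rng = np.random.default_rng(0)
J, M = 2, 2
A = rng.standard_normal((J, M, M)) + 1j * rng.standard_normal((J, M, M))
R = np.array([a @ a.conj().T + 0.1 * np.eye(M) for a in A])  # Hermitian PD
v = np.array([1.0, 0.5])
noise_cov = 0.01 * np.eye(M)
x = rng.standard_normal(M) + 1j * rng.standard_normal(M)
sets = all_source_sets(J)
prior = {S: 1.0 / len(sets) for S in sets}  # uniform prior, an arbitrary choice
c_hat, post = source_set_wiener(x, v, R, noise_cov, prior)
```

Placing all prior mass on the full source set reduces the estimate to the ordinary multichannel Wiener filter v_j R_j Σ_x^{-1} x of the original LGM, which is a convenient sanity check for this sketch.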
Pages: 6
References
22 entries in total
[1] Araki S, 2011, INT CONF ACOUST SPEE, P225
[2] Arberet S., 2010, 2010 10th International Conference on Information Sciences, Signal Processing and their Applications (ISSPA 2010), P1, DOI 10.1109/ISSPA.2010.5605570
[3] Barker J, 2015, 2015 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), P504, DOI 10.1109/ASRU.2015.7404837
[4] Duong Ngoc Q. K., 2009, 2009 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), P129, DOI 10.1109/ASPAA.2009.5346503
[5] Duong, Ngoc Q. K.; Vincent, Emmanuel; Gribonval, Remi, 2010, Under-Determined Convolutive Blind Source Separation Using Spatial Covariance Models, 2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, P9-12
[6] Duong, Ngoc Q. K.; Vincent, Emmanuel; Gribonval, Remi, 2010, Under-Determined Reverberant Audio Source Separation Using a Full-Rank Spatial Covariance Model, IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, V18(7), P1830-1840
[7] Févotte, C; Cardoso, JF, 2005, Maximum likelihood approach for blind audio source separation using time-frequency Gaussian source models, 2005 WORKSHOP ON APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS (WASPAA), P78-81
[8] Fevotte, Cedric; Bertin, Nancy; Durrieu, Jean-Louis, 2009, Nonnegative Matrix Factorization with the Itakura-Saito Divergence: With Application to Music Analysis, NEURAL COMPUTATION, V21(3), P793-830
[9] Higuchi T, 2016, INT CONF ACOUST SPEE, P5210, DOI 10.1109/ICASSP.2016.7472671
[10] Ito N, 2013, INT CONF ACOUST SPEE, P3238, DOI 10.1109/ICASSP.2013.6638256