Multichannel Blind Sound Source Separation Using Spatial Covariance Model With Level and Time Differences and Nonnegative Matrix Factorization

Cited by: 21

Authors:

Carabias-Orti, Julio Jose [1]

Nikunen, Joonas [1]

Virtanen, Tuomas [1]

Vera-Candeas, Pedro [2]

Affiliations:

[1] Tampere Univ Technol, Dept Signal Proc, Tampere 33720, Finland

[2] Univ Jaen, Dept Telecommun Engn, Jaen 23071, Spain

Source:

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2018 / Vol. 26 / No. 09

Funding:

Academy of Finland;

Keywords:

Multichannel source separation; spatial covariance model; interaural time difference; interaural level difference; non-negative matrix factorization; direction of arrival estimation; AUDIO SOURCE SEPARATION; EVALUATION CAMPAIGN; SIGNAL SEPARATION; PERSPECTIVE;

DOI:

10.1109/TASLP.2018.2830105

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Subject classification codes:

070206 ; 082403 ;

Abstract:

This paper presents an algorithm for multichannel sound source separation using explicit modeling of level and time differences in source spatial covariance matrices (SCMs). We propose a novel SCM model in which the spatial properties are modeled by a weighted sum of direction of arrival (DOA) kernels. DOA kernels are obtained as the combination of phase and level difference covariance matrices representing both time and level differences between microphones for a grid of predefined source directions. The proposed SCM model is combined with a nonnegative matrix factorization (NMF) model for the magnitude spectrograms. Unlike other SCM models in the literature, source localization in this work is implicitly defined in the model and estimated during the signal factorization. Therefore, no localization preprocessing is required. Parameters are estimated using complex-valued nonnegative matrix factorization with both Euclidean distance and Itakura-Saito divergence. Separation performance of the proposed system is evaluated using the two-channel SiSEC development dataset and four-channel signals recorded in a regular room with moderate reverberation. Finally, a comparison to other state-of-the-art methods is performed, showing better separation performance in terms of SIR and perceptual measures.
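
The idea of an SCM built as a weighted sum of DOA kernels can be illustrated with a minimal sketch. This is not the paper's formulation: it assumes a two-microphone far-field array, and the function names (`doa_kernels`, `source_scm`), the microphone spacing, and the sinusoidal level-difference model are all hypothetical choices made only for illustration. Each kernel is the outer product of a steering vector that carries both a time (phase) difference and a level (gain) difference for one candidate direction.

```python
import numpy as np

def doa_kernels(freqs_hz, angles_rad, mic_spacing_m=0.1, max_ild_db=3.0, c=343.0):
    """Return DOA kernels of shape (F, O, 2, 2) for a two-microphone array.

    Assumptions (not from the paper): far-field propagation, and a
    level difference that varies as max_ild_db * sin(angle).
    """
    # Time difference of arrival between the two microphones, per direction.
    tau = mic_spacing_m * np.sin(angles_rad) / c                    # (O,)
    # Hypothetical level difference in dB, split evenly across both channels.
    ild_db = max_ild_db * np.sin(angles_rad)                        # (O,)
    gain_1 = 10.0 ** (ild_db / 40.0)
    gain_2 = 10.0 ** (-ild_db / 40.0)
    # Phase difference of channel 2 relative to channel 1.
    phase = np.exp(-2j * np.pi * freqs_hz[:, None] * tau[None, :])  # (F, O)
    steer = np.empty((len(freqs_hz), len(angles_rad), 2), dtype=complex)
    steer[..., 0] = gain_1
    steer[..., 1] = gain_2 * phase
    # Rank-1 Hermitian kernel a a^H for every frequency and direction.
    return steer[..., :, None] * steer[..., None, :].conj()

def source_scm(kernels, direction_weights):
    """Weighted sum of DOA kernels over directions, giving one SCM per frequency."""
    return np.einsum("o,foij->fij", direction_weights, kernels)

# Usage: five candidate directions, weights concentrated around the middle one.
freqs = np.array([500.0, 1000.0])
angles = np.linspace(-np.pi / 2, np.pi / 2, 5)
kernels = doa_kernels(freqs, angles)
weights = np.array([0.0, 0.1, 0.8, 0.1, 0.0])
scm = source_scm(kernels, weights)
```

With nonnegative weights the resulting matrices stay Hermitian and positive semidefinite, which is what allows the direction weights to be estimated with nonnegative updates alongside the spectrogram model. In the paper these weights are estimated during the factorization; here they are fixed by hand.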


Pages: 1512-1527

Page count: 16
