CLASS-CONDITIONAL EMBEDDINGS FOR MUSIC SOURCE SEPARATION

Times cited: 0
Authors
Seetharaman, Prem [1 ,2 ]
Wichern, Gordon [1 ]
Venkataramani, Shrikant [1 ,3 ]
Le Roux, Jonathan [1 ]
Affiliations
[1] MERL, Cambridge, MA 02139 USA
[2] Northwestern Univ, Evanston, IL 60208 USA
[3] Univ Illinois, Champaign, IL USA
Source
2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2019
Keywords
source separation; deep clustering; music; classification; neural networks;
DOI
Not available
CLC number
O42 [Acoustics]
Discipline codes
070206; 082403
Abstract
Isolating individual instruments in a musical mixture has a myriad of potential applications, and seems imminently achievable given the levels of performance reached by recent deep learning methods. While most musical source separation techniques learn an independent model for each instrument, we propose using a common embedding space for the time-frequency bins of all instruments in a mixture, an approach inspired by deep clustering and deep attractor networks. Additionally, an auxiliary network is used to generate the parameters of a Gaussian mixture model (GMM), where the posterior distribution over GMM components in the embedding space can be used to create a mask that separates individual sources from the mixture. In addition to outperforming a mask-inference baseline on the MUSDB18 dataset, our embedding space is easily interpretable and can be used for query-based separation.
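The abstract describes masking via GMM component posteriors computed over per-bin embeddings. The following is a minimal sketch, not the authors' implementation: it assumes per-time-frequency-bin embeddings and diagonal-covariance GMM parameters (as might be produced by the auxiliary network) are already available, and shows how the normalized posteriors could serve as soft separation masks. Function name, shapes, and the diagonal-covariance choice are illustrative assumptions only.

    # Sketch only (not the authors' code): GMM posteriors over per-bin embeddings
    # used as soft masks, one per source class. Shapes/names are assumptions.
    import numpy as np

    def gmm_posterior_masks(embeddings, means, variances, priors):
        """embeddings: (T*F, D) per time-frequency-bin embeddings.
        means: (K, D), variances: (K, D) diagonal covariances, priors: (K,).
        Returns soft masks of shape (K, T*F), one per GMM component (source class)."""
        # Log-density of each bin under each diagonal-covariance Gaussian component.
        diff = embeddings[None, :, :] - means[:, None, :]            # (K, T*F, D)
        log_det = np.sum(np.log(variances), axis=1)                  # (K,)
        mahal = np.sum(diff ** 2 / variances[:, None, :], axis=2)    # (K, T*F)
        log_lik = -0.5 * (mahal + log_det[:, None]
                          + means.shape[1] * np.log(2 * np.pi))
        log_post = np.log(priors)[:, None] + log_lik                 # unnormalized log posterior
        # Normalize over components: responsibilities double as soft masks.
        log_post -= log_post.max(axis=0, keepdims=True)
        post = np.exp(log_post)
        return post / post.sum(axis=0, keepdims=True)

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        T, F, D, K = 100, 257, 20, 4   # frames, freq bins, embedding dim, source classes
        masks = gmm_posterior_masks(rng.normal(size=(T * F, D)),
                                    means=rng.normal(size=(K, D)),
                                    variances=np.ones((K, D)),
                                    priors=np.full(K, 1.0 / K))
        print(masks.shape)  # (K, T*F); masks sum to 1 across the K classes per bin

Each masks[k].reshape(T, F) could then be applied to the mixture magnitude spectrogram to estimate the spectrogram of source class k.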
Pages: 301-305
Number of pages: 5