Unsupervised Audio Source Separation using Generative Priors

Cited by: 10
Authors
Narayanaswamy, Vivek [1 ]
Thiagarajan, Jayaraman J. [2 ]
Anirudh, Rushil [2 ]
Spanias, Andreas [1 ]
Affiliations
[1] Arizona State Univ, SenSIP Ctr, Sch ECEE, Tempe, AZ 85281 USA
[2] Lawrence Livermore Natl Lab, 7000 East Ave, Livermore, CA 94550 USA
Source
INTERSPEECH 2020, 2020
Keywords
audio source separation; unsupervised learning; generative priors; projected gradient descent;
DOI
10.21437/Interspeech.2020-3115
CLC number
R36 (Pathology); R76 (Otorhinolaryngology)
Subject classification codes
100104; 100213
Abstract
State-of-the-art under-determined audio source separation systems rely on supervised end-to-end training of carefully tailored neural network architectures operating in either the time or the spectral domain. However, these methods are severely limited: they require access to expensive source-level labeled data, and they are specific to a given set of sources and mixing process, demanding complete re-training when those assumptions change. This strongly emphasizes the need for unsupervised methods that can leverage recent advances in data-driven modeling and compensate for the lack of labeled data through meaningful priors. To this end, we propose a novel approach to audio source separation based on generative priors trained on the individual sources. Using projected gradient descent optimization, our approach simultaneously searches the source-specific latent spaces to effectively recover the constituent sources. Though the generative priors can be defined directly in the time domain, e.g., WaveGAN, we find that using spectral-domain loss functions in our optimization leads to good-quality source estimates. Our empirical studies on standard spoken-digit and instrument datasets clearly demonstrate the effectiveness of our approach over classical as well as state-of-the-art unsupervised baselines.
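The abstract's core procedure — jointly optimizing source-specific latent vectors by projected gradient descent so that the generated sources sum to the observed mixture — can be sketched in a minimal toy form. This is a hypothetical illustration, not the paper's implementation: the trained deep generative priors (e.g., WaveGAN) are replaced here by fixed random linear maps so the gradients are analytic, a plain time-domain reconstruction loss stands in for the spectral-domain objective, and an l2-ball projection stands in for the latent prior constraint.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for trained generative priors: fixed linear maps from a
# latent space (dim d) to the signal space (length n). The actual method
# uses deep generators; linear maps keep this sketch tractable.
n, d = 64, 8
G = [rng.standard_normal((n, d)) for _ in range(2)]  # one "prior" per source

# Ground-truth latents and the observed mixture of the two sources
z_true = [rng.standard_normal(d) for _ in range(2)]
mix = sum(g @ z for g, z in zip(G, z_true))

def project(z, radius=10.0):
    """Project a latent back onto an l2 ball -- a simple prior constraint."""
    norm = np.linalg.norm(z)
    return z if norm <= radius else z * (radius / norm)

# Projected gradient descent, jointly over both latent vectors:
# each step descends the reconstruction loss, then re-projects.
z = [np.zeros(d) for _ in range(2)]
lr = 0.005
for _ in range(1000):
    recon = sum(g @ zi for g, zi in zip(G, z))
    resid = recon - mix                 # time-domain residual (toy loss)
    z = [project(zi - lr * (g.T @ resid)) for g, zi in zip(G, z)]

est = [g @ zi for g, zi in zip(G, z)]   # recovered constituent sources
print("relative mixture reconstruction error:",
      np.linalg.norm(sum(est) - mix) / np.linalg.norm(mix))
```

In the linear toy case the joint problem is convex, so the iterates recover the mixture essentially exactly; with deep generators the same loop applies but the landscape is non-convex, which is why the choice of loss domain (spectral vs. time) matters in the paper's experiments.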
Pages: 2657-2661
Page count: 5
Related papers
50 records in total
  • [31] Unsupervised Generative Adversarial Alignment Representation for Sheet music, Audio and Lyrics
    Zeng, Donghuo
    Yu, Yi
    Oyama, Keizo
    2020 IEEE SIXTH INTERNATIONAL CONFERENCE ON MULTIMEDIA BIG DATA (BIGMM 2020), 2020, : 162 - 165
  • [32] MOTION INFORMED AUDIO SOURCE SEPARATION
    Parekh, Sanjeel
    Essid, Slim
    Ozerov, Alexey
    Duong, Ngoc Q. K.
    Perez, Patrick
    Richard, Gael
    2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 6 - 10
  • [33] Music Source Separation With Generative Flow
    Zhu, Ge
    Darefsky, Jordan
    Jiang, Fei
    Selitskiy, Anton
    Duan, Zhiyao
    IEEE SIGNAL PROCESSING LETTERS, 2022, 29 : 2288 - 2292
  • [34] Audio source separation of convolutive mixtures
    Mitianoudis, N
    Davies, ME
    IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2003, 11 (05): : 489 - 497
  • [35] Joint Audio Inpainting and Source Separation
    Bilen, Cagdas
    Ozerov, Alexey
    Perez, Patrick
    LATENT VARIABLE ANALYSIS AND SIGNAL SEPARATION, LVA/ICA 2015, 2015, 9237 : 251 - 258
  • [36] Single channel audio source separation
    Gao, Bin
    Woo, W. L.
    Dlay, S. S.
    WSEAS Transactions on Signal Processing, 2008, 4 (04): : 173 - 182
  • [37] AN OVERVIEW OF INFORMED AUDIO SOURCE SEPARATION
    Liutkus, Antoine
    Durrieu, Jean-Louis
    Daudet, Laurent
    Richard, Gael
    2013 14TH INTERNATIONAL WORKSHOP ON IMAGE ANALYSIS FOR MULTIMEDIA INTERACTIVE SERVICES (WIAMIS), 2013,
  • [38] ON-THE-FLY AUDIO SOURCE SEPARATION
    El Badawy, Dalia
    Duong, Ngoc Q. K.
    Ozerov, Alexey
    2014 IEEE INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING (MLSP), 2014,
  • [39] Audio source separation: solutions and problems
    Mitianoudis, N
    Davies, ME
    INTERNATIONAL JOURNAL OF ADAPTIVE CONTROL AND SIGNAL PROCESSING, 2004, 18 (03) : 299 - 314
  • [40] DOPING AUDIO SIGNALS FOR SOURCE SEPARATION
    Mahe, Gael
    Nadalin, Everton Z.
    Romano, Joao-Marcos T.
    2012 PROCEEDINGS OF THE 20TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2012, : 2402 - 2406