Unsupervised Audio Source Separation using Generative Priors

Cited by: 10
Authors
Narayanaswamy, Vivek [1 ]
Thiagarajan, Jayaraman J. [2 ]
Anirudh, Rushil [2 ]
Spanias, Andreas [1 ]
Affiliations
[1] Arizona State Univ, SenSIP Ctr, Sch ECEE, Tempe, AZ 85281 USA
[2] Lawrence Livermore Natl Lab, 7000 East Ave, Livermore, CA 94550 USA
Source
INTERSPEECH 2020, 2020
Keywords
audio source separation; unsupervised learning; generative priors; projected gradient descent;
DOI
10.21437/Interspeech.2020-3115
CLC number
R36 (Pathology); R76 (Otorhinolaryngology)
Subject classification codes
100104; 100213
Abstract
State-of-the-art under-determined audio source separation systems rely on supervised end-to-end training of carefully tailored neural network architectures operating in either the time or the spectral domain. However, these methods are severely limited: they require access to expensive source-level labeled data, and they are specific to a given set of sources and mixing process, demanding complete re-training when those assumptions change. This strongly emphasizes the need for unsupervised methods that can leverage recent advances in data-driven modeling and compensate for the lack of labeled data through meaningful priors. To this end, we propose a novel approach to audio source separation based on generative priors trained on the individual sources. Using projected gradient descent optimization, our approach simultaneously searches the source-specific latent spaces to effectively recover the constituent sources. Though the generative priors can be defined directly in the time domain, e.g., WaveGAN, we find that using spectral-domain loss functions in our optimization leads to good-quality source estimates. Our empirical studies on standard spoken-digit and instrument datasets clearly demonstrate the effectiveness of our approach over classical as well as state-of-the-art unsupervised baselines.
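The abstract's core procedure — jointly optimizing source-specific latent vectors by projected gradient descent so that the generated sources sum to the observed mixture — can be sketched in a minimal toy form. This is a hypothetical illustration, not the paper's implementation: the trained deep generative priors (e.g., WaveGAN) are replaced here by fixed random linear maps so the gradients are analytic, a plain time-domain reconstruction loss stands in for the spectral-domain objective, and an l2-ball projection stands in for the latent prior constraint.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for trained generative priors: fixed linear maps from a
# latent space (dim d) to the signal space (length n). The actual method
# uses deep generators; linear maps keep this sketch tractable.
n, d = 64, 8
G = [rng.standard_normal((n, d)) for _ in range(2)]  # one "prior" per source

# Ground-truth latents and the observed mixture of the two sources
z_true = [rng.standard_normal(d) for _ in range(2)]
mix = sum(g @ z for g, z in zip(G, z_true))

def project(z, radius=10.0):
    """Project a latent back onto an l2 ball -- a simple prior constraint."""
    norm = np.linalg.norm(z)
    return z if norm <= radius else z * (radius / norm)

# Projected gradient descent, jointly over both latent vectors:
# each step descends the reconstruction loss, then re-projects.
z = [np.zeros(d) for _ in range(2)]
lr = 0.005
for _ in range(1000):
    recon = sum(g @ zi for g, zi in zip(G, z))
    resid = recon - mix                 # time-domain residual (toy loss)
    z = [project(zi - lr * (g.T @ resid)) for g, zi in zip(G, z)]

est = [g @ zi for g, zi in zip(G, z)]   # recovered constituent sources
print("relative mixture reconstruction error:",
      np.linalg.norm(sum(est) - mix) / np.linalg.norm(mix))
```

In the linear toy case the joint problem is convex, so the iterates recover the mixture essentially exactly; with deep generators the same loop applies but the landscape is non-convex, which is why the choice of loss domain (spectral vs. time) matters in the paper's experiments.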
Pages: 2657-2661
Page count: 5
Related papers
50 records in total
  • [31] Unsupervised Generative Adversarial Alignment Representation for Sheet music, Audio and Lyrics
    Zeng, Donghuo
    Yu, Yi
    Oyama, Keizo
    2020 IEEE SIXTH INTERNATIONAL CONFERENCE ON MULTIMEDIA BIG DATA (BIGMM 2020), 2020, : 162 - 165
  • [32] MOTION INFORMED AUDIO SOURCE SEPARATION
    Parekh, Sanjeel
    Essid, Slim
    Ozerov, Alexey
    Duong, Ngoc Q. K.
    Perez, Patrick
    Richard, Gael
    2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 6 - 10
  • [33] Music Source Separation With Generative Flow
    Zhu, Ge
    Darefsky, Jordan
    Jiang, Fei
    Selitskiy, Anton
    Duan, Zhiyao
    IEEE SIGNAL PROCESSING LETTERS, 2022, 29 : 2288 - 2292
  • [34] Audio source separation of convolutive mixtures
    Mitianoudis, N
    Davies, ME
    IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2003, 11 (05): : 489 - 497
  • [35] Joint Audio Inpainting and Source Separation
    Bilen, Cagdas
    Ozerov, Alexey
    Perez, Patrick
    LATENT VARIABLE ANALYSIS AND SIGNAL SEPARATION, LVA/ICA 2015, 2015, 9237 : 251 - 258
  • [36] Single channel audio source separation
    Gao, Bin
    Woo, W. L.
    Dlay, S. S.
    WSEAS Transactions on Signal Processing, 2008, 4 (04): : 173 - 182
  • [37] AN OVERVIEW OF INFORMED AUDIO SOURCE SEPARATION
    Liutkus, Antoine
    Durrieu, Jean-Louis
    Daudet, Laurent
    Richard, Gael
    2013 14TH INTERNATIONAL WORKSHOP ON IMAGE ANALYSIS FOR MULTIMEDIA INTERACTIVE SERVICES (WIAMIS), 2013,
  • [38] ON-THE-FLY AUDIO SOURCE SEPARATION
    El Badawy, Dalia
    Duong, Ngoc Q. K.
    Ozerov, Alexey
    2014 IEEE INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING (MLSP), 2014,
  • [39] Audio source separation: solutions and problems
    Mitianoudis, N
    Davies, ME
    INTERNATIONAL JOURNAL OF ADAPTIVE CONTROL AND SIGNAL PROCESSING, 2004, 18 (03) : 299 - 314
  • [40] DOPING AUDIO SIGNALS FOR SOURCE SEPARATION
    Mahe, Gael
    Nadalin, Everton Z.
    Romano, Joao-Marcos T.
    2012 PROCEEDINGS OF THE 20TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2012, : 2402 - 2406