Unsupervised Audio Source Separation using Generative Priors

Cited by: 10
Authors
Narayanaswamy, Vivek [1 ]
Thiagarajan, Jayaraman J. [2 ]
Anirudh, Rushil [2 ]
Spanias, Andreas [1 ]
Affiliations
[1] Arizona State Univ, SenSIP Ctr, Sch ECEE, Tempe, AZ 85281 USA
[2] Lawrence Livermore Natl Lab, 7000 East Ave, Livermore, CA 94550 USA
Source
INTERSPEECH 2020 | 2020
Keywords
audio source separation; unsupervised learning; generative priors; projected gradient descent
DOI
10.21437/Interspeech.2020-3115
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology];
Subject classification code
100104 ; 100213 ;
Abstract
State-of-the-art under-determined audio source separation systems rely on supervised end-to-end training of carefully tailored neural network architectures operating either in the time or the spectral domain. However, these methods are severely challenged in terms of requiring access to expensive source-level labeled data and being specific to a given set of sources and the mixing process, which demands complete re-training when those assumptions change. This strongly emphasizes the need for unsupervised methods that can leverage the recent advances in data-driven modeling, and compensate for the lack of labeled data through meaningful priors. To this end, we propose a novel approach for audio source separation based on generative priors trained on individual sources. Through the use of projected gradient descent optimization, our approach simultaneously searches in the source-specific latent spaces to effectively recover the constituent sources. Though the generative priors can be defined in the time domain directly, e.g. WaveGAN, we find that using spectral domain loss functions for our optimization leads to good-quality source estimates. Our empirical studies on standard spoken digit and instrument datasets clearly demonstrate the effectiveness of our approach over classical as well as state-of-the-art unsupervised baselines.
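The latent-space search described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the linear `generate` function stands in for a trained generative prior such as WaveGAN, the loss is a plain time-domain squared error rather than the spectral-domain losses the abstract favors, and the projection is assumed to be a clip onto the box [-1, 1]. All names (`generate`, `separate`, `gens`) are hypothetical.

```python
import random

def generate(weights, z):
    """Toy linear 'generator' (an assumption): maps latent z to a signal."""
    return [sum(w * x for w, x in zip(row, z)) for row in weights]

def separate(mixture, generators, latent_dim, steps=5000, lr=0.01, seed=0):
    """Projected gradient descent over one latent vector per source."""
    rng = random.Random(seed)
    latents = [[rng.uniform(-0.1, 0.1) for _ in range(latent_dim)]
               for _ in generators]
    for _ in range(steps):
        estimates = [generate(w, z) for w, z in zip(generators, latents)]
        # Residual of the summed source estimates against the mixture.
        residual = [sum(est[t] for est in estimates) - mixture[t]
                    for t in range(len(mixture))]
        # Simultaneous update: every latent uses the same residual.
        for weights, z in zip(generators, latents):
            for k in range(latent_dim):
                grad = 2.0 * sum(r * row[k] for r, row in zip(residual, weights))
                # Gradient step, then projection onto the box [-1, 1].
                z[k] = min(1.0, max(-1.0, z[k] - lr * grad))
    return [generate(w, z) for w, z in zip(generators, latents)]

# Build a two-source toy problem with known latents inside the box.
rng = random.Random(1)
T, D = 16, 3  # signal length and latent size (illustrative values)
gens = [[[rng.uniform(-1, 1) for _ in range(D)] for _ in range(T)]
        for _ in range(2)]
true_z = [[0.5, -0.3, 0.8], [-0.6, 0.2, 0.4]]
sources = [generate(w, z) for w, z in zip(gens, true_z)]
mixture = [a + b for a, b in zip(*sources)]

estimates = separate(mixture, gens, D)
error = sum((a + b - m) ** 2
            for a, b, m in zip(estimates[0], estimates[1], mixture))
print("mixture reconstruction error:", error)
```

With real neural generators, the gradients would come from automatic differentiation instead of the hand-written expression above, and the reconstruction error would be measured on spectral representations of the mixture and the summed estimates.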
Pages: 2657-2661
Page count: 5