MaD TwinNet: Masker-Denoiser Architecture with Twin Networks for Monaural Sound Source Separation

Cited by: 0
Authors
Drossos, Konstantinos [1]
Mimilakis, Stylianos Ioannis [2]
Serdyuk, Dmitriy [3]
Schuller, Gerald [2]
Virtanen, Tuomas [1]
Bengio, Yoshua [3]
Affiliations
[1] Tampere Univ Technol, Tampere, Finland
[2] Tech Univ Ilmenau, Fraunhofer IDMT, Ilmenau, Germany
[3] Univ Montreal, MILA, Montreal, PQ, Canada
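Funding
EU Horizon 2020; European Research Council; Natural Sciences and Engineering Research Council of Canada;
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract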
The monaural singing voice separation task focuses on predicting the singing voice from a single-channel music mixture signal. Current state-of-the-art (SOTA) results in monaural singing voice separation are obtained with deep-learning-based methods. In this work we present a novel recurrent neural approach that learns long-term temporal patterns and structures of a musical piece. We build upon the recently proposed Masker-Denoiser (MaD) architecture and enhance it with Twin Networks, a technique for regularizing a recurrent generative network using a backward-running copy of the network. We evaluate our method on the Demixing Secret Dataset and obtain an increase in signal-to-distortion ratio (SDR) of 0.37 dB and in signal-to-interference ratio (SIR) of 0.23 dB compared to previous SOTA results.
Pages: 8
Related papers
22 items in total
  • [1] MONAURAL SOUND SOURCE SEPARATION USING COVARIANCE PROFILE OF PARTIALS
    Goel, Priyank
    Ramakrishnan, K. R.
    2012 PROCEEDINGS OF THE 20TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2012, : 2452 - 2456
  • [2] NMF WITH SPECTRAL AND TEMPORAL CONTINUITY CRITERIA FOR MONAURAL SOUND SOURCE SEPARATION
    Becker, Julian M.
    Sohn, Christian
    Rohlfing, Christian
    2014 PROCEEDINGS OF THE 22ND EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2014, : 316 - 320
  • [3] Monaural Source Separation Based on Adaptive Discriminative Criterion in Neural Networks
    Sun, Yang
    Zhu, Lei
    Chambers, Jonathon A.
    Naqvi, Syed Mohsen
    2017 22ND INTERNATIONAL CONFERENCE ON DIGITAL SIGNAL PROCESSING (DSP), 2017,
  • [4] Non-negative Tensor Factorisation of Modulation Spectrograms for Monaural Sound Source Separation
    Barker, Tom
    Virtanen, Tuomas
    14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 827 - 831
  • [5] CLUSTERING NMF BASIS FUNCTIONS USING SHIFTED NMF FOR MONAURAL SOUND SOURCE SEPARATION
    Jaiswal, Rajesh
    FitzGerald, Derry
    Barry, Dan
    Coyle, Eugene
    Rickard, Scott
    2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011, : 245 - 248
  • [6] Agglomerative Hierarchical Clustering of Basis Vector for Monaural Sound Source Separation Based on NMF
    Murai, Kentaro
    Takeuchi, Taiho
    Tatekura, Yosuke
    2018 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2018, : 1653 - 1657
  • [7] Monaural sound source separation by nonnegative matrix factorization with temporal continuity and sparseness criteria
    Virtanen, Tuomas
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2007, 15 (03) : 1066 - 1074
  • [8] Joint Optimization of Masks and Deep Recurrent Neural Networks for Monaural Source Separation
    Huang, Po-Sen
    Kim, Minje
    Hasegawa-Johnson, Mark
    Smaragdis, Paris
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2015, 23 (12) : 2136 - 2147
  • [9] Multi-Resolution Fully Convolutional Neural Networks for Monaural Audio Source Separation
    Grais, Emad M.
    Wierstorf, Hagen
    Ward, Dominic
    Plumbley, Mark D.
    LATENT VARIABLE ANALYSIS AND SIGNAL SEPARATION (LVA/ICA 2018), 2018, 10891 : 340 - 350
  • [10] SOUND SOURCE SEPARATION IN MONAURAL MUSIC SIGNALS USING EXCITATION-FILTER MODEL AND EM ALGORITHM
    Klapuri, Anssi
    Virtanen, Tuomas
    Heittola, Toni
    2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 5510 - 5513