MaD TwinNet: Masker-Denoiser Architecture with Twin Networks for Monaural Sound Source Separation

Cited by: 0
Authors
Drossos, Konstantinos [1]
Mimilakis, Stylianos Ioannis [2]
Serdyuk, Dmitriy [3]
Schuller, Gerald [2]
Virtanen, Tuomas [1]
Bengio, Yoshua [3]
Affiliations
[1] Tampere Univ Technol, Tampere, Finland
[2] Tech Univ Ilmenau, Fraunhofer IDMT, Ilmenau, Germany
[3] Univ Montreal, MILA, Montreal, PQ, Canada
Funding
European Union Horizon 2020; European Research Council; Natural Sciences and Engineering Research Council of Canada
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The monaural singing voice separation task focuses on predicting the singing voice from a single-channel music mixture signal. Current state-of-the-art (SOTA) results in monaural singing voice separation are obtained with deep-learning-based methods. In this work, we present a novel recurrent neural approach that learns long-term temporal patterns and structures of a musical piece. We build upon the recently proposed Masker-Denoiser (MaD) architecture and enhance it with Twin Networks, a technique for regularizing a recurrent generative network using a backward-running copy of the network. We evaluate our method on the Demixing Secret Dataset and obtain improvements of 0.37 dB in signal-to-distortion ratio (SDR) and 0.23 dB in signal-to-interference ratio (SIR) compared to previous SOTA results.
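The abstract describes Twin Networks in a single phrase. As a rough illustration of that idea, following the general Twin Networks formulation rather than the MaD TwinNet code, the sketch below runs a forward RNN and a backward "twin" RNN over the same toy sequence and penalizes the squared distance between an affine projection of each forward hidden state and the backward hidden state for the same time step. Everything here is an assumption for illustration: the plain tanh RNN cell, the toy sizes, the random weights, and the helper names `run_rnn` and `twin_loss` are hypothetical, and the paper's recurrent units, feature dimensions, and loss weighting may differ.

```python
import numpy as np


def run_rnn(x, w_in, w_rec):
    """Run a plain tanh RNN (hypothetical stand-in cell) over x of shape (T, D).

    Returns the hidden states as an array of shape (T, H).
    """
    num_steps, hidden_size = x.shape[0], w_rec.shape[0]
    hidden = np.zeros(hidden_size)
    states = np.empty((num_steps, hidden_size))
    for t in range(num_steps):
        hidden = np.tanh(x[t] @ w_in + hidden @ w_rec)
        states[t] = hidden
    return states


def twin_loss(h_fwd, h_bwd, a, b):
    """Mean over time of ||g(h_fwd[t]) - h_bwd[t]||^2, with affine g(h) = h @ a + b.

    h_bwd is treated as a fixed target: in a real training loop it would be
    wrapped in a stop-gradient so this term only updates the forward RNN
    and the affine map g.
    """
    diff = h_fwd @ a + b - h_bwd
    return float(np.mean(np.sum(diff ** 2, axis=1)))


rng = np.random.default_rng(0)
T, D, H = 10, 4, 8                  # frames, input features, hidden units (toy sizes)
x = rng.standard_normal((T, D))     # stand-in for mixture spectrogram frames

# Forward RNN reads frames 0..T-1; the backward twin reads them in reverse order.
h_fwd = run_rnn(x, 0.1 * rng.standard_normal((D, H)), 0.1 * rng.standard_normal((H, H)))
h_bwd_rev = run_rnn(x[::-1], 0.1 * rng.standard_normal((D, H)), 0.1 * rng.standard_normal((H, H)))
h_bwd = h_bwd_rev[::-1]             # re-align so row t of both arrays refers to frame t

a = np.eye(H)                       # affine map g, initialised to the identity here
b = np.zeros(H)
print(twin_loss(h_fwd, h_bwd, a, b))
```

In an actual training setup this term would be added, with some weighting factor, to the separation losses, and its gradients would reach only the forward network and the affine map; NumPy is used here only to show the computation, so no gradients are taken.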
Pages: 8