HARMONIC-PERCUSSIVE SOURCE SEPARATION WITH DEEP NEURAL NETWORKS AND PHASE RECOVERY

Cited by: 0
Authors
Drossos, Konstantinos [1 ]
Magron, Paul [1 ]
Mimilakis, Stylianos Ioannis [2 ]
Virtanen, Tuomas [1 ]
Affiliations
[1] Tampere Univ Technol, Lab Signal Proc, Tampere, Finland
[2] Fraunhofer IDMT, Ilmenau, Germany
Source
2018 16TH INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC) | 2018
Funding
Academy of Finland; European Research Council; EU Horizon 2020
Keywords
harmonic/percussive source separation; deep neural networks; MaD TwinNet; phase recovery; sinusoidal model;
DOI
Not available
Chinese Library Classification
TP [Automation; Computer Technology]
Discipline Code
0812
Abstract
Harmonic/percussive source separation (HPSS) is the task of separating the pitched instruments from the percussive parts of a music mixture. In this paper, we apply the recently introduced Masker-Denoiser with twin networks (MaD TwinNet) system to this task. MaD TwinNet is a deep learning architecture that has reached state-of-the-art results in monaural singing voice separation. Here, we use it to estimate the magnitude spectrogram of the percussive source. We then retrieve the complex-valued short-time Fourier transforms of the sources by means of a phase recovery algorithm that minimizes the reconstruction error and enforces the phase of the harmonic part to follow a sinusoidal phase model. Experiments conducted on realistic music mixtures show that this novel separation system outperforms the previous state-of-the-art kernel additive model approach.
Pages: 421-425
Number of pages: 5