HARMONIC-PERCUSSIVE SOURCE SEPARATION WITH DEEP NEURAL NETWORKS AND PHASE RECOVERY

Cited by: 0
Authors
Drossos, Konstantinos [1]
Magron, Paul [1]
Mimilakis, Stylianos Ioannis [2]
Virtanen, Tuomas [1]
Affiliations
[1] Tampere Univ Technol, Lab Signal Proc, Tampere, Finland
[2] Fraunhofer IDMT, Ilmenau, Germany
Source
2018 16TH INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC) | 2018
Funding
Academy of Finland; European Research Council; EU Horizon 2020;
Keywords
harmonic/percussive source separation; deep neural networks; MaD TwinNet; phase recovery; sinusoidal model;
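DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract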
Harmonic/percussive source separation (HPSS) consists in separating the pitched instruments from the percussive parts in a music mixture. In this paper, we propose to apply the recently introduced Masker-Denoiser with twin networks (MaD TwinNet) system to this task. MaD TwinNet is a deep learning architecture that has reached state-of-the-art results in monaural singing voice separation. Herein, we propose to apply it to HPSS by using it to estimate the magnitude spectrogram of the percussive source. Then, we retrieve the complex-valued short-time Fourier transform of the sources by means of a phase recovery algorithm, which minimizes the reconstruction error and enforces the phase of the harmonic part to follow a sinusoidal phase model. Experiments conducted on realistic music mixtures show that this novel separation system outperforms the previous state-of-the-art kernel additive model approach.
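The two ingredients named in the abstract, a sinusoidal phase model for the harmonic part and a reduction of the reconstruction error, can be illustrated with a minimal NumPy sketch. This is not the authors' algorithm: the function names, the toy data, the constant per-bin frequencies, the choice of the harmonic magnitude as the remainder of the mixture magnitude, and the single power-weighted error-distribution step are all assumptions made for illustration.

```python
import numpy as np


def sinusoidal_phase(initial_phase, norm_freqs, hop):
    """Propagate phase over frames: phi[f, t] = phi[f, t-1] + 2*pi*hop*nu[f, t].

    norm_freqs holds normalized frequencies (cycles per sample) per bin and frame.
    """
    n_bins, n_frames = norm_freqs.shape
    phase = np.empty((n_bins, n_frames))
    phase[:, 0] = initial_phase
    for t in range(1, n_frames):
        phase[:, t] = phase[:, t - 1] + 2.0 * np.pi * hop * norm_freqs[:, t]
    return phase


def distribute_mixing_error(mixture, estimates, eps=1e-12):
    """Spread the residual (mixture - sum of estimates) over the sources,
    weighting each source by its share of the estimated power."""
    power = np.abs(estimates) ** 2
    weights = power / (power.sum(axis=0, keepdims=True) + eps)
    residual = mixture - estimates.sum(axis=0)
    return estimates + weights * residual


# Toy data (hypothetical sizes): 4 frequency bins, 5 frames, hop of 256 samples.
rng = np.random.default_rng(0)
n_bins, n_frames, hop = 4, 5, 256
mixture = (rng.standard_normal((n_bins, n_frames))
           + 1j * rng.standard_normal((n_bins, n_frames)))

# Stand-in for a network's percussive magnitude estimate (assumption).
perc_mag = 0.5 * np.abs(mixture)
# Assumption: harmonic magnitude taken as what remains of the mixture magnitude.
harm_mag = np.abs(mixture) - perc_mag
# Assumption: one constant normalized frequency for every bin and frame.
norm_freqs = np.full((n_bins, n_frames), 0.01)

harm_phase = sinusoidal_phase(np.angle(mixture[:, 0]), norm_freqs, hop)
# Assumption: the percussive part simply keeps the mixture phase.
perc_phase = np.angle(mixture)

estimates = np.stack([harm_mag * np.exp(1j * harm_phase),
                      perc_mag * np.exp(1j * perc_phase)])
refined = distribute_mixing_error(mixture, estimates)
```

After the error-distribution step the two refined estimates sum back to the mixture, which is the sense in which the reconstruction error is reduced here. An actual phase recovery procedure would typically iterate and re-impose the estimated magnitudes between steps.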
Pages: 421-425
Number of pages: 5