HARMONIC-PERCUSSIVE SOURCE SEPARATION WITH DEEP NEURAL NETWORKS AND PHASE RECOVERY

Cited by: 0
Authors
Drossos, Konstantinos [1]
Magron, Paul [1]
Mimilakis, Stylianos Ioannis [2]
Virtanen, Tuomas [1]
Affiliations
[1] Tampere Univ Technol, Lab Signal Proc, Tampere, Finland
[2] Fraunhofer IDMT, Ilmenau, Germany
Source
2018 16TH INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC) | 2018
Funding
Academy of Finland; European Research Council; EU Horizon 2020;
Keywords
harmonic/percussive source separation; deep neural networks; MaD TwinNet; phase recovery; sinusoidal model;
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology];
Subject classification code
0812;
Abstract
Harmonic/percussive source separation (HPSS) consists of separating the pitched instruments from the percussive parts in a music mixture. In this paper, we propose to apply the recently introduced Masker-Denoiser with twin networks (MaD TwinNet) system to this task. MaD TwinNet is a deep learning architecture that has reached state-of-the-art results in monaural singing voice separation. Herein, we use it to estimate the magnitude spectrogram of the percussive source. Then, we retrieve the complex-valued short-time Fourier transform of the sources by means of a phase recovery algorithm, which minimizes the reconstruction error and constrains the phase of the harmonic part to follow a sinusoidal phase model. Experiments conducted on realistic music mixtures show that this novel separation system outperforms the previous state-of-the-art kernel additive model approach.
Pages: 421-425
Number of pages: 5