Reducing Interference with Phase Recovery in DNN-based Monaural Singing Voice Separation

Cited by: 7
Authors
Magron, Paul [1 ]
Drossos, Konstantinos [1 ]
Mimilakis, Stylianos Ioannis [2 ]
Virtanen, Tuomas [1 ]
Affiliations
[1] Tampere Univ Technol, Signal Proc Lab, Tampere, Finland
[2] Fraunhofer IDMT, Ilmenau, Germany
Source
19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES | 2018
Funding
Academy of Finland; European Research Council; European Union Horizon 2020;
Keywords
Monaural singing voice separation; phase recovery; deep neural networks; MaD TwinNet; Wiener filtering; NONNEGATIVE MATRIX FACTORIZATION; AUDIO SOURCE SEPARATION;
DOI
10.21437/Interspeech.2018-1845
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
State-of-the-art methods for monaural singing voice separation estimate the magnitude spectrum of the voice in the short-time Fourier transform (STFT) domain by means of deep neural networks (DNNs). The resulting magnitude estimate is then combined with the mixture's phase to retrieve the complex-valued STFT of the voice, which is synthesized into a time-domain signal. However, when the sources overlap in time and frequency, the STFT phase of the voice differs from the mixture's phase, which results in interference and artifacts in the estimated signals. In this paper, we investigate recent phase recovery algorithms that tackle this issue and can further enhance separation quality. These algorithms exploit phase constraints originating either from a sinusoidal model or from consistency, a property that is a direct consequence of the redundancy of the STFT. Experiments conducted on real songs show that these algorithms are effective at reducing interference in the estimated voice compared to the baseline approach.
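For illustration, the following minimal Python sketch (assuming NumPy and librosa; not the authors' implementation) outlines the pipeline described in the abstract: a DNN magnitude estimate is combined with the mixture phase (baseline), then refined with a few consistency-based analysis-resynthesis iterations in the spirit of Griffin-Lim. The function estimate_voice_magnitude is a hypothetical placeholder for a trained magnitude-estimation model such as MaD TwinNet.

import numpy as np
import librosa

def separate_voice(mixture, n_fft=2048, hop=512, n_iter=20):
    """Baseline voice separation plus consistency-based phase refinement (sketch)."""
    # Complex-valued STFT of the mixture and its phase.
    mix_stft = librosa.stft(mixture, n_fft=n_fft, hop_length=hop)
    mix_phase = np.angle(mix_stft)

    # Hypothetical DNN magnitude estimate of the singing voice.
    voice_mag = estimate_voice_magnitude(np.abs(mix_stft))

    # Baseline: reuse the mixture phase as the voice phase.
    voice_stft = voice_mag * np.exp(1j * mix_phase)

    # Griffin-Lim-style consistency iterations: resynthesize, re-analyze,
    # keep the updated phase but re-impose the estimated magnitude.
    for _ in range(n_iter):
        y = librosa.istft(voice_stft, hop_length=hop)
        reproj = librosa.stft(y, n_fft=n_fft, hop_length=hop)
        voice_stft = voice_mag * np.exp(1j * np.angle(reproj))

    # Final time-domain voice estimate.
    return librosa.istft(voice_stft, hop_length=hop)

def estimate_voice_magnitude(mix_mag):
    # Placeholder: a real system would apply a trained DNN here.
    return mix_mag

The phase-recovery variants studied in the paper (e.g., sinusoidal-model phase constraints or consistent Wiener filtering) build on the same structure, replacing the plain mixture-phase initialization and the magnitude re-imposition step with their respective constraints.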
Pages: 332-336
Number of pages: 5