A Speech Synthesis Approach for High Quality Speech Separation and Generation

Cited by: 6
Authors
Liu, Qingju [1]
Jackson, Philip J. B. [1]
Wang, Wenwu [1, 2]
Affiliations
[1] Univ Surrey, Ctr Vis Speech & Signal Proc, Guildford GU2 7XH, Surrey, England
[2] Qingdao Univ Sci & Technol, Qingdao 266061, Peoples R China
Funding
UK Engineering and Physical Sciences Research Council (EPSRC)
Keywords
Deep learning; speech separation; speech synthesis; WaveNet; hourglass; high quality; NETWORKS
DOI
10.1109/LSP.2019.2951894
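Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Subject classification codes
0808; 0809
Abstract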
We propose a new method for source separation that synthesizes the source from a speech mixture corrupted by various types of environmental noise. Unlike traditional source separation methods, which estimate the source from the mixture as a replica of the original source (e.g., by solving an inverse problem), our proposed method is a synthesis-based approach, which aims to generate a new signal (i.e., a "fake" source) that sounds similar to the original source. The proposed system has an encoder-decoder topology. The encoder uses a hybrid recurrent and hourglass network to predict intermediate-level features from the mixture, i.e., the Mel-spectrum of the target source. The decoder is a state-of-the-art WaveNet speech synthesis network conditioned on the Mel-spectrum, which directly generates time-domain samples of the source. Both objective and subjective evaluations were performed on the synthesized sources, and the results show great advantages of the proposed method for high-quality speech source separation and generation.
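The abstract names two concrete ingredients: a Mel-spectrum as the intermediate feature that the encoder predicts, and a WaveNet decoder, which is built from stacked dilated causal convolutions. The NumPy sketch below illustrates both under stated assumptions. The sample rate, frame length, hop size, number of Mel bands, the HTK-style Mel formula and the helper names (`log_mel_spectrum`, `mel_filterbank`, `dilated_causal_conv`, `receptive_field`) are hypothetical illustration choices, not settings reported in the paper. The sketch does not reproduce the hybrid recurrent/hourglass encoder or a trained, Mel-conditioned WaveNet.

```python
import numpy as np

# --- Intermediate feature: log-Mel spectrum (assumed settings, not the paper's) ---

def hz_to_mel(f):
    # HTK-style Mel scale (an assumption; the abstract does not specify the variant)
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters mapping n_fft//2 + 1 STFT bins onto n_mels Mel bands."""
    n_bins = n_fft // 2 + 1
    fft_freqs = np.linspace(0.0, sr / 2.0, n_bins)
    # n_mels + 2 equally spaced points on the Mel axis give left/center/right edges
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, n_bins))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrum(x, sr=16000, n_fft=1024, hop=256, n_mels=80, eps=1e-5):
    """Log-Mel spectrogram of a mono signal x (len(x) >= n_fft), shape (n_frames, n_mels)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[t * hop : t * hop + n_fft] * window for t in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2     # (n_frames, n_fft//2 + 1)
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T     # (n_frames, n_mels)
    return np.log(mel + eps)

# --- Decoder building block: WaveNet-style dilated causal convolution ---

def dilated_causal_conv(x, w, dilation):
    """y[t] = sum_k w[k] * x[t - k*dilation], treating x as zero before t = 0.

    Output at time t never depends on inputs after t (causality).
    """
    K = len(w)
    pad = (K - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])  # left-pad only, so no future samples leak in
    return np.array([
        sum(w[k] * xp[pad + t - k * dilation] for k in range(K))
        for t in range(len(x))
    ])

def receptive_field(kernel_size, dilations):
    """Number of past samples (including the current one) seen by a stack of layers."""
    return 1 + (kernel_size - 1) * sum(dilations)
```

With sr=16000, 1024-sample frames and a 256-sample hop, one second of audio yields 59 frames of 80 Mel bands; a matrix of this shape is the kind of target an encoder in such a system would regress from the mixture. A stack of kernel-size-2 layers with dilations 1, 2, ..., 512 covers 1024 past samples, which is why WaveNet-style decoders double the dilation from layer to layer rather than growing the kernel.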
Pages: 1872–1876
Number of pages: 5