Continuous Noise Masking Based Vocoder for Statistical Parametric Speech Synthesis

Cited by: 2
Authors
Al-Radhi, Mohammed Salah [1]
Csapo, Tamas Gabor [1, 2]
Nemeth, Geza [1]
Affiliations
[1] Budapest Univ Technol & Econ, Dept Telecommun & Media Informat, H-1117 Budapest, Hungary
[2] MTA ELTE Lendulet Lingual Articulat Res Grp, H-1088 Budapest, Hungary
Keywords
noise masking; continuous vocoder; speech synthesis; phase distortion; kernel density functions; SYNTHESIS SYSTEM; MODEL;
DOI
10.1587/transinf.2019EDP7167
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
In this article, we propose a method called "continuous noise masking" (cNM) that eliminates residual buzziness in a continuous vocoder, i.e. a vocoder in which all parameters are continuous and which offers a simple and flexible speech analysis and synthesis system. Traditional parametric vocoders generally show a perceptible deterioration in the quality of the synthesized speech due to different processing algorithms. Furthermore, inaccurate noise resynthesis (e.g. in breathiness or hoarseness) is considered one of the main underlying causes of performance degradation, leading to noisy transients and temporal discontinuity in the synthesized speech. To overcome these issues, a new cNM is developed based on the phase distortion deviation in order to reduce the perceptual effect of the residual noise, allow a proper reconstruction of noise characteristics, and better model the creaky voice segments that may occur in natural speech. To this end, the cNM is designed to keep only the voice components that satisfy a cNM threshold condition while discarding the others. We evaluate the proposed approach and compare it with state-of-the-art vocoders using objective measures and subjective listening tests. Experimental results show that the proposed method can reduce the effect of residual noise and can reach the quality of other sophisticated approaches such as STRAIGHT and the log-domain pulse model (PML).
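The thresholding idea in the abstract (keep components that satisfy a threshold condition, discard the rest) can be illustrated with a minimal sketch. This is not the authors' formulation: the function name `apply_noise_mask`, the assumption that each time-frequency bin carries a noisiness score normalized to [0, 1] (standing in for a normalized phase distortion deviation), and the rule that a kept bin receives the weight `1 - score` are all hypothetical choices made for illustration.

```python
from typing import List


def apply_noise_mask(noisiness: List[List[float]], threshold: float) -> List[List[float]]:
    """Build a continuous mask from per-bin noisiness scores.

    noisiness[t][k] is a hypothetical score in [0, 1] for frame t and
    frequency bin k (an assumed stand-in for a normalized phase
    distortion deviation). Bins at or below the threshold are kept with
    a continuous weight of 1 - score; bins above it are discarded (0.0).
    """
    mask = []
    for frame in noisiness:
        row = []
        for score in frame:
            if score <= threshold:
                row.append(1.0 - score)  # keep: weight shrinks as noisiness grows
            else:
                row.append(0.0)  # discard: treated as noise-dominated
        mask.append(row)
    return mask


# Two frames, two bins each, with an assumed threshold of 0.5.
example = apply_noise_mask([[0.25, 0.75], [0.5, 1.0]], threshold=0.5)
print(example)
```

The mask is continuous rather than binary for the kept bins, which is the only property this sketch shares with the method's name; the actual threshold value and weighting used in the paper are not stated in the abstract.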
Pages: 1099 - 1107
Page count: 9
Related papers
50 in total
  • [21] Investigations on speaker adaptation using a continuous vocoder within recurrent neural network based text-to-speech synthesis
    Ali Raheem Mandeel
    Mohammed Salah Al-Radhi
    Tamás Gábor Csapó
    Multimedia Tools and Applications, 2023, 82 : 15635 - 15649
  • [22] Investigations on speaker adaptation using a continuous vocoder within recurrent neural network based text-to-speech synthesis
    Mandeel, Ali Raheem
    Al-Radhi, Mohammed Salah
    Csapo, Tamas Gabor
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (10) : 15635 - 15649
  • [23] Complex cepstrum for statistical parametric speech synthesis
    Maia, Ranniery
    Akamine, Masami
    Gales, Mark J. F.
    SPEECH COMMUNICATION, 2013, 55 (05) : 606 - 618
  • [24] Autoregressive Models for Statistical Parametric Speech Synthesis
    Shannon, Matt
    Zen, Heiga
    Byrne, William
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2013, 21 (03) : 587 - 597
  • [25] Time-domain deterministic plus noise model based hybrid source modeling for statistical parametric speech synthesis
    Narendra, N. P.
    Rao, K. Sreenivasa
    SPEECH COMMUNICATION, 2016, 77 : 65 - 83
  • [26] Excitation modelling using epoch features for statistical parametric speech synthesis
    Reddy, M. Kiran
    Rao, K. Sreenivasa
    COMPUTER SPEECH AND LANGUAGE, 2020, 60
  • [27] Vocoder-Based Speech Synthesis from Silent Videos
    Michelsanti, Daniel
    Slizovskaia, Olga
    Haro, Gloria
    Gomez, Emilia
    Tan, Zheng-Hua
    Jensen, Jesper
    INTERSPEECH 2020, 2020, : 3530 - 3534
  • [28] Measuring the Effect of Reverberation on Statistical Parametric Speech Synthesis
    Coto-Jimenez, Marvin
    HIGH PERFORMANCE COMPUTING, CARLA 2019, 2020, 1087 : 369 - 382
  • [29] THE EFFECT OF NEURAL NETWORKS IN STATISTICAL PARAMETRIC SPEECH SYNTHESIS
    Hashimoto, Kei
    Oura, Keiichiro
    Nankaku, Yoshihiko
    Tokuda, Keiichi
    2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 4455 - 4459
  • [30] A Comparison Between STRAIGHT, Glottal, and Sinusoidal Vocoding in Statistical Parametric Speech Synthesis
    Airaksinen, Manu
    Juvela, Lauri
    Bollepalli, Bajibabu
    Yamagishi, Junichi
    Alku, Paavo
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2018, 26 (09) : 1658 - 1670