Continuous Noise Masking Based Vocoder for Statistical Parametric Speech Synthesis

Cited by: 2
Authors
Al-Radhi, Mohammed Salah [1]
Csapo, Tamas Gabor [1, 2]
Nemeth, Geza [1]
Affiliations
[1] Budapest Univ Technol & Econ, Dept Telecommun & Media Informat, H-1117 Budapest, Hungary
[2] MTA ELTE Lendulet Lingual Articulat Res Grp, H-1088 Budapest, Hungary
Keywords
noise masking; continuous vocoder; speech synthesis; phase distortion; kernel density functions; SYNTHESIS SYSTEM; MODEL;
DOI
10.1587/transinf.2019EDP7167
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
In this article, we propose a method called "continuous noise masking (cNM)" that eliminates residual buzziness in a continuous vocoder, i.e., a vocoder whose parameters are all continuous and which offers a simple and flexible speech analysis and synthesis system. Traditional parametric vocoders generally show a perceptible deterioration in the quality of the synthesized speech due to their different processing algorithms. Furthermore, inaccurate noise resynthesis (e.g., of breathiness or hoarseness) is also considered one of the main causes of performance degradation, leading to noisy transients and temporal discontinuities in the synthesized speech. To overcome these issues, the new cNM is developed based on the phase distortion deviation in order to reduce the perceptual effect of the residual noise, to allow a proper reconstruction of noise characteristics, and to better model the creaky voice segments that may occur in natural speech. To this end, the cNM is designed to keep only the voiced components subject to the cNM threshold condition while discarding the others. We evaluate the proposed approach and compare it with state-of-the-art vocoders using objective measures and subjective listening tests. Experimental results show that the proposed method can reduce the effect of residual noise and can reach the quality of other sophisticated approaches such as STRAIGHT and the log domain pulse model (PML).
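The masking idea in the abstract can be illustrated with a minimal sketch, under stated assumptions. It assumes a phase distortion (PD) matrix of shape (frames, frequency bins) has already been extracted, computes the phase distortion deviation (PDD) as the circular standard deviation of PD over a short window of neighboring frames, and keeps only bins whose PDD falls below a threshold. The function names, the 3-frame window, the threshold default of 0.75, and the hard binary decision are all hypothetical choices for illustration; the abstract does not specify the exact cNM threshold condition, so this is not the authors' implementation.

```python
import numpy as np


def circular_std(angles, axis=0):
    """Circular standard deviation sqrt(-2 ln R), where R is the mean resultant length."""
    r = np.abs(np.mean(np.exp(1j * angles), axis=axis))
    # Clip R to avoid log(0) when the phases cancel out completely.
    r = np.clip(r, 1e-12, 1.0)
    return np.sqrt(-2.0 * np.log(r))


def phase_distortion_deviation(pd, win=3):
    """Hypothetical PDD: circular std of PD over a window of neighboring frames.

    pd: array of shape (frames, bins) holding phase distortion values in radians.
    """
    n_frames = pd.shape[0]
    half = win // 2
    pdd = np.empty(pd.shape, dtype=float)
    for t in range(n_frames):
        lo, hi = max(0, t - half), min(n_frames, t + half + 1)
        pdd[t] = circular_std(pd[lo:hi], axis=0)
    return pdd


def noise_mask(pdd, threshold=0.75):
    """Hypothetical binary mask: True keeps a bin as voiced, False marks it as noise."""
    return pdd < threshold


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stable phase structure in the first 4 bins, random phases in the last 4 bins.
    pd = np.concatenate(
        [np.full((10, 4), 0.1), rng.uniform(-np.pi, np.pi, size=(10, 4))], axis=1
    )
    mask = noise_mask(phase_distortion_deviation(pd))
    print(mask.astype(int))
```

Bins with a consistent phase structure across frames yield a PDD near zero and are kept as voiced, while bins with erratic phases yield a large PDD and are treated as noise. A real system would apply such a mask per frame when mixing the deterministic and noise parts of the excitation.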
Pages: 1099-1107
Number of pages: 9
Related papers
(50 in total)
  • [41] Research on text analysis for Tibetan statistical parametric speech synthesis
    Gan, Zhenye
    Kong, Xinjie
    Zhang, Shuai
    2016 9TH INTERNATIONAL CONGRESS ON IMAGE AND SIGNAL PROCESSING, BIOMEDICAL ENGINEERING AND INFORMATICS (CISP-BMEI 2016), 2016, : 877 - 882
  • [42] COMPLEX CEPSTRUM AS PHASE INFORMATION IN STATISTICAL PARAMETRIC SPEECH SYNTHESIS
    Maia, Ranniery
    Akamine, Masami
    Gales, M. J. F.
    2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2012, : 4581 - 4584
  • [43] A Deep Generative Architecture for Postfiltering in Statistical Parametric Speech Synthesis
    Chen, Ling-Hui
    Raitio, Tuomo
    Valentini-Botinhao, Cassia
    Ling, Zhen-Hua
    Yamagishi, Junichi
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2015, 23 (11) : 2003 - 2014
  • [44] Statistical parametric speech synthesis using a hidden trajectory model
    Cai, Ming-Qi
    Ling, Zhen-Hua
    Dai, Li-Rong
    SPEECH COMMUNICATION, 2015, 72 : 149 - 159
  • [45] FlowVocoder: A small Footprint Neural Vocoder based Normalizing Flow for Speech Synthesis
    Manh Luong
    Viet Anh Tran
    INTERSPEECH 2022, 2022, : 1576 - 1580
  • [46] Transfer Learning based Progressive Neural Networks for Acoustic Modeling in Statistical Parametric Speech Synthesis
    Fu, Ruibo
    Tao, Jianhua
    Zheng, Yibin
    Wen, Zhengqi
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 907 - 911
  • [47] DEEP BELIEF NETWORK-BASED POST-FILTERING FOR STATISTICAL PARAMETRIC SPEECH SYNTHESIS
    Hu, Ya-Jun
    Ling, Zhen-Hua
    Dai, Li-Rong
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5510 - 5514
  • [48] A Hierarchical Encoder-Decoder Model for Statistical Parametric Speech Synthesis
    Ronanki, Srikanth
    Watts, Oliver
    King, Simon
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 1133 - 1137
  • [49] On the impact of excitation and spectral parameters for expressive statistical parametric speech synthesis
    Maia, Ranniery
    Akamine, Masami
    COMPUTER SPEECH AND LANGUAGE, 2014, 28 (05) : 1209 - 1232
  • [50] Direct Modelling of Magnitude and Phase Spectra for Statistical Parametric Speech Synthesis
    Espic, Felipe
    Valentini-Botinhao, Cassia
    King, Simon
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 1383 - 1387