Continuous Noise Masking Based Vocoder for Statistical Parametric Speech Synthesis

Cited by: 2
Authors
Al-Radhi, Mohammed Salah [1]
Csapo, Tamas Gabor [1, 2]
Nemeth, Geza [1]
Affiliations
[1] Budapest Univ Technol & Econ, Dept Telecommun & Media Informat, H-1117 Budapest, Hungary
[2] MTA ELTE Lendulet Lingual Articulat Res Grp, H-1088 Budapest, Hungary
Keywords
noise masking; continuous vocoder; speech synthesis; phase distortion; kernel density functions; SYNTHESIS SYSTEM; MODEL
DOI
10.1587/transinf.2019EDP7167
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In this article, we propose a method called "continuous noise masking (cNM)" that eliminates residual buzziness in a continuous vocoder, i.e. a vocoder in which all parameters are continuous and which offers a simple and flexible speech analysis and synthesis system. Traditional parametric vocoders generally show a perceptible deterioration in the quality of the synthesized speech due to different processing algorithms. Furthermore, inaccurate noise resynthesis (e.g. in breathiness or hoarseness) is considered one of the main underlying causes of performance degradation, leading to noisy transients and temporal discontinuity in the synthesized speech. To overcome these issues, a new cNM is developed based on the phase distortion deviation in order to reduce the perceptual effect of the residual noise, allow a proper reconstruction of noise characteristics, and better model the creaky voice segments that may occur in natural speech. To this end, the cNM is designed to keep only voice components that satisfy the cNM threshold condition while discarding the others. We evaluate the proposed approach and compare it with state-of-the-art vocoders using objective measures and subjective listening tests. Experimental results show that the proposed method can reduce the effect of residual noise and can reach the quality of other sophisticated approaches such as STRAIGHT and the log domain pulse model (PML).
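The thresholding idea in the abstract can be illustrated with a rough sketch. This is not the authors' formulation, which the abstract does not spell out: the function name `continuous_mask`, the assumption that phase distortion deviation (PDD) values are scaled to [0, 1], the `1 - pdd` mapping, and the threshold value are all hypothetical choices made only to show a continuous (non-binary) mask with a cut-off.

```python
import numpy as np

def continuous_mask(pdd, threshold):
    """Hypothetical continuous mask over time-frequency bins.

    pdd:       phase distortion deviation values, ASSUMED scaled to [0, 1]
    threshold: ASSUMED cut-off; bins whose mask value falls below it are discarded
    """
    pdd = np.clip(np.asarray(pdd, dtype=float), 0.0, 1.0)
    mask = 1.0 - pdd                # low deviation -> mask value near 1
    mask[mask < threshold] = 0.0    # discard bins that fail the threshold
    return mask                     # surviving bins keep their continuous value

# Two frames with two frequency bins each (made-up numbers).
example = continuous_mask([[0.1, 0.9], [0.5, 0.3]], threshold=0.4)
```

Unlike a binary mask, the bins that pass the threshold retain a graded weight rather than being set to 1, which is the sense of "continuous" assumed in this sketch.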
Pages: 1099-1107
Number of pages: 9
Related papers
50 in total
  • [31] Speech Synthesis Using WaveNet Vocoder Based on Periodic/Aperiodic Decomposition
    Fujimoto, Takato
    Yoshimura, Takenori
    Hashimoto, Kei
    Oura, Keiichiro
    Nankaku, Yoshihiko
    Tokuda, Keiichi
    2018 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2018, : 644 - 648
  • [32] VOCBENCH: A NEURAL VOCODER BENCHMARK FOR SPEECH SYNTHESIS
    AlBadawy, Ehab A.
    Gibiansky, Andrew
    He, Qing
    Wu, Jilong
    Chang, Ming-Ching
    Lyu, Siwei
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 881 - 885
  • [33] Voice quality control using perceptual expressions for statistical parametric speech synthesis based on cluster adaptive training
    Ohtani, Yamato
    Mori, Koichiro
    Morita, Masahiro
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 2258 - 2262
  • [34] Czech Speech Synthesis with Generative Neural Vocoder
    Vit, Jakub
    Hanzlicek, Zdenek
    Matousek, Jindrich
    TEXT, SPEECH, AND DIALOGUE (TSD 2019), 2019, 11697 : 307 - 315
  • [35] NEURAL SOURCE-FILTER-BASED WAVEFORM MODEL FOR STATISTICAL PARAMETRIC SPEECH SYNTHESIS
    Wang, Xin
    Takaki, Shinji
    Yamagishi, Junichi
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 5916 - 5920
  • [36] Statistical Parametric Speech Synthesis Using Generalized Distillation Framework
    Liu, Zheng-Chen
    Ling, Zhen-Hua
    Dai, Li-Rong
    IEEE SIGNAL PROCESSING LETTERS, 2018, 25 (05) : 695 - 699
  • [37] Acoustic Features Modelling for Statistical Parametric Speech Synthesis: A Review
    Adiga, Nagaraj
    Prasanna, S. R. M.
    IETE TECHNICAL REVIEW, 2019, 36 (02) : 130 - 149
  • [38] Statistical parametric speech synthesis for Arabic language using ANN
    Ilyes, Rebai
    BenAyed, Yassine
    2014 1ST INTERNATIONAL CONFERENCE ON ADVANCED TECHNOLOGIES FOR SIGNAL AND IMAGE PROCESSING (ATSIP 2014), 2014, : 452 - 457
  • [39] Statistical Parametric Speech Synthesis for Online Dictionaries - Problems and Solutions
    Piits, Liisi
    Kudritski, Elgar
    Kiissel, Indrek
    Hein, Indrek
    HUMAN LANGUAGE TECHNOLOGIES - THE BALTIC PERSPECTIVE, BALTIC HLT 2014, 2014, 268 : 27 - 32
  • [40] Duration modelling and evaluation for Arabic statistical parametric speech synthesis
    Zangar, Imene
    Mnasri, Zied
    Colotte, Vincent
    Jouvet, Denis
    MULTIMEDIA TOOLS AND APPLICATIONS, 2021, 80 (06) : 8331 - 8353