Continuous Noise Masking Based Vocoder for Statistical Parametric Speech Synthesis

Cited by: 2

Authors:

Al-Radhi, Mohammed Salah [1]

Csapo, Tamas Gabor [1,2]

Nemeth, Geza [1]

Affiliations:

[1] Budapest Univ Technol & Econ, Dept Telecommun & Media Informat, H-1117 Budapest, Hungary

[2] MTA ELTE Lendulet Lingual Articulat Res Grp, H-1088 Budapest, Hungary

Source:

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS | 2020 / Vol. E103D / No. 05

Keywords:

noise masking; continuous vocoder; speech synthesis; phase distortion; kernel density functions; SYNTHESIS SYSTEM; MODEL;

DOI:

10.1587/transinf.2019EDP7167

Chinese Library Classification number:

TP [Automation technology, computer technology];

Subject classification code:

0812 ;

Abstract:

In this article, we propose a method called "continuous noise masking (cNM)" that can eliminate residual buzziness in a continuous vocoder, i.e., a vocoder whose parameters are all continuous and which offers a simple and flexible speech analysis and synthesis system. Traditional parametric vocoders generally show a perceptible deterioration in the quality of the synthesized speech due to different processing algorithms. Furthermore, inaccurate noise resynthesis (e.g., of breathiness or hoarseness) is also considered one of the main underlying causes of performance degradation, leading to noisy transients and temporal discontinuity in the synthesized speech. To overcome these issues, a new cNM is developed based on the phase distortion deviation in order to reduce the perceptual effect of the residual noise, allow a proper reconstruction of noise characteristics, and better model the creaky voice segments that may occur in natural speech. To this end, the cNM is designed to keep only the voice components that satisfy the cNM threshold condition while discarding the others. We evaluate the proposed approach and compare it with state-of-the-art vocoders using objective measures and subjective listening tests. Experimental results show that the proposed method can reduce the effect of residual noise and can reach the quality of other sophisticated approaches such as STRAIGHT and the log domain pulse model (PML).
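The abstract says cNM is built on the phase distortion deviation (PDD) and keeps only components that satisfy a threshold condition. As a rough illustration of that general idea, and not the authors' implementation, the sketch below computes phase distortion from per-frame harmonic phases, takes its circular standard deviation over a short window of frames as the PDD, and flags bins whose PDD exceeds a threshold as noise. The phase-distortion definition PD_k = phi_{k+1} - phi_k - phi_1 is the common harmonic-model formulation and is assumed here; the function names, the window length `half_window`, the placeholder threshold 0.75, and the use of a binary mask are all hypothetical choices, and the paper's actual cNM formulation may differ.

```python
import cmath
import math
from typing import List, Sequence


def phase_distortion(harmonic_phases: Sequence[float]) -> List[float]:
    """Phase distortion of one frame: PD_k = phi_{k+1} - phi_k - phi_1.

    harmonic_phases[0] is the phase of the fundamental (phi_1). Wrapping is
    unnecessary because only exp(j*PD) is used downstream.
    """
    phi1 = harmonic_phases[0]
    return [harmonic_phases[k + 1] - harmonic_phases[k] - phi1
            for k in range(len(harmonic_phases) - 1)]


def circular_std(angles: Sequence[float]) -> float:
    """Circular standard deviation sqrt(-2 ln R), with R = |mean(exp(j*angle))|."""
    if not angles:
        raise ValueError("need at least one angle")
    r = abs(sum(cmath.exp(1j * a) for a in angles) / len(angles))
    r = max(r, 1e-12)  # avoid log(0) when the phases cancel out completely
    return math.sqrt(max(0.0, -2.0 * math.log(r)))


def phase_distortion_deviation(pd: List[List[float]], half_window: int) -> List[List[float]]:
    """PDD[t][k]: circular std of pd[.][k] over frames t-half_window..t+half_window.

    The window is truncated at the signal edges; every frame is assumed to
    carry the same number of phase-distortion values (harmonics).
    """
    n_frames = len(pd)
    pdd = []
    for t in range(n_frames):
        lo = max(0, t - half_window)
        hi = min(n_frames, t + half_window + 1)
        pdd.append([circular_std([pd[s][k] for s in range(lo, hi)])
                    for k in range(len(pd[t]))])
    return pdd


def noise_mask(pdd: List[List[float]], threshold: float) -> List[List[float]]:
    """Hypothetical binary mask: 1.0 = treated as noise (PDD above threshold), 0.0 = kept."""
    return [[1.0 if v > threshold else 0.0 for v in row] for row in pdd]


if __name__ == "__main__":
    # Phases linear in the harmonic index (phi_h = h * theta_t) give PD = 0,
    # i.e. a perfectly consistent, deterministic frame sequence.
    clean = [phase_distortion([h * 0.3 * t for h in range(1, 6)]) for t in range(10)]
    # Phase distortion flipping between 0 and pi from frame to frame: noise-like.
    noisy = [[0.0 if t % 2 == 0 else math.pi] * 4 for t in range(10)]
    for name, pd in (("clean", clean), ("noisy", noisy)):
        mask = noise_mask(phase_distortion_deviation(pd, half_window=2), threshold=0.75)
        print(name, sum(map(sum, mask)), "of", sum(map(len, mask)), "bins masked")
```

The circular standard deviation is used because phase values live on a circle: phases that are consistent from frame to frame give a deviation near zero, while erratic phases drive it up, which is what makes a PDD threshold usable as a noise indicator.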

Pages: 1099-1107

Number of pages: 9
