Enhancement of Pitch Controllability using Timbre-Preserving Pitch Augmentation in FastPitch

Cited by: 4
Authors
Bae, Hanbin [1]
Joo, Young-Sun [1]
Affiliations
[1] NCSOFT Corp, Speech AI Lab, Seongnam Si, South Korea
Source
INTERSPEECH 2022 | 2022
Keywords
timbre-preserving pitch-shifting algorithm; pitch augmentation; text-to-speech; FastPitch; VocGAN; SPEECH SYNTHESIS SYSTEM
DOI
10.21437/Interspeech.2022-55
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
FastPitch, a recently developed pitch-controllable text-to-speech (TTS) model, is conditioned on pitch contours. However, the quality of its synthesized speech degrades considerably for pitch values that deviate significantly from the average pitch; i.e., its ability to control pitch is limited. To address this issue, we propose two algorithms to improve the robustness of FastPitch. First, we propose a novel timbre-preserving pitch-shifting algorithm for natural pitch augmentation. Pitch-shifted speech samples sound more natural with the proposed algorithm because the speaker's vocal timbre is maintained. Second, we propose a training algorithm in which FastPitch is trained on pitch-augmented speech datasets with different pitch ranges for the same sentence. The experimental results demonstrate that the proposed algorithms improve the pitch controllability of FastPitch.
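As a rough illustration of what "different pitch ranges for the same sentence" could look like at the level of the pitch contour, the sketch below scales one F0 contour by several semitone offsets. It is not the authors' algorithm: this record does not describe how the timbre is preserved, and that step is not modelled here. The function names, the choice of semitone offsets, and the convention that unvoiced frames are stored as 0 Hz are all assumptions made for the example.

```python
import numpy as np

def shift_f0(f0_hz, semitones):
    """Scale voiced F0 values by 2**(semitones/12).

    Assumption: unvoiced frames are encoded as 0 Hz and are left at 0.
    """
    f0_hz = np.asarray(f0_hz, dtype=float)
    ratio = 2.0 ** (semitones / 12.0)
    return np.where(f0_hz > 0, f0_hz * ratio, 0.0)

def augment_f0(f0_hz, shifts=(-4, -2, 0, 2, 4)):
    """Return one shifted contour per semitone offset (offsets are hypothetical)."""
    return {s: shift_f0(f0_hz, s) for s in shifts}

# Toy contour for a single utterance: two voiced frames between unvoiced ones.
f0 = np.array([0.0, 200.0, 220.0, 0.0])
variants = augment_f0(f0)
for semitones, contour in variants.items():
    print(semitones, np.round(contour, 1))
```

A shift of 12 semitones doubles every voiced F0 value, and a shift of 0 returns the contour unchanged. In a full augmentation pipeline each shifted contour would still need to be paired with resynthesized audio, which this sketch does not attempt.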
Pages: 6-10
Page count: 5