FASTPITCH: PARALLEL TEXT-TO-SPEECH WITH PITCH PREDICTION

Cited by: 140
Authors
Lancucki, Adrian [1]
Affiliations
[1] NVIDIA Corp, Santa Clara, CA 95051 USA
Source
2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021) | 2021
Keywords
text-to-speech; speech synthesis; fundamental frequency
DOI
10.1109/ICASSP39728.2021.9413889
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
We present FastPitch, a fully-parallel text-to-speech model based on FastSpeech, conditioned on fundamental frequency contours. The model predicts pitch contours during inference. By altering these predictions, the generated speech can be made more expressive, better match the semantics of the utterance, and ultimately be more engaging to the listener. Uniformly increasing or decreasing pitch with FastPitch generates speech that resembles the voluntary modulation of voice. Conditioning on frequency contours improves the overall quality of synthesized speech, making it comparable to the state of the art. It does not introduce an overhead, and FastPitch retains the favorable, fully-parallel Transformer architecture, with a real-time factor of over 900x for mel-spectrogram synthesis of a typical utterance.
Pages: 6588-6592
Page count: 5