FASTPITCH: PARALLEL TEXT-TO-SPEECH WITH PITCH PREDICTION

Cited by: 140

Authors:

Lancucki, Adrian [1]

Affiliations:

[1] NVIDIA Corp, Santa Clara, CA 95051 USA

Source:

2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021) | 2021

Keywords:

text-to-speech; speech synthesis; fundamental frequency

DOI:

10.1109/ICASSP39728.2021.9413889

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

We present FastPitch, a fully-parallel text-to-speech model based on FastSpeech, conditioned on fundamental frequency contours. The model predicts pitch contours during inference. By altering these predictions, the generated speech can be made more expressive, better matched to the semantics of the utterance, and ultimately more engaging to the listener. Uniformly increasing or decreasing pitch with FastPitch generates speech that resembles the voluntary modulation of voice. Conditioning on frequency contours improves the overall quality of synthesized speech, making it comparable to the state of the art. It does not introduce an overhead, and FastPitch retains the favorable, fully-parallel Transformer architecture, with over 900x real-time factor for mel-spectrogram synthesis of a typical utterance.

Citation

Pages: 6588-6592

Page count: 5
