Intonation Control for Neural Text-to-Speech Synthesis with Polynomial Models of F0

Cited by: 0

Authors:

Corkey, Niamh [1]

O'Mahony, Johannah [1]

King, Simon [1]

Affiliations:

[1] Univ Edinburgh, Ctr Speech Technol Res, Edinburgh, Midlothian, Scotland

Source:

INTERSPEECH 2023 | 2023

Keywords:

text-to-speech; speech synthesis; intonation modelling; prosody control; prosody transfer;

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

We present a novel, user-friendly approach for controlling patterns of intonation (a fundamental aspect of prosody) within a neural TTS system. This involves concisely representing F0 contours with the coefficients of their Legendre polynomial series expansion, and implementing a model (based on FastPitch) which is conditioned on these sets of coefficients during training. At inference time, the model either explicitly predicts a coefficient set, or a user (e.g., a human in the loop) provides a target coefficient set that audibly alters the intonation of the output speech using just a few values. This is particularly effective for intonation transfer, where the coefficient targets are extracted from a ground-truth recording, making the synthesised utterance more closely reflect the intonation of the real speaker.
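The coefficient representation described in the abstract can be illustrated with a minimal sketch. It assumes NumPy, a synthetic F0 contour in Hz, a fit over a single contour, and a polynomial degree of 3; none of these choices come from the record, and the function names `f0_to_legendre` and `legendre_to_f0` are hypothetical. It is not the authors' implementation, only an example of fitting a Legendre series to a contour and then editing one coefficient.

```python
import numpy as np
from numpy.polynomial import legendre


def f0_to_legendre(f0, degree=3):
    """Least-squares fit of a Legendre series to an F0 contour.

    The degree is an assumption for illustration only.
    """
    f0 = np.asarray(f0, dtype=float)
    # Map frame positions onto [-1, 1], where Legendre polynomials are orthogonal.
    x = np.linspace(-1.0, 1.0, len(f0))
    return legendre.legfit(x, f0, degree)


def legendre_to_f0(coeffs, n_frames):
    """Evaluate a Legendre series back into an F0 contour of n_frames points."""
    x = np.linspace(-1.0, 1.0, n_frames)
    return legendre.legval(x, coeffs)


# Synthetic contour (hypothetical data): mean 120 Hz, rising slope, slight curvature.
n_frames = 50
x = np.linspace(-1.0, 1.0, n_frames)
f0 = 120.0 + 20.0 * x - 15.0 * 0.5 * (3.0 * x**2 - 1.0)

coeffs = f0_to_legendre(f0, degree=3)
recon = legendre_to_f0(coeffs, n_frames)

# Editing a single coefficient changes the contour shape:
# here the linear (slope) term is raised by 10.
edited = coeffs.copy()
edited[1] += 10.0
edited_f0 = legendre_to_f0(edited, n_frames)
```

In this sketch the zeroth coefficient acts as the contour's overall level, the first as its slope, and the second as its curvature, which is why a handful of values is enough to describe or alter a smooth contour. How the coefficients are fed into a FastPitch-style model is not covered here.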

Citation

Pages: 2014 / 2015

Number of pages: 2