Phonetics and Machine Learning: Hierarchical Modelling of Prosody in Statistical Speech Synthesis

Cited by: 0

Authors:

Vainio, Martti [1]

Affiliations:

[1] Univ Helsinki, Inst Behav Sci, Helsinki, Finland

Source:

STATISTICAL LANGUAGE AND SPEECH PROCESSING, SLSP 2014 | 2014 / Vol. 8791

Funding:

Academy of Finland;

Keywords:

Phonetics; Machine learning; Speech synthesis; Prosody; PARAMETER GENERATION; PERCEPTION; LANGUAGE; CONTEXT; SYSTEM; VOWEL; TEXT;

DOI:

10.1007/978-3-319-11397-5_3

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Text-to-speech synthesis is a task that solves many real-world problems, such as providing speaking and reading ability to people who lack those capabilities. It is thus viewed mainly as an engineering problem rather than a purely scientific one. Therefore, many of the solutions in speech synthesis are purely practical. However, from the point of view of phonetics, the process of producing speech from text artificially is also a scientific one. Here I argue, using an example from speech prosody, namely speech melody, that phonetics is the key discipline in helping to solve what is arguably one of the most interesting problems in machine learning.

Pages: 37-54

Page count: 18

References (93 in total):

[1] A method for generating natural-sounding speech stimuli for cognitive brain research [J].