Phonetics and Machine Learning: Hierarchical Modelling of Prosody in Statistical Speech Synthesis

Cited by: 0
Author
Vainio, Martti [1]
Affiliation
[1] Univ Helsinki, Inst Behav Sci, Helsinki, Finland
Source
STATISTICAL LANGUAGE AND SPEECH PROCESSING, SLSP 2014 | 2014 / Vol. 8791
Funding
Academy of Finland;
Keywords
Phonetics; Machine learning; Speech synthesis; Prosody; PARAMETER GENERATION; PERCEPTION; LANGUAGE; CONTEXT; SYSTEM; VOWEL; TEXT;
DOI
10.1007/978-3-319-11397-5_3
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Text-to-speech synthesis is a task that solves many real-world problems, such as providing speaking and reading ability to people who lack those capabilities. It is thus viewed mainly as an engineering problem rather than a purely scientific one. Therefore, many of the solutions in speech synthesis are purely practical. However, from the point of view of phonetics, the process of producing speech from text artificially is also a scientific one. Here I argue, using an example from speech prosody, namely speech melody, that phonetics is the key discipline in helping to solve what is arguably one of the most interesting problems in machine learning.
Pages: 37-54
Number of pages: 18