Predicting utterance pitch targets in Yoruba for tone realisation in speech synthesis

Cited by: 3
Authors
Van Niekerk, Daniel R. [1, 2]
Barnard, Etienne [1]
Affiliations
[1] North West Univ, Vanderbijlpark, South Africa
[2] CSIR, Meraka Inst, Human Language Technol Res Grp, ZA-0001 Pretoria, South Africa
Keywords
Yoruba; Tone language; Speech synthesis; Fundamental frequency; Universality; Intonation
DOI
10.1016/j.specom.2013.01.009
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Pitch is a fundamental acoustic feature of speech and as such needs to be determined during the process of speech synthesis. While a range of communicative functions are attributed to pitch variation in the speech of all languages, it plays a vital role in distinguishing the meaning of lexical items in tone languages. As a number of factors are assumed to affect the realisation of pitch, it is important to know which mechanisms are systematically responsible for pitch realisation in order to model these effectively and thus develop robust speech synthesis systems in under-resourced environments. To this end, features influencing syllable pitch targets in continuous utterances in Yoruba are investigated in a small speech corpus of four speakers. It is found that the previous syllable pitch level is strongly correlated with pitch changes between syllables, and a number of approaches and features are evaluated in this context. The resulting models can be used to predict utterance pitch targets for speech synthesisers (whether concatenative or statistical parametric systems), and may also prove useful in speech-recognition systems. © 2013 Elsevier B.V. All rights reserved.
Pages: 229-242
Page count: 14
Related papers
28 in total
[21]   Modeling tone and intonation in Mandarin and English as a process of target approximation [J].
Prom-on, Santitham ;
Xu, Yi ;
Thipakorn, Bundit .
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2009, 125 (01) :405-424
[22]   A modular holistic approach to prosody modelling for Standard Yoruba speech synthesis [J].
Odejobi, Odetunji A. ;
Wong, Shun Ha Sylvia ;
Beaumont, Anthony J. .
COMPUTER SPEECH AND LANGUAGE, 2008, 22 (01) :39-68
[23]   Prosody conversion from neutral speech to emotional speech [J].
Tao, Jianhua ;
Kang, Yongguo ;
Li, Aijun .
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2006, 14 (04) :1145-1154
[24]   THE UNIVERSALITY OF INTRINSIC F-0 OF VOWELS [J].
WHALEN, DH ;
LEVITT, AG .
JOURNAL OF PHONETICS, 1995, 23 (03) :349-366
[25]   Speech melody as articulatorily implemented communicative functions [J].
Xu, Y .
SPEECH COMMUNICATION, 2005, 46 (3-4) :220-251
[26]  
Xu Yi., 2000, P 6 INT C SPOKEN LAN, P666
[27]  
Young S., 1994, P ARPA WORKSHOP HUMA, P307
[28]  
Zen H., 2006, 6 INT WORKSH SPEECH, P294