The Kestrel TTS text normalization system

Cited by: 51
Authors
Ebden, Peter [1]
Sproat, Richard [1]
Affiliations
[1] Google Inc, New York, NY USA
Keywords
FINITE-STATE TRANSDUCERS
DOI
10.1017/S1351324914000175
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper describes the Kestrel text normalization system, a component of the Google text-to-speech synthesis (TTS) system. At the core of Kestrel are text-normalization grammars that are compiled into libraries of weighted finite-state transducers (WFSTs). While the use of WFSTs for text normalization is itself not new, Kestrel differs from previous systems in its separation of the initial tokenization and classification phase of analysis from verbalization. Input text is first tokenized and the tokens are classified using WFSTs. As part of the classification, detected semiotic classes (expressions such as currency amounts, dates, times, and measure phrases) are parsed into protocol buffers (https://code.google.com/p/protobuf/). The protocol buffers are then verbalized, with possible reordering of the elements, again using WFSTs. This paper describes the architecture of Kestrel and the protocol buffer representations of semiotic classes, and presents some examples of grammars for various languages. We also discuss applications and deployments of Kestrel as part of the Google TTS system, which runs on both server and client side on multiple devices, and is used daily by millions of people in nineteen languages and counting.
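The two-phase design in the abstract (classify tokens into structured records, then verbalize those records) can be illustrated with a small sketch. This is not Kestrel's code: a regular expression stands in for the WFST classification grammars, a Python dataclass stands in for the protocol buffer, and the names `Money`, `PlainToken`, `classify`, `verbalize`, and `normalize` are hypothetical. It handles only whole-dollar amounts from 0 to 99.

```python
import re
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Money:
    """Hypothetical stand-in for a protocol buffer describing a currency amount."""
    currency: str
    amount: int


@dataclass
class PlainToken:
    """A token that needs no special verbalization."""
    text: str


Token = Union[Money, PlainToken]

# Assumption: a regex replaces the WFST-based classifier; it matches "$0".."$99".
MONEY_RE = re.compile(r"^\$(\d{1,2})$")

ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out an integer in the range 0..99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}"


def classify(text: str) -> List[Token]:
    """Phase 1: split on whitespace and parse money tokens into records."""
    tokens: List[Token] = []
    for raw in text.split():
        match = MONEY_RE.match(raw)
        if match:
            tokens.append(Money(currency="usd", amount=int(match.group(1))))
        else:
            tokens.append(PlainToken(raw))
    return tokens


def verbalize(tokens: List[Token]) -> str:
    """Phase 2: turn records into words; the currency word moves after the number."""
    words = []
    for token in tokens:
        if isinstance(token, Money):
            unit = "dollar" if token.amount == 1 else "dollars"
            words.append(f"{number_to_words(token.amount)} {unit}")
        else:
            words.append(token.text)
    return " ".join(words)


def normalize(text: str) -> str:
    """Run classification, then verbalization."""
    return verbalize(classify(text))
```

For example, `normalize("It costs $25 today")` yields `"It costs twenty five dollars today"`: the written order (symbol, then number) is reversed in the spoken form, which is the kind of reordering the intermediate record makes straightforward.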
Pages: 333-353
Page count: 21