LSTM-Based Kazakh Speech Synthesis

Citations: 0
Authors
Kaliyev, Arman [1]
Affiliation
[1] ITMO Univ, St Petersburg, Russia
Source
SPEECH AND COMPUTER, SPECOM 2019 | 2019 / Vol. 11658
Keywords
Statistical parametric speech synthesis; Speech synthesis; LSTM; Kazakh language; Under-resourced languages
DOI
10.1007/978-3-030-26061-3_21
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
The penetration of speech technology into everyday life varies greatly by country and by language environment. This is especially noticeable in services developed by leading technology companies, where high-resource languages such as English and Russian have become the main service languages, whereas speech technologies for under-resourced languages lag behind. This article presents the first speech synthesis system for the Kazakh language based on the long short-term memory (LSTM) neural network architecture. The presented text-to-speech (TTS) system combines previously developed methods of prosodic processing for under-resourced languages with an LSTM-based acoustic model. The system takes linguistic features of the text as input, including phonetic transcription, and generates Kazakh speech of acceptable perceived quality. In summary, this work describes a method for developing speech synthesis for Kazakh, a language with limited natural language processing resources. The approach can also be applied to other under-resourced languages.
Pages: 201-208
Page count: 8
Related papers
20 in total
[1] An SM, 2017, ASIAPAC SIGN INFO PR, P1563, DOI 10.1109/APSIPA.2017.8282282
[2] Berment V., 2004, THESES
[3] Brown P. F., 1992, Computational Linguistics, V18, P467
[4] Fan Y., 2014, Fifteenth Annual Conference of the International Speech Communication Association
[5] Kaliyev, Arman; Rybin, Sergey V.; Matveev, Yuri N.; Kaziyeva, Nazym; Burambayeva, Nursaule. Modeling pause for the synthesis of Kazakh speech. ICEMIS'18: PROCEEDINGS OF THE FOURTH INTERNATIONAL CONFERENCE ON ENGINEERING AND MIS, 2018
[6] Kaliyev, Arman; Rybin, Sergey V.; Matveev, Yuri N. Phoneme Duration Prediction for Kazakh Language. SPEECH AND COMPUTER (SPECOM 2018), 2018, 11096: 274-280
[7] Kaliyev, Arman; Rybin, Sergey V.; Matveev, Yuri. The Pausing Method Based on Brown Clustering and Word Embedding. SPEECH AND COMPUTER, SPECOM 2017, 2017, 10458: 741-747
[8] Karpov Alexey A. (Карпов Алексей Анатольевич), 2015, Voprosy yazykoznaniya (Вопросы языкознания), P117
[9] Khomitsevich, Olga; Mendelev, Valentin; Tomashenko, Natalia; Rybin, Sergey; Medennikov, Ivan; Kudubayeva, Saule. A Bilingual Kazakh-Russian System for Automatic Speech Recognition and Synthesis. SPEECH AND COMPUTER (SPECOM 2015), 2015, 9319: 25-33
[10] Krauwer S., 2003, P SPECOM 2003, P8