Modeling pause for the synthesis of Kazakh speech

Cited by: 3
Authors
Kaliyev, Arman [1]
Rybin, Sergey V. [1]
Matveev, Yuri N. [1]
Kaziyeva, Nazym [1]
Burambayeva, Nursaule [2]
Affiliations
[1] ITMO Univ, Kronverksky 49, St Petersburg, Russia
[2] LN Gumilyov Eurasian Natl Univ, K Munaitpasova 5, Astana, Kazakhstan
Source
ICEMIS'18: PROCEEDINGS OF THE FOURTH INTERNATIONAL CONFERENCE ON ENGINEERING AND MIS | 2018
Keywords
Speech synthesis; pause; prosody; word embedding; machine learning
DOI
10.1145/3234698.3234699
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Prosodic processing is one of the most important stages in the development of intonational text-to-speech systems. Correct pause placement and pause duration play a large role in how listeners perceive speech. In this article, the authors propose a new way of modeling pauses for the Kazakh language using a vector representation of words. The model uses only a minimal amount of manually tagged data. The authors believe that this approach will be of interest to research groups developing speech synthesis systems for under-studied languages.
Pages: 4