The SWARA Speech Corpus: A Large Parallel Romanian Read Speech Dataset

Cited by: 0
Authors
Stan, Adriana [1]
Dinescu, Florina [2]
Tiple, Cristina [2]
Meza, Serban [1]
Orza, Bogdan [1]
Chirila, Magdalena [2]
Giurgiu, Mircea [1]
Affiliations
[1] Tech Univ Cluj Napoca, Commun Dept, Cluj Napoca, Romania
[2] Iuliu Hatieganu Univ Med & Pharm, Dept Otorhinolaryngol, Cluj Napoca, Romania
Source
2017 INTERNATIONAL CONFERENCE ON SPEECH TECHNOLOGY AND HUMAN-COMPUTER DIALOGUE (SPED) | 2017
Keywords
speech corpus; Romanian; phone-level annotation; read data; speech synthesis
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper introduces one of the largest Romanian speech datasets freely available for both academic and commercial use. The dataset comprises speech data recorded over the last year from 12 speakers, along with 5 other speakers previously recorded in a separate environment. The data was manually segmented at utterance level and semi-automatically labelled at phone level. The resulting corpus amounts to approximately 21 hours of high-quality read speech data, split into over 19,000 utterances. The speakers read between 921 and 1493 utterances each. A set of 880 utterances is common to all speakers and adds up to over 16 hours of parallel data. We present the steps of performing the recordings and data segmentation, as well as a first use of this corpus in the context of synthetic voice development.
Pages: 6