Samromur Children: An Icelandic Speech Corpus

Cited by: 0
Authors
Mena, Carlos [1]
Mollberg, David Erik [1]
Borsky, Michal [1]
Gudnason, Jon [1]
Affiliations
[1] Reykjavik Univ, Language & Voice Lab, Menntavegur 1, IS-102 Reykjavik, Iceland
Source
LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION | 2022
Keywords
children's speech corpus; children's speech recognition; Icelandic children's speech; Icelandic corpus; RECOGNITION; FEATURES; SPEAKER;
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications];
Subject classification code
081203 ; 0835 ;
Abstract
Samromur Children is an Icelandic speech corpus intended for automatic speech recognition. It contains 131 hours of read speech from Icelandic children aged 4 to 17 years. The test portion was meticulously selected to cover as wide a range of ages as possible, as we aimed to have exactly the same amount of data per age range. The speech was collected with the crowd-sourcing platform samromur.is, which is inspired by Mozilla's Common Voice project. The corpus was developed within the framework of the "Language Technology Programme for Icelandic 2019-2023"; the goal of the project is to make Icelandic available in language-technology applications. Samromur Children is the first Icelandic corpus of children's voices available for public use under a Creative Commons license. Additionally, we present baseline experiments and results using Kaldi.
Pages: 995-1002
Page count: 8