Samromur Children: An Icelandic Speech Corpus

Citations: 0
Authors
Mena, Carlos [1]
Mollberg, David Erik [1]
Borsky, Michal [1]
Gudnason, Jon [1]
Affiliations
[1] Reykjavik Univ, Language & Voice Lab, Menntavegur 1, IS-102 Reykjavik, Iceland
Source
LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION | 2022
Keywords
children's speech corpus; children's speech recognition; icelandic children's speech; icelandic corpus; RECOGNITION; FEATURES; SPEAKER;
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Samromur Children is an Icelandic speech corpus intended for the field of automatic speech recognition. It contains 131 hours of read speech from Icelandic children aged 4 to 17 years. The test portion was carefully selected to cover as wide a range of ages as possible, as we aimed to have exactly the same amount of data per age range. The speech was collected with the crowd-sourcing platform samromur.is, which is inspired by Mozilla's Common Voice project. The corpus was developed within the framework of the "Language Technology Programme for Icelandic 2019-2023"; the goal of the project is to make Icelandic available in language-technology applications. Samromur Children is the first corpus in Icelandic with children's voices for public use under a Creative Commons license. Additionally, we present baseline experiments and results using Kaldi.
Pages: 995-1002
Number of pages: 8