Developing an Open-Source Corpus of Yoruba Speech

Cited by: 14

Authors:

Gutkin, Alexander [1]

Demirsahin, Isin [1]

Kjartansson, Oddur [1]

Rivera, Clara [1]

Tubosun, Kola [2]

Affiliations:

[1] Google Research, London, England

[2] British Library, London, England

Source:

INTERSPEECH 2020 | 2020

Keywords:

speech corpora; open-source; West Africa

DOI:

10.21437/Interspeech.2020-1096

Chinese Library Classification:

R36 [Pathology]; R76 [Otorhinolaryngology]

Discipline classification codes:

100104; 100213

Abstract:

This paper introduces an open-source speech dataset for Yoruba, one of the largest low-resource West African languages, spoken by at least 22 million people. Yoruba is one of the official languages of Nigeria, Benin and Togo, and is spoken in other neighboring African countries and beyond. The corpus consists of over four hours of 48 kHz recordings from 36 male and female volunteers, together with the corresponding transcriptions, which include disfluency annotation. The transcriptions are fully diacritized, which is vital for pronunciation and lexical disambiguation. The annotated speech dataset described in this paper is primarily intended for use in text-to-speech systems, to serve as adaptation data in automatic speech recognition and speech-to-speech translation, and to provide insights into West African corpus linguistics. We demonstrate the use of this corpus in a simple statistical parametric speech synthesis (SPSS) scenario, evaluating it against the related languages from the CMU Wilderness dataset and the Yoruba Lagos-NWU corpus.


Pages: 404-408

Page count: 5

