TTS-Portuguese Corpus: a corpus for speech synthesis in Brazilian Portuguese

Cited by: 6
Authors
Casanova, Edresson [1]
Junior, Arnaldo Candido [2]
Shulby, Christopher [3]
de Oliveira, Frederico Santos [4]
Teixeira, Joao Paulo [5]
Ponti, Moacir Antonelli [1]
Aluisio, Sandra [1]
Affiliations
[1] Univ Sao Paulo, Inst Ciencias Matemat & Comp, Sao Carlos, Brazil
[2] Fed Univ Technol Parana UTFPR, Medianeira, Brazil
[3] DefinedCrowd Corp, Seattle, WA USA
[4] Univ Fed Mato Grosso, Cuiaba, Brazil
[5] Inst Politecn Braganca, Res Ctr Digitalizat & Intelligent Robot CEDRI, Braganca, Portugal
Keywords
Corpora; Speech synthesis; TTS; Portuguese; Signal estimation
DOI
10.1007/s10579-021-09570-4
Chinese Library Classification
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Speech provides a natural way for human-computer interaction. In particular, speech synthesis systems are popular in different applications, such as personal assistants, GPS applications, screen readers and accessibility tools. However, not all languages are on the same level in terms of resources and systems for speech synthesis. This work creates publicly available resources for Brazilian Portuguese in the form of a novel dataset along with deep learning models for end-to-end speech synthesis. The dataset contains 10.5 h of speech from a single speaker. Among the models trained on it, a Tacotron 2 model with the RTISI-LA vocoder performed best, achieving a mean opinion score (MOS) of 4.03. The results are comparable to related work on English and to the state of the art in European Portuguese.
Pages: 1043-1055
Number of pages: 13