LibriTTS-R: A Restored Multi-Speaker Text-to-Speech Corpus

Cited by: 14

Authors:

Koizumi, Yuma [1]

Zen, Heiga [1]

Karita, Shigeki [1]

Ding, Yifan [1]

Yatabe, Kohei [2]

Morioka, Nobuyuki [1]

Bacchiani, Michiel [1]

Zhang, Yu [3]

Han, Wei [3]

Bapna, Ankur [3]

Affiliations:

[1] Google, Tokyo, Japan

[2] Tokyo University of Agriculture and Technology, Tokyo, Japan

[3] Google, Mountain View, CA, USA

Source:

INTERSPEECH 2023 | 2023

Keywords:

Text-to-speech; dataset; speech restoration

DOI:

10.21437/Interspeech.2023-1584

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

This paper introduces a new speech dataset called "LibriTTS-R" designed for text-to-speech (TTS) use. It is derived by applying speech restoration to the LibriTTS corpus, which consists of 585 hours of speech data at 24 kHz sampling rate from 2,456 speakers and the corresponding texts. The constituent samples of LibriTTS-R are identical to those of LibriTTS, with only the sound quality improved. Experimental results show that the LibriTTS-R ground-truth samples showed significantly improved sound quality compared to those in LibriTTS. In addition, neural end-to-end TTS trained with LibriTTS-R achieved speech naturalness on par with that of the ground-truth samples. The corpus is freely available for download from http://www.openslr.org/141/.

Pages: 5496-5500

Number of pages: 5

