Narrator or Character: Voice Modulation in an Expressive Multi-speaker TTS

Cited by: 1

Authors:

Kalyan, T. Pavan [1]

Rao, Preeti [1]

Jyothi, Preethi [1]

Bhattacharyya, Pushpak [1]

Affiliations:

[1] Indian Institute of Technology, Mumbai, Maharashtra, India

Source:

INTERSPEECH 2023 | 2023

Keywords:

Expressive TTS; speech synthesis; new TTS corpus; prosody modelling

DOI:

10.21437/Interspeech.2023-2469

Chinese Library Classification:

O42 [Acoustics]

Subject classification codes:

070206; 082403

Abstract:

Current Text-to-Speech (TTS) systems are trained on audiobook data and perform well in synthesizing read-style speech. In this work, we are interested in synthesizing audio stories as narrated to children. The storytelling style is more expressive and requires perceptible changes of voice across the narrator and story characters. To address these challenges, we present a new TTS corpus of English audio stories for children with 32.7 hours of speech by a single female speaker with a UK accent. We provide evidence of the salient differences in the suprasegmentals of the narrator and character utterances in the dataset, motivating the use of a multi-speaker TTS for our application. We use a fine-tuned BERT model to label each sentence as being spoken by a narrator or character that is subsequently used to condition the TTS output. Experiments show our new TTS system is superior in expressiveness in both A-B preference and MOS testing compared to reading-style TTS and single-speaker TTS.

Pages: 4808-4812

Page count: 5
