CROSS-LINGUAL TEXT-TO-SPEECH VIA HIERARCHICAL STYLE TRANSFER

Cited by: 0
Authors
Lee, Sang-Hoon [1 ]
Choi, Ha-Yeong [1 ]
Lee, Seong-Whan [1 ]
Affiliations
[1] Korea Univ, Dept Artificial Intelligence, Seoul, South Korea
Keywords
Cross-lingual TTS; Multi-lingual TTS
DOI
10.1109/ICASSPW62465.2024.10627450
CLC number
O42 [Acoustics]
Discipline codes
070206; 082403
Abstract
This paper presents LIMITLESS, a cross-lingual text-to-speech system based on hierarchical style transfer that can transfer prosody style and voice style separately. Building upon HierSpeech++, we adopt its two-stage hierarchical speech synthesis framework, consisting of a text-to-vector (TTV) model and a vector-to-speech model. We modify only the TTV model, adding a language embedding for each language to the text representation, and use the hierarchical speech synthesizer without modification. We train the TTV model on 7 languages and 14 speakers from the Indic-language dataset released for LIMMITS 2024, and fine-tune it on the target speakers for Tracks 1 and 2. The results show that our framework transfers voice style robustly in terms of speaker similarity.
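The key modification described in the abstract, adding a per-language embedding to the text representation before the TTV encoder, can be illustrated with a minimal PyTorch-style sketch. This is not the authors' code: the module names, dimensions, number of languages, and the stand-in Transformer encoder are illustrative assumptions; the actual TTV module in HierSpeech++ is considerably richer.

```python
# Minimal sketch (illustrative assumptions, not the authors' implementation):
# a TTV-style text encoder whose text representation is conditioned on a
# learned language embedding, as the abstract describes.
import torch
import torch.nn as nn


class TextToVectorSketch(nn.Module):
    def __init__(self, vocab_size=256, hidden_dim=192, num_languages=7):
        super().__init__()
        self.text_embedding = nn.Embedding(vocab_size, hidden_dim)
        # One learned embedding per language, added to every text frame.
        self.language_embedding = nn.Embedding(num_languages, hidden_dim)
        # Stand-in for the TTV text encoder; the real model is more elaborate.
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=hidden_dim, nhead=2, batch_first=True),
            num_layers=2,
        )

    def forward(self, text_tokens, language_id):
        # text_tokens: (batch, seq_len) token ids; language_id: (batch,) language index.
        x = self.text_embedding(text_tokens)                      # (B, T, H)
        lang = self.language_embedding(language_id).unsqueeze(1)  # (B, 1, H)
        x = x + lang                                              # broadcast over time
        return self.encoder(x)                                    # language-conditioned text representation


if __name__ == "__main__":
    model = TextToVectorSketch()
    tokens = torch.randint(0, 256, (2, 10))   # dummy text token ids
    lang_ids = torch.tensor([0, 3])           # dummy language indices
    print(model(tokens, lang_ids).shape)      # torch.Size([2, 10, 192])
```

In this sketch the language embedding is simply summed with the token embeddings, so the rest of the synthesis pipeline can stay unchanged, which mirrors the abstract's point that only the TTV stage is modified.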
Pages: 25-26
Page count: 2
Related papers (50 in total)
  • [1] GenerTTS: Pronunciation Disentanglement for Timbre and Style Generalization in Cross-Lingual Text-to-Speech. Cong, Yahuan; Zhang, Haoyu; Lin, Haopeng; Liu, Shichao; Wang, Chunfeng; Ren, Yi; Yin, Xiang; Ma, Zejun. INTERSPEECH 2023, pp. 5486-5490.
  • [2] METTS: Multilingual Emotional Text-to-Speech by Cross-Speaker and Cross-Lingual Emotion Transfer. Zhu, Xinfa; Lei, Yi; Li, Tao; Zhang, Yongmao; Zhou, Hongbin; Lu, Heng; Xie, Lei. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2024, 32: 1506-1518.
  • [3] VECL-TTS: Voice identity and Emotional style controllable Cross-Lingual Text-to-Speech. Gudmalwar, Ashishkumar; Shah, Nirmesh; Akarsh, Sai; Wasnik, Pankaj; Shah, Rajiv Ratn. INTERSPEECH 2024, pp. 3000-3004.
  • [4] Description-based Controllable Text-to-Speech with Cross-Lingual Voice Control. Yamamoto, Ryuichi; Shirahata, Yuma; Kawamura, Masaya; Tachibana, Kentaro. arXiv.
  • [5] Exploring Timbre Disentanglement in Non-Autoregressive Cross-Lingual Text-to-Speech. Zhan, Haoyue; Yu, Xinyuan; Zhang, Haitong; Zhang, Yang; Lin, Yue. INTERSPEECH 2022, pp. 4247-4251.
  • [6] DSE-TTS: Dual Speaker Embedding for Cross-Lingual Text-to-Speech. Liu, Sen; Guo, Yiwei; Du, Chenpeng; Chen, Xie; Yu, Kai. INTERSPEECH 2023, pp. 616-620.
  • [7] End-to-end Text-to-speech for Low-resource Languages by Cross-Lingual Transfer Learning. Chen, Yuan-Jui; Tu, Tao; Yeh, Cheng-chieh; Lee, Hung-yi. INTERSPEECH 2019, pp. 2075-2079.
  • [8] Cross-lingual Text-To-Speech Synthesis via Domain Adaptation and Perceptual Similarity Regression in Speaker Space. Xin, Detai; Saito, Yuki; Takamichi, Shinnosuke; Koriyama, Tomoki; Saruwatari, Hiroshi. INTERSPEECH 2020, pp. 2947-2951.
  • [9] Text-to-speech system for low-resource language using cross-lingual transfer learning and data augmentation. Byambadorj, Zolzaya; Nishimura, Ryota; Ayush, Altangerel; Ohta, Kengo; Kitaoka, Norihide. EURASIP Journal on Audio, Speech, and Music Processing, 2021 (1).