CROSS-LINGUAL TEXT-TO-SPEECH VIA HIERARCHICAL STYLE TRANSFER

Times cited: 0

Authors:

Lee, Sang-Hoon [1]

Choi, Ha-Yeong [1]

Lee, Seong-Whan [1]

Affiliation:

[1] Korea Univ, Dept Artificial Intelligence, Seoul, South Korea

Source:

2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING WORKSHOPS, ICASSPW 2024 | 2024

Keywords:

Cross-lingual TTS; Multi-lingual TTS

DOI:

10.1109/ICASSPW62465.2024.10627450

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

This paper presents LIMITLESS, a cross-lingual text-to-speech system based on hierarchical style transfer that can transfer prosody and voice style separately. Building upon HierSpeech++, we use a two-stage hierarchical speech synthesis framework consisting of text-to-vector (TTV) and vector-to-speech stages. We modify the TTV only by adding a per-language embedding to the text representation, and we use the hierarchical speech synthesizer without modification. We train the TTV model on 7 languages and 14 speakers from the Indic languages dataset released for LIMMITS 2024, and then fine-tune it on the target speakers for Tracks 1 and 2. The results show that our framework transfers voice style robustly in terms of speaker similarity.
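
The one architectural change named in the abstract, adding a language embedding to the text representation, can be illustrated with a minimal sketch. This is not the authors' implementation: the table sizes, the random initialisation, and the names `token_table`, `language_table`, and `embed_text` are all assumptions made for illustration, and only the count of 7 languages comes from the abstract.

```python
import random

NUM_LANGUAGES = 7   # from the abstract: the TTV model is trained on 7 languages
VOCAB_SIZE = 100    # hypothetical size of the text symbol inventory
HIDDEN_DIM = 8      # hypothetical embedding width

random.seed(0)


def make_table(rows, dim):
    """Build a random lookup table; a real model would learn these values."""
    return [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(rows)]


token_table = make_table(VOCAB_SIZE, HIDDEN_DIM)        # hypothetical
language_table = make_table(NUM_LANGUAGES, HIDDEN_DIM)  # hypothetical


def embed_text(token_ids, language_id):
    """Look up each token and add the same language vector at every position."""
    lang_vec = language_table[language_id]
    return [
        [t + l for t, l in zip(token_table[tok], lang_vec)]
        for tok in token_ids
    ]


tokens = [5, 17, 42]
repr_lang0 = embed_text(tokens, language_id=0)
repr_lang3 = embed_text(tokens, language_id=3)
print(len(repr_lang0), len(repr_lang0[0]))
```

In this sketch the same token sequence yields two representations that differ only by a constant per-language offset, which is how a single shared text encoder input could carry the language identity.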


Pages: 25-26

Page count: 2
