CROSS-LINGUAL TEXT-TO-SPEECH VIA HIERARCHICAL STYLE TRANSFER

Cited by: 0
Authors
Lee, Sang-Hoon [1]
Choi, Ha-Yeong [1]
Lee, Seong-Whan [1]
Affiliations
[1] Korea Univ, Dept Artificial Intelligence, Seoul, South Korea
Keywords
Cross-lingual TTS; Multi-lingual TTS
DOI
10.1109/ICASSPW62465.2024.10627450
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
This paper presents LIMITLESS, a cross-lingual text-to-speech system based on hierarchical style transfer that can transfer prosody and voice style separately. Building upon HierSpeech++, we use a two-stage hierarchical speech synthesis framework consisting of text-to-vector (TTV) and vector-to-speech stages. We modify only the TTV, adding a language embedding for each language to the text representation, and use the hierarchical speech synthesizer without modification. We train the TTV model on 7 languages and 14 speakers from the Indic languages dataset released for LIMMITS 2024, and fine-tune it on the target speakers for Tracks 1 and 2. The results show that our framework transfers voice style robustly in terms of speaker similarity.
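The TTV modification described in the abstract, adding a language embedding to the text representation, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the hidden size of 4, the 5-token input, and the random (rather than learned) embedding values are all assumptions made for illustration. Only the count of 7 languages comes from the abstract.

```python
import random

NUM_LANGUAGES = 7   # from the abstract: the TTV model is trained on 7 languages
HIDDEN_DIM = 4      # hypothetical hidden size, chosen only for readability


def make_table(num_rows, dim, seed):
    """Build a table of random vectors; a real model would learn these values."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(num_rows)]


def add_language_embedding(text_repr, lang_table, lang_id):
    """Add one language vector to every token vector of the text representation."""
    lang_vec = lang_table[lang_id]
    return [[x + l for x, l in zip(token_vec, lang_vec)] for token_vec in text_repr]


# Hypothetical stand-ins: one embedding per language, and a 5-token
# text-encoder output (assumed shape: tokens x hidden size).
lang_table = make_table(NUM_LANGUAGES, HIDDEN_DIM, seed=0)
text_repr = make_table(5, HIDDEN_DIM, seed=1)

# Condition the same text representation on language index 2.
conditioned = add_language_embedding(text_repr, lang_table, lang_id=2)
```

Because the same vector is added at every token position, the sequence length and hidden size are unchanged, which is consistent with the abstract's statement that the downstream speech synthesizer is used without modification.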
Pages: 25-26
Number of pages: 2