CROSS-LINGUAL TEXT-TO-SPEECH VIA HIERARCHICAL STYLE TRANSFER

Cited by: 0
Authors
Lee, Sang-Hoon [1 ]
Choi, Ha-Yeong [1 ]
Lee, Seong-Whan [1 ]
Affiliations
[1] Korea Univ, Dept Artificial Intelligence, Seoul, South Korea
Keywords
Cross-lingual TTS; Multi-lingual TTS
DOI
10.1109/ICASSPW62465.2024.10627450
CLC number
O42 [Acoustics]
Discipline codes
070206; 082403
Abstract
This paper presents LIMITLESS, a cross-lingual text-to-speech system based on hierarchical style transfer that can transfer prosody style and voice style separately. Building upon HierSpeech++, we adopt its two-stage hierarchical speech synthesis framework, consisting of a text-to-vector (TTV) model and a vector-to-speech model. We modify only the TTV model, adding a language embedding for each language to the text representation, and use the hierarchical speech synthesizer without modification. We train the TTV model on 7 languages and 14 speakers from the Indic-language dataset released for LIMMITS 2024, and fine-tune it on the target speakers for Tracks 1 and 2. The results show that our framework transfers voice style robustly in terms of speaker similarity.
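The key modification described in the abstract, adding a per-language embedding to the text representation before the TTV encoder, can be illustrated with a minimal PyTorch-style sketch. This is not the authors' code: the module names, dimensions, number of languages, and the stand-in Transformer encoder are illustrative assumptions; the actual TTV module in HierSpeech++ is considerably richer.

```python
# Minimal sketch (illustrative assumptions, not the authors' implementation):
# a TTV-style text encoder whose text representation is conditioned on a
# learned language embedding, as the abstract describes.
import torch
import torch.nn as nn


class TextToVectorSketch(nn.Module):
    def __init__(self, vocab_size=256, hidden_dim=192, num_languages=7):
        super().__init__()
        self.text_embedding = nn.Embedding(vocab_size, hidden_dim)
        # One learned embedding per language, added to every text frame.
        self.language_embedding = nn.Embedding(num_languages, hidden_dim)
        # Stand-in for the TTV text encoder; the real model is more elaborate.
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=hidden_dim, nhead=2, batch_first=True),
            num_layers=2,
        )

    def forward(self, text_tokens, language_id):
        # text_tokens: (batch, seq_len) token ids; language_id: (batch,) language index.
        x = self.text_embedding(text_tokens)                      # (B, T, H)
        lang = self.language_embedding(language_id).unsqueeze(1)  # (B, 1, H)
        x = x + lang                                              # broadcast over time
        return self.encoder(x)                                    # language-conditioned text representation


if __name__ == "__main__":
    model = TextToVectorSketch()
    tokens = torch.randint(0, 256, (2, 10))   # dummy text token ids
    lang_ids = torch.tensor([0, 3])           # dummy language indices
    print(model(tokens, lang_ids).shape)      # torch.Size([2, 10, 192])
```

In this sketch the language embedding is simply summed with the token embeddings, so the rest of the synthesis pipeline can stay unchanged, which mirrors the abstract's point that only the TTV stage is modified.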
Pages: 25-26
Page count: 2
Related papers (50 in total)
  • [1] GenerTTS: Pronunciation Disentanglement for Timbre and Style Generalization in Cross-Lingual Text-to-Speech. Cong, Yahuan; Zhang, Haoyu; Lin, Haopeng; Liu, Shichao; Wang, Chunfeng; Ren, Yi; Yin, Xiang; Ma, Zejun. INTERSPEECH 2023, pp. 5486-5490.
  • [2] METTS: Multilingual Emotional Text-to-Speech by Cross-Speaker and Cross-Lingual Emotion Transfer. Zhu, Xinfa; Lei, Yi; Li, Tao; Zhang, Yongmao; Zhou, Hongbin; Lu, Heng; Xie, Lei. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2024, 32: 1506-1518.
  • [3] VECL-TTS: Voice identity and Emotional style controllable Cross-Lingual Text-to-Speech. Gudmalwar, Ashishkumar; Shah, Nirmesh; Akarsh, Sai; Wasnik, Pankaj; Shah, Rajiv Ratn. INTERSPEECH 2024, pp. 3000-3004.
  • [4] Description-based Controllable Text-to-Speech with Cross-Lingual Voice Control. Yamamoto, Ryuichi; Shirahata, Yuma; Kawamura, Masaya; Tachibana, Kentaro. arXiv.
  • [5] Exploring Timbre Disentanglement in Non-Autoregressive Cross-Lingual Text-to-Speech. Zhan, Haoyue; Yu, Xinyuan; Zhang, Haitong; Zhang, Yang; Lin, Yue. INTERSPEECH 2022, pp. 4247-4251.
  • [6] DSE-TTS: Dual Speaker Embedding for Cross-Lingual Text-to-Speech. Liu, Sen; Guo, Yiwei; Du, Chenpeng; Chen, Xie; Yu, Kai. INTERSPEECH 2023, pp. 616-620.
  • [7] End-to-end Text-to-speech for Low-resource Languages by Cross-Lingual Transfer Learning. Chen, Yuan-Jui; Tu, Tao; Yeh, Cheng-chieh; Lee, Hung-yi. INTERSPEECH 2019, pp. 2075-2079.
  • [8] Cross-lingual Text-To-Speech Synthesis via Domain Adaptation and Perceptual Similarity Regression in Speaker Space. Xin, Detai; Saito, Yuki; Takamichi, Shinnosuke; Koriyama, Tomoki; Saruwatari, Hiroshi. INTERSPEECH 2020, pp. 2947-2951.
  • [9] Text-to-speech system for low-resource language using cross-lingual transfer learning and data augmentation. Byambadorj, Zolzaya; Nishimura, Ryota; Ayush, Altangerel; Ohta, Kengo; Kitaoka, Norihide. EURASIP Journal on Audio, Speech, and Music Processing, 2021 (1).