LIMMITS'24: Multi-Speaker, Multi-Lingual INDIC TTS With Voice Cloning

Authors
Udupa, Sathvik [1 ]
Bandekar, Jesuraja [1 ]
Singh, Abhayjeet [1 ]
Deekshitha, G. [1 ]
Kumar, Saurabh [1 ]
Badiger, Sandhya [1 ]
Nagireddi, Amala [1 ]
Roopa, R. [1 ]
Ghosh, Prasanta Kumar [1 ]
Murthy, Hema A. [2 ]
Kumar, Pranaw [3 ]
Tokuda, Keiichi [4 ]
Hasegawa-Johnson, Mark [5 ]
Olbrich, Philipp [6 ]
Affiliations
[1] Indian Inst Sci IISc, Elect Engn Dept, Bangalore 560012, India
[2] Indian Inst Technol, Dept Comp Sci & Engn, Chennai 600036, India
[3] CDAC, Mumbai 400049, India
[4] Nagoya Inst Technol, Dept Comp Sci, Nagoya 4668555, Japan
[5] Univ Illinois, Dept Elect & Comp Engn, Champaign, IL 61820 USA
[6] Deutsch Gesell Internatl Zusammenarbeit GIZ GmbH, D-53113 Bonn, Germany
Source
IEEE OPEN JOURNAL OF SIGNAL PROCESSING | 2025 / Vol. 6
Keywords
Cloning; Multilingual; Signal processing; Training; Text to speech; Noise measurement; Vocabulary; Solid modeling; Manuals; Encoding; Speech synthesis; multi-speaker; multi-lingual TTS; voice cloning; cross-lingual synthesis
DOI
10.1109/OJSP.2025.3531782
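Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology];
Discipline classification code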
0808 ; 0809 ;
Abstract
The Multi-speaker, Multi-lingual Indic Text to Speech (TTS) with voice cloning (LIMMITS'24) challenge is organized as part of the ICASSP 2024 signal processing grand challenge. LIMMITS'24 aims at the development of voice cloning for the multi-speaker, multi-lingual Text-to-Speech (TTS) model. Towards this, 80 hours of TTS data has been released in each of Bengali, Chhattisgarhi, English (Indian), and Kannada languages. This is in addition to Telugu, Hindi, and Marathi data released during the LIMMITS'23 challenge. The challenge encourages the advancement of TTS in Indian Languages as well as the development of multi-speaker voice cloning techniques for TTS. The three tracks of LIMMITS'24 have provided an opportunity for various researchers and practitioners around the world to explore the state of the art in research for voice cloning with TTS.
Pages: 293-302
Page count: 10