LIMMITS'24: Multi-Speaker, Multi-Lingual INDIC TTS With Voice Cloning

Cited by: 0

Authors:

Udupa, Sathvik [1]

Bandekar, Jesuraja [1]

Singh, Abhayjeet [1]

Deekshitha, G. [1]

Kumar, Saurabh [1]

Badiger, Sandhya [1]

Nagireddi, Amala [1]

Roopa, R. [1]

Ghosh, Prasanta Kumar [1]

Murthy, Hema A. [2]

Kumar, Pranaw [3]

Tokuda, Keiichi [4]

Hasegawa-Johnson, Mark [5]

Olbrich, Philipp [6]

Affiliations:

[1] Indian Inst Sci IISc, Elect Engn Dept, Bangalore 560012, India

[2] Indian Inst Technol, Dept Comp Sci & Engn, Chennai 600036, India

[3] CDAC, Mumbai 400049, India

[4] Nagoya Inst Technol, Dept Comp Sci, Nagoya 4668555, Japan

[5] Univ Illinois, Dept Elect & Comp Engn, Champaign, IL 61820 USA

[6] Deutsch Gesell Internatl Zusammenarbeit GIZ GmbH, D-53113 Bonn, Germany

Source:

IEEE OPEN JOURNAL OF SIGNAL PROCESSING | 2025 / Vol. 6

Keywords:

Cloning; Multilingual; Signal processing; Training; Text to speech; Noise measurement; Vocabulary; Solid modeling; Manuals; Encoding; Speech synthesis; multi-speaker; multi-lingual TTS; voice cloning; cross-lingual synthesis

DOI:

10.1109/OJSP.2025.3531782

Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Discipline classification code:

0808 ; 0809 ;

Abstract:

The Multi-speaker, Multi-lingual Indic Text-to-Speech (TTS) with voice cloning (LIMMITS'24) challenge was organized as part of the ICASSP 2024 Signal Processing Grand Challenge. LIMMITS'24 aims at the development of voice cloning for multi-speaker, multi-lingual TTS models. Towards this, 80 hours of TTS data has been released in each of the Bengali, Chhattisgarhi, English (Indian), and Kannada languages. This is in addition to the Telugu, Hindi, and Marathi data released during the LIMMITS'23 challenge. The challenge encourages the advancement of TTS in Indian languages as well as the development of multi-speaker voice cloning techniques for TTS. The three tracks of LIMMITS'24 gave researchers and practitioners around the world an opportunity to explore the state of the art in voice cloning with TTS.


Pages: 293-302

Page count: 10
