Decoupled Pronunciation and Prosody Modeling in Meta-Learning-Based Multilingual Speech Synthesis

Cited by: 1

Authors:

Peng, Yukun [1]

Ling, Zhenhua [1]

Affiliations:

[1] Univ Sci & Technol China, Natl Engn Res Ctr Speech & Language Informat Proc, Hefei, Peoples R China

Source:

INTERSPEECH 2022 | 2022

Funding:

National Key Research and Development Program of China;

Keywords:

text-to-speech; speech synthesis; multilingual; meta-learning;

DOI:

10.21437/Interspeech.2022-831

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Subject classification codes:

070206 ; 082403 ;

Abstract:

This paper presents a method of decoupled pronunciation and prosody modeling to improve the performance of meta-learning-based multilingual speech synthesis. The baseline meta-learning synthesis method adopts a single text encoder with a parameter generator conditioned on language embeddings and a single decoder to predict mel-spectrograms for all languages. In contrast, our proposed method designs a two-stream model structure that contains two encoders and two decoders for pronunciation and prosody modeling, respectively, considering that the pronunciation knowledge and the prosody knowledge should be shared in different ways among languages. In our experiments, our proposed method effectively improved the intelligibility and naturalness of multilingual speech synthesis compared with the baseline meta-learning synthesis method.
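The general idea of a parameter generator conditioned on language embeddings, as mentioned in the abstract for the baseline, can be illustrated with a toy sketch. It is not the authors' implementation: the function names (`make_generator`, `generate_weights`, `apply_layer`), the dimensions, the single linear generator, and the two placeholder languages are all assumptions made for illustration. The sketch only shows how one shared generator can emit different layer weights for different languages.

```python
import random

def make_generator(lang_dim, in_dim, out_dim, seed=0):
    """Hypothetical generator: a matrix that maps a language embedding to
    the flattened weights of an (out_dim x in_dim) linear layer."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(lang_dim)]
            for _ in range(out_dim * in_dim)]

def generate_weights(generator, lang_embedding, in_dim, out_dim):
    """Compute language-specific layer weights from a language embedding."""
    flat = [sum(g * e for g, e in zip(row, lang_embedding)) for row in generator]
    # Reshape the flat vector into out_dim rows of in_dim weights each.
    return [flat[r * in_dim:(r + 1) * in_dim] for r in range(out_dim)]

def apply_layer(weights, x):
    """Apply the generated linear layer (no bias) to an input vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

# Toy setup (assumed sizes): 2-dim language embeddings, 3-dim text features.
LANG_DIM, IN_DIM, OUT_DIM = 2, 3, 2
generator = make_generator(LANG_DIM, IN_DIM, OUT_DIM)
lang_embeddings = {"lang_a": [1.0, 0.0], "lang_b": [0.0, 1.0]}

x = [0.5, -1.0, 2.0]  # the same input features for both languages
outputs = {
    name: apply_layer(generate_weights(generator, emb, IN_DIM, OUT_DIM), x)
    for name, emb in lang_embeddings.items()
}
```

Because only the generator is a trained parameter in this sketch, all languages share it, while each language embedding selects its own effective layer weights.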


Pages: 4257-4261

Page count: 5

