Decoupled Pronunciation and Prosody Modeling in Meta-Learning-Based Multilingual Speech Synthesis

Cited by: 1
Authors
Peng, Yukun [1]
Ling, Zhenhua [1]
Affiliation
[1] Univ Sci & Technol China, Natl Engn Res Ctr Speech & Language Informat Proc, Hefei, Peoples R China
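Source
INTERSPEECH 2022 | 2022
Funding
National Key R&D Program of China;
Keywords
text-to-speech; speech synthesis; multilingual; meta-learning;
DOI
10.21437/Interspeech.2022-831
Chinese Library Classification (CLC) number
O42 [Acoustics];
Subject classification codes
070206; 082403;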
Abstract
This paper presents a method of decoupled pronunciation and prosody modeling to improve the performance of meta-learning-based multilingual speech synthesis. The baseline meta-learning synthesis method adopts a single text encoder with a parameter generator conditioned on language embeddings, plus a single decoder that predicts mel-spectrograms for all languages. In contrast, our proposed method uses a two-stream model structure containing two encoders and two decoders for pronunciation and prosody modeling, respectively, on the premise that pronunciation knowledge and prosody knowledge should be shared among languages in different ways. In our experiments, the proposed method effectively improved the intelligibility and naturalness of multilingual speech synthesis compared with the baseline meta-learning synthesis method.
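The abstract states only that the baseline's text encoder uses a parameter generator conditioned on language embeddings. The sketch below illustrates that general idea under loudly stated assumptions: the generator is taken to be one shared linear projection that maps a language embedding to the weight matrix and bias of a single encoder layer. The class `ParameterGenerator`, the helper functions, the dimensions, and the embedding values are all hypothetical and are not taken from the paper.

```python
import random
from typing import Dict, List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class ParameterGenerator:
    """Hypothetical generator: maps a language embedding to the weights
    (d_out x d_in) and bias (d_out) of one linear encoder layer."""

    def __init__(self, lang_dim: int, d_in: int, d_out: int, seed: int = 0):
        rng = random.Random(seed)
        self.d_in, self.d_out = d_in, d_out
        n_params = d_out * d_in + d_out  # weight matrix plus bias
        # Shared across all languages: one projection of shape n_params x lang_dim.
        self.proj = [[rng.gauss(0.0, 0.1) for _ in range(lang_dim)]
                     for _ in range(n_params)]

    def generate(self, lang_emb: Vector) -> Tuple[Matrix, Vector]:
        flat = matvec(self.proj, lang_emb)
        w = [flat[i * self.d_in:(i + 1) * self.d_in] for i in range(self.d_out)]
        b = flat[self.d_out * self.d_in:]
        return w, b


def encoder_layer(x: Vector, w: Matrix, b: Vector) -> Vector:
    """Linear layer with the generated weights, followed by ReLU."""
    return [max(0.0, s + bi) for s, bi in zip(matvec(w, x), b)]


# Hypothetical language embeddings (a learned lookup table in a real model).
lang_table: Dict[str, Vector] = {"en": [1.0, 0.0, 0.5], "zh": [0.0, 1.0, -0.5]}
gen = ParameterGenerator(lang_dim=3, d_in=4, d_out=2)
phoneme_feat = [0.2, -0.1, 0.4, 0.3]  # hypothetical input feature vector

for lang, emb in lang_table.items():
    w, b = gen.generate(emb)
    print(lang, encoder_layer(phoneme_feat, w, b))
```

In this sketch the same input is encoded with different layer weights per language, while only the projection inside the generator is shared across languages. How the paper's proposed two-stream model splits sharing between its pronunciation and prosody streams is not specified in the abstract, so it is not modeled here.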
Pages: 4257-4261
Number of pages: 5