Word-level Text Markup for Prosody Control in Speech Synthesis

Cited by: 0
Authors
Korotkova, Yuliya [1,2]
Kalinovskiy, Ilya [1,3]
Vakhrusheva, Tatiana [1,2]
Affiliations
[1] JustAI, St Petersburg, Russia
[2] Higher Sch Econ, Moscow, Russia
[3] Tomsk Polytech Univ, Sch Comp Sci & Robot, Tomsk, Russia
Source
INTERSPEECH 2024 | 2024
Keywords
prosody control; prosody tagging; word-level prosody; speech synthesis; TTS;
DOI
10.21437/Interspeech.2024-715
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Modern Text-to-Speech (TTS) technologies generate speech very close to the natural one, but synthesized voices still lack variation in intonation which, in addition, is hard to control. In this work, we address the problem of prosody control, aiming to capture information about intonation in a markup without hand-labeling and linguistic expertise. We propose a method of encoding prosodic knowledge from textual and acoustic modalities, which are obtained with the help of models pretrained on self-supervised tasks, into latent quantized space with interpretable features. Based on these features, the prosodic markup is constructed, and it is used as an additional input to the TTS model to solve the one-to-many problem and is predicted by text. Moreover, this method allows for prosody control during inference time and scalability to new data and other languages.
Pages: 2280-2284
Page count: 5
Related papers
50 entries in total
  • [21] Word Emphasis Prediction for Expressive Text to Speech
    Mass, Yosi
    Shechtman, Slava
    Mordechay, Moran
    Hoory, Ron
    Shalom, Oren Sar
    Lev, Guy
    Konopnicki, David
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 2868 - 2872
  • [22] PROSODYSPEECH: TOWARDS ADVANCED PROSODY MODEL FOR NEURAL TEXT-TO-SPEECH
    Yi, Yuanhao
    He, Lei
    Pan, Shifeng
    Wang, Xi
    Xiao, Yujia
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 7582 - 7586
  • [23] ROBUST AND FINE-GRAINED PROSODY CONTROL OF END-TO-END SPEECH SYNTHESIS
    Lee, Younggun
    Kim, Taesu
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 5911 - 5915
  • [24] Algorithms for Speech Segmentation at Syllable-Level for Text-to-Speech Synthesis System in Gujarati
    Patil, Hemant A.
    Patel, Tanvina
    Talesara, Swati
    Shah, Nirmesh
    Sailor, Hardik
    Vachhani, Bhavik
    Akhani, Janki
    Kanakiya, Bhargav
    Gaur, Yashesh
    Prajapati, Vibha
    2013 INTERNATIONAL CONFERENCE ORIENTAL COCOSDA HELD JOINTLY WITH 2013 CONFERENCE ON ASIAN SPOKEN LANGUAGE RESEARCH AND EVALUATION (O-COCOSDA/CASLRE), 2013,
  • [25] Intonation and Prosody Conversion for Expressive Mandarin Speech Synthesis
    Zhu, Jing
    Yu, Yibiao
    PROCEEDINGS OF 2012 IEEE 11TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING (ICSP) VOLS 1-3, 2012, : 549 - 552
  • [26] Expressive Prosody for Unit-selection Speech Synthesis
    Strom, Volker
    Clark, Robert
    King, Simon
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 1296 - 1299
  • [27] GRAPHPB: GRAPHICAL REPRESENTATIONS OF PROSODY BOUNDARY IN SPEECH SYNTHESIS
    Sun, Aolan
    Wang, Jianzong
    Cheng, Ning
    Peng, Huayi
    Zeng, Zhen
    Kong, Lingwei
    Xiao, Jing
    2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021, : 438 - 445
  • [28] Phone-Level Prosody Modelling With GMM-Based MDN for Diverse and Controllable Speech Synthesis
    Du, Chenpeng
    Yu, Kai
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2022, 30 : 190 - 201
  • [29] DISCOURSE-LEVEL PROSODY MODELING WITH A VARIATIONAL AUTOENCODER FOR NON-AUTOREGRESSIVE EXPRESSIVE SPEECH SYNTHESIS
    Wu, Ning-Qian
    Liu, Zhao-Ci
    Ling, Zhen-Hua
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 7592 - 7596
  • [30] Speech Synthesis for Bangla Text to Speech Conversion
    Arafat, Mohammad Yasir
    Fahrin, Sanjana
    Islam, Md. Jamirul
    Siddiquee, Md. Ashraf
    Khan, Afsana
    Kotwal, Mohammed Rokibul Alam
    Huda, Mohammad Nurul
    8TH INTERNATIONAL CONFERENCE ON SOFTWARE, KNOWLEDGE, INFORMATION MANAGEMENT AND APPLICATIONS (SKIMA 2014), 2014,