CONTROLLING EMOTION STRENGTH WITH RELATIVE ATTRIBUTE FOR END-TO-END SPEECH SYNTHESIS

被引:0
作者
Zhu, Xiaolian [1 ,2 ]
Yang, Shan [1 ]
Yang, Geng [1 ]
Xie, Lei [1 ]
机构
[1] Northwestern Polytech Univ, Sch Comp Sci, Xian, Peoples R China
[2] Hebei Univ Econ & Business, Publ Comp Educ Ctr, Shijiazhuang, Hebei, Peoples R China
来源
2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019) | 2019年
关键词
Emotion strength; relative attributes; speech synthesis; text-to-speech; end-to-end;
D O I
10.1109/asru46091.2019.9003829
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Recently, attention-based end-to-end speech synthesis has achieved superior performance compared to traditional speech synthesis models, and several approaches like global style tokens are proposed to explore the style controllability of the end-to-end model. Although the existing methods show good performance in style disentanglement and transfer, it is still unable to control the explicit emotion of generated speech. In this paper, we mainly focus on the subtle control of expressive speech synthesis, where the emotion category and strength can be easily controlled with a discrete emotional vector and a continuous simple scalar, respectively. The continuous strength controller is learned by a ranking function according to the relative attribute measured on an emotion dataset. Our method automatically learns the relationship between low-level acoustic features and high-level subtle emotion strength. Experiments show that our method can effectively improve the controllability for an expressive end-to-end model.
引用
收藏
页码:192 / 199
页数:8
相关论文
共 50 条
  • [1] Controllable Emotion Transfer For End-to-End Speech Synthesis
    Li, Tao
    Yang, Shan
    Xue, Liumeng
    Xie, Lei
    2021 12TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2021,
  • [2] Cross-Speaker Emotion Disentangling and Transfer for End-to-End Speech Synthesis
    Li, Tao
    Wang, Xinsheng
    Xie, Qicong
    Wang, Zhichao
    Xie, Lei
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2022, 30 : 1448 - 1460
  • [3] IMPROVING UNSUPERVISED STYLE TRANSFER IN END-TO-END SPEECH SYNTHESIS WITH END-TO-END SPEECH RECOGNITION
    Liu, Da-Rong
    Yang, Chi-Yu
    Wu, Szu-Lin
    Lee, Hung-Yi
    2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), 2018, : 640 - 647
  • [4] Analysis of Pronunciation Learning in End-to-End Speech Synthesis
    Taylor, Jason
    Richmond, Korin
    INTERSPEECH 2019, 2019, : 2070 - 2074
  • [5] End-to-End Speech Emotion Recognition Based on Neural Network
    Zhu, Bing
    Zhou, Wenkai
    Wang, Yutian
    Wang, Hui
    Cai, Juan Juan
    2017 17TH IEEE INTERNATIONAL CONFERENCE ON COMMUNICATION TECHNOLOGY (ICCT 2017), 2017, : 1634 - 1638
  • [6] Staged Knowledge Distillation for End-to-End Dysarthric Speech Recognition and Speech Attribute Transcription
    Lin, Yuqin
    Wang, Longbiao
    Li, Sheng
    Dang, Jianwu
    Ding, Chenchen
    INTERSPEECH 2020, 2020, : 4791 - 4795
  • [7] NaturalSpeech: End-to-End Text-to-Speech Synthesis With Human-Level Quality
    Tan, Xu
    Chen, Jiawei
    Liu, Haohe
    Cong, Jian
    Zhang, Chen
    Liu, Yanqing
    Wang, Xi
    Leng, Yichong
    Yi, Yuanhao
    He, Lei
    Zhao, Sheng
    Qin, Tao
    Soong, Frank
    Liu, Tie-Yan
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (06) : 4234 - 4245
  • [8] Emotion selectable end-to-end text-based speech editing
    Wang, Tao
    Yi, Jiangyan
    Fu, Ruibo
    Tao, Jianhua
    Wen, Zhengqi
    Zhang, Chu Yuan
    ARTIFICIAL INTELLIGENCE, 2024, 329
  • [9] EfficientTTS 2: Variational End-to-End Text-to-Speech Synthesis and Voice Conversion
    Miao, Chenfeng
    Zhu, Qingying
    Chen, Minchuan
    Ma, Jun
    Wang, Shaojun
    Xiao, Jing
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 1650 - 1661
  • [10] Myanmar Text-to-Speech Synthesis Using End-to-End Model
    Qin, Qinglai
    Yang, Jian
    Li, Peiying
    2020 4TH INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING AND INFORMATION RETRIEVAL, NLPIR 2020, 2020, : 6 - 11