CONTROLLING EMOTION STRENGTH WITH RELATIVE ATTRIBUTE FOR END-TO-END SPEECH SYNTHESIS

Cited: 0
Authors
Zhu, Xiaolian [1 ,2 ]
Yang, Shan [1 ]
Yang, Geng [1 ]
Xie, Lei [1 ]
Affiliations
[1] Northwestern Polytech Univ, Sch Comp Sci, Xian, Peoples R China
[2] Hebei Univ Econ & Business, Publ Comp Educ Ctr, Shijiazhuang, Hebei, Peoples R China
Source
2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019) | 2019
Keywords
Emotion strength; relative attributes; speech synthesis; text-to-speech; end-to-end;
DOI
10.1109/asru46091.2019.9003829
CLC number
TP18 [Artificial intelligence theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recently, attention-based end-to-end speech synthesis has achieved superior performance compared to traditional speech synthesis models, and several approaches, such as global style tokens, have been proposed to explore the style controllability of end-to-end models. Although the existing methods show good performance in style disentanglement and transfer, they still cannot explicitly control the emotion of the generated speech. In this paper, we mainly focus on subtle control in expressive speech synthesis, where the emotion category and its strength can be easily controlled with a discrete emotion vector and a simple continuous scalar, respectively. The continuous strength controller is learned by a ranking function according to relative attributes measured on an emotion dataset. Our method automatically learns the relationship between low-level acoustic features and high-level subtle emotion strength. Experiments show that our method effectively improves the controllability of an expressive end-to-end model.
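The strength controller described in the abstract follows the relative-attribute idea: a linear ranking function is fit so that utterances with stronger emotion score higher than weaker ones, and the resulting score is normalized into a continuous strength scalar. A minimal sketch of such pairwise ranking is shown below; this is an illustrative, margin-based stand-in for the paper's ranking function, the acoustic features are left abstract, and the function names, hyperparameters, and gradient-descent training loop are all assumptions, not the authors' implementation:

```python
import numpy as np

def learn_ranking_function(pairs, X, epochs=200, lr=0.1):
    """Learn a weight vector w so that w @ X[i] > w @ X[j]
    for every ordered pair (i, j) where sample i carries the
    stronger emotion. Uses a simple margin-perceptron update
    on difference vectors (RankSVM-style constraints)."""
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.01, size=X.shape[1])
    for _ in range(epochs):
        for i, j in pairs:
            d = X[i] - X[j]      # stronger-minus-weaker difference
            if w @ d < 1.0:      # margin violated
                w += lr * d      # push the score of i above j
    return w

def strength(w, x, lo, hi):
    """Map a raw ranking score to a [0, 1] emotion-strength scalar."""
    return float(np.clip((w @ x - lo) / (hi - lo), 0.0, 1.0))
```

At synthesis time, the scalar returned by `strength` would play the role of the continuous controller fed to the end-to-end model, while the emotion category stays a separate discrete vector.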
Pages: 192 - 199
Page count: 8
Related papers
50 items in total
  • [21] End-to-End Mongolian Text-to-Speech System
    Li, Jingdong
    Zhang, Hui
    Liu, Rui
    Zhang, Xueliang
    Bao, Feilong
    2018 11TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2018, : 483 - 487
  • [22] An Effective Style Token Weight Control Technique for End-to-End Emotional Speech Synthesis
    Kwon, Ohsung
    Jang, Inseon
    Ahn, ChungHyun
    Kang, Hong-Goo
    IEEE SIGNAL PROCESSING LETTERS, 2019, 26 (09) : 1383 - 1387
  • [23] End-to-End Text-To-Speech synthesis for under resourced South African languages
    Nthite, Thapelo
    Tsoeu, Mohohlo
    2020 INTERNATIONAL SAUPEC/ROBMECH/PRASA CONFERENCE, 2020, : 684 - 689
  • [24] End-to-end Accented Speech Recognition
    Viglino, Thibault
    Motlicek, Petr
    Cernak, Milos
    INTERSPEECH 2019, 2019, : 2140 - 2144
  • [25] Lhasa-Tibetan Speech Synthesis Using End-to-End Model
    Zhao, Yue
    Hu, Panhua
    Xu, Xiaona
    Wu, Licheng
    Li, Xiali
    IEEE ACCESS, 2019, 7 : 140305 - 140311
  • [26] Exploiting Deep Sentential Context for Expressive End-to-End Speech Synthesis
    Yang, Fengyu
    Yang, Shan
    Wu, Qinghua
    Wang, Yujun
    Xie, Lei
    INTERSPEECH 2020, 2020, : 3436 - 3440
  • [27] EXPLORING END-TO-END NEURAL TEXT-TO-SPEECH SYNTHESIS FOR ROMANIAN
    Dumitrache, Marius
    Rebedea, Traian
    PROCEEDINGS OF THE 15TH INTERNATIONAL CONFERENCE LINGUISTIC RESOURCES AND TOOLS FOR NATURAL LANGUAGE PROCESSING, 2020, : 93 - 102
  • [28] End-to-End Speech Emotion Recognition Based on One-Dimensional Convolutional Neural Network
    Gao, Mengna
    Dong, Jing
    Zhou, Dongsheng
    Zhang, Qiang
    Yang, Deyun
    3RD INTERNATIONAL CONFERENCE ON INNOVATION IN ARTIFICIAL INTELLIGENCE (ICIAI 2019), 2019, : 78 - 82
  • [29] IMPROVING MANDARIN END-TO-END SPEECH SYNTHESIS BY SELF-ATTENTION AND LEARNABLE GAUSSIAN BIAS
    Yang, Fengyu
    Yang, Shan
    Zhu, Pengcheng
    Yan, Pengju
    Xie, Lei
    2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 208 - 213
  • [30] End-to-end text-to-speech synthesis with unaligned multiple language units based on attention
    Aso, Masashi
    Takamichi, Shinnosuke
    Saruwatari, Hiroshi
    INTERSPEECH 2020, 2020, : 4009 - 4013