Emotional Voice Conversion with Semi-Supervised Generative Modeling

Citations: 0
Authors
Zhu, Hai [1]
Zhan, Huayi [1]
Cheng, Hong [2]
Wu, Ying [3]
Affiliations
[1] Sichuan Changhong Elect Holding Grp Co Ltd, Changhong AI Lab CHAIR, Mianyang, Sichuan, Peoples R China
[2] Univ Elect Sci & Technol China, Ctr Robot, Chengdu, Sichuan, Peoples R China
[3] Northwestern Univ, Dept Elect Engn & Comp Sci, Evanston, IL 60208 USA
Source
INTERSPEECH 2023 | 2023
Keywords
emotional voice conversion; variational autoencoder; semi-supervised; end-to-end; prosody
DOI
10.21437/Interspeech.2023-251
Chinese Library Classification (CLC)
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Emotional Voice Conversion (EVC) is a task that aims to convert the emotional state of speech from one to another while preserving the linguistic information and identity of the speaker. However, many studies are limited by the requirement for parallel speech data between different emotional patterns, which is not widely available in real-life applications. Furthermore, the annotation of emotional data is highly time-consuming and labor-intensive. To address these problems, in this paper, we propose SGEVC, a novel semi-supervised generative model for emotional voice conversion. This paper demonstrates that using as little as 1% supervised data is sufficient to achieve EVC. Experimental results show that our proposed model achieves state-of-the-art (SOTA) performance and consistently outperforms EVC baseline frameworks.
Pages: 2278-2282
Page count: 5