Cross-Speaker Emotion Transfer Through Information Perturbation in Emotional Speech Synthesis

Cited by: 5
Authors
Lei, Yi [1]
Yang, Shan [2]
Zhu, Xinfa [1]
Xie, Lei [1]
Su, Dan [2]
Affiliations
[1] Northwestern Polytech Univ, Xian 710129, Peoples R China
[2] Tencent AI Lab, Beijing 100086, Peoples R China
Funding
National Key Research and Development Program of China
Keywords
Timbre; Spectrogram; Perturbation methods; Generators; Speech synthesis; Adaptation models; Acoustics; Cross-speaker emotion transfer; emotional TTS; information perturbation; speech synthesis; RECOGNITION
DOI
10.1109/LSP.2022.3203888
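Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract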
By borrowing emotional expressions from an emotional speaker, cross-speaker emotion transfer is an effective way to produce emotional speech for target speakers who have no emotional training data. Because the emotion and timbre of the source speaker are heavily entangled in speech, existing approaches often struggle to balance speaker similarity against emotional expressiveness in the speech synthesized for the target speaker. In this letter, we propose to disentangle timbre and emotion through information perturbation for cross-speaker emotion transfer, which effectively learns the emotional expression of the source speaker while preserving the timbre of the target speaker. Specifically, we separately perturb the timbre- and emotion-related features (e.g., formant and pitch) of the source speech to obtain and model timbre- and emotion-independent signals, based on which the proposed model can deliver emotional expression for target speakers. Experimental results demonstrate that the proposed approach significantly outperforms the baselines in terms of naturalness and similarity, indicating the effectiveness of information perturbation for cross-speaker emotion transfer.
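The abstract names formant and pitch as the perturbed features but does not specify the perturbation operators. The following is a minimal sketch, assuming a magnitude spectrogram and a frame-level F0 contour as inputs, of two generic perturbations of that kind: frequency-axis warping to move formants (a timbre cue) and random rescaling of the F0 contour (an emotion-related cue). The functions `warp_formants` and `perturb_f0` and all parameter ranges are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np

# Hypothetical sketch of information perturbation; NOT the paper's implementation.


def warp_formants(mag: np.ndarray, ratio: float) -> np.ndarray:
    """Scale the frequency axis of a magnitude spectrogram of shape (frames, bins).

    This moves formants (and, in this simplification, harmonics) together:
    ratio > 1 shifts energy toward higher bins, ratio < 1 toward lower bins.
    """
    n_bins = mag.shape[1]
    bins = np.arange(n_bins, dtype=float)
    # Output bin k reads input position k / ratio (linear interpolation);
    # positions beyond the last input bin are filled with zeros.
    src = bins / ratio
    return np.stack([np.interp(src, bins, frame, right=0.0) for frame in mag])


def perturb_f0(f0: np.ndarray, rng: np.random.Generator,
               scale_range=(0.8, 1.25), jitter_std=0.05) -> np.ndarray:
    """Randomly rescale a frame-level F0 contour; unvoiced frames (F0 == 0) stay 0."""
    voiced = f0 > 0
    # One global scale per utterance moves the overall pitch level...
    scale = rng.uniform(*scale_range)
    # ...and log-normal per-frame jitter distorts the contour shape while keeping it positive.
    jitter = np.exp(rng.normal(0.0, jitter_std, size=f0.shape))
    return np.where(voiced, f0 * scale * jitter, 0.0)


# Usage on toy data (stand-ins for real features).
rng = np.random.default_rng(0)
mag = np.abs(rng.normal(size=(100, 513)))            # stand-in magnitude spectrogram
f0 = np.where(np.arange(100) % 10 < 7, 200.0, 0.0)   # toy contour with unvoiced gaps
mag_p = warp_formants(mag, ratio=rng.uniform(0.85, 1.15))
f0_p = perturb_f0(f0, rng)
```

Warping the whole spectrum is a simplification: a pitch-preserving formant shifter would warp only the spectral envelope, for example through source-filter analysis and resynthesis.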
Pages: 1948-1952
Number of pages: 5