Cross-Speaker Emotion Transfer Through Information Perturbation in Emotional Speech Synthesis

Cited by: 5
Authors
Lei, Yi [1]
Yang, Shan [2]
Zhu, Xinfa [1]
Xie, Lei [1]
Su, Dan [2]
Affiliations
[1] Northwestern Polytech Univ, Xian 710129, Peoples R China
[2] Tencent AI Lab, Beijing 100086, Peoples R China
Funding
National Key Research and Development Program of China;
Keywords
Timbre; Spectrogram; Perturbation methods; Generators; Speech synthesis; Adaptation models; Acoustics; Cross-speaker emotion transfer; emotional TTS; information perturbation; speech synthesis; RECOGNITION;
DOI
10.1109/LSP.2022.3203888
Chinese Library Classification (CLC)
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology];
Discipline Codes
0808; 0809;
Abstract
By borrowing emotional expressions from an emotional source speaker, cross-speaker emotion transfer is an effective way to produce emotional speech for target speakers without emotional training data. Since the emotion and timbre of the source speaker are heavily entangled in speech, existing approaches often struggle to balance speaker similarity and emotional expressiveness in the synthetic speech of the target speaker. In this letter, we propose to disentangle timbre and emotion through information perturbation to conduct cross-speaker emotion transfer, which effectively learns the emotional expression of the source speaker while maintaining the timbre of the target speaker. Specifically, we separately perturb the timbre- and emotion-related features (e.g., formant and pitch) of the source speech to obtain and model timbre- and emotion-independent signals, based on which the proposed model can deliver emotional expression for target speakers. Experimental results demonstrate that the proposed approach significantly outperforms the baselines in terms of naturalness and similarity, indicating the effectiveness of information perturbation for cross-speaker emotion transfer.
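Illustrative sketch. The letter itself does not include code, but the Python sketch below shows one common way such waveform-level information perturbation can be realized: random peaking-EQ filtering as a rough stand-in for formant/timbre perturbation, and random pitch shifting as a stand-in for pitch (emotion-related) perturbation. Every function name, parameter range, and the dummy input signal are assumptions for illustration only, not the authors' actual recipe.

# Hypothetical information-perturbation sketch (assumed, not the paper's code).
import numpy as np
import librosa
from scipy.signal import lfilter


def peaking_eq(y, sr, centre_hz, gain_db, q=1.0):
    # One RBJ-cookbook peaking-EQ biquad: boost/cut the band around centre_hz.
    amp = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * np.pi * centre_hz / sr
    alpha = np.sin(w0) / (2.0 * q)
    b = np.array([1.0 + alpha * amp, -2.0 * np.cos(w0), 1.0 - alpha * amp])
    a = np.array([1.0 + alpha / amp, -2.0 * np.cos(w0), 1.0 - alpha / amp])
    return lfilter(b / a[0], a / a[0], y)


def perturb_timbre(y, sr, rng):
    # Smear speaker-timbre cues with a cascade of randomly placed, randomly
    # gained EQ peaks over the formant region; prosody is left largely intact.
    out = y.copy()
    for centre_hz in rng.uniform(200.0, 4000.0, size=4):
        out = peaking_eq(out, sr, centre_hz, gain_db=rng.uniform(-8.0, 8.0))
    return out


def perturb_pitch(y, sr, rng):
    # Corrupt pitch-related (emotion/prosody) cues with a random semitone shift.
    return librosa.effects.pitch_shift(y, sr=sr, n_steps=rng.uniform(-4.0, 4.0))


if __name__ == "__main__":
    sr = 16000
    rng = np.random.default_rng(0)
    t = np.arange(sr, dtype=np.float32) / sr
    wav = 0.1 * np.sin(2.0 * np.pi * 220.0 * t)   # dummy 1-second signal
    emotion_view = perturb_timbre(wav, sr, rng)   # timbre-independent view
    timbre_view = perturb_pitch(wav, sr, rng)     # pitch/emotion-independent view
    print(emotion_view.shape, timbre_view.shape)

In a full system of this kind, the timbre-perturbed view would typically feed an emotion/prosody encoder and the pitch-perturbed view a timbre/content encoder, so that each branch only sees the cues it is meant to model.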
Pages: 1948-1952
Page count: 5
Related Papers
50 records in total
  • [1] Cross-Speaker Emotion Disentangling and Transfer for End-to-End Speech Synthesis
    Li, Tao
    Wang, Xinsheng
    Xie, Qicong
    Wang, Zhichao
    Xie, Lei
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2022, 30: 1448-1460
  • [2] METTS: Multilingual Emotional Text-to-Speech by Cross-Speaker and Cross-Lingual Emotion Transfer
    Zhu, Xinfa
    Lei, Yi
    Li, Tao
    Zhang, Yongmao
    Zhou, Hongbin
    Lu, Heng
    Xie, Lei
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32: 1506-1518
  • [3] Cross-speaker Emotion Transfer Based On Prosody Compensation for End-to-End Speech Synthesis
    Li, Tao
    Wang, Xinsheng
    Xie, Qicong
    Wang, Zhichao
    Jiang, Mingqi
    Xie, Lei
    INTERSPEECH 2022, 2022: 5498-5502
  • [4] Cross-speaker Style Transfer with Prosody Bottleneck in Neural Speech Synthesis
    Pan, Shifeng
    He, Lei
    INTERSPEECH 2021, 2021: 4678-4682
  • [5] iEmoTTS: Toward Robust Cross-Speaker Emotion Transfer and Control for Speech Synthesis Based on Disentanglement Between Prosody and Timbre
    Zhang, Guangyan
    Qin, Ying
    Zhang, Wenjie
    Wu, Jialun
    Li, Mei
    Gai, Yutao
    Jiang, Feijun
    Lee, Tan
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2023, 31: 1693-1705
  • [6] ACCENT CONVERSION THROUGH CROSS-SPEAKER ARTICULATORY SYNTHESIS
    Aryal, Sandesh
    Gutierrez-Osuna, Ricardo
    2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [7] Daft-Exprt: Cross-Speaker Prosody Transfer on Any Text for Expressive Speech Synthesis
    Zaidi, Julian
    Seute, Hugo
    van Niekerk, Benjamin
    Carbonneau, Marc-Andre
    INTERSPEECH 2022, 2022: 4591-4595
  • [8] CROSS-SPEAKER STYLE TRANSFER FOR TEXT-TO-SPEECH USING DATA AUGMENTATION
    Ribeiro, Manuel Sam
    Roth, Julian
    Comini, Giulia
    Huybrechts, Goeric
    Gabrys, Adam
    Lorenzo-Trueba, Jaime
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022: 6797-6801
  • [9] Style-Label-Free: Cross-Speaker Style Transfer by Quantized VAE and Speaker-wise Normalization in Speech Synthesis
    Qiang, Chunyu
    Yang, Peng
    Che, Hao
    Wang, Xiaorui
    Wang, Zhongyuan
    2022 13TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2022: 61-65
  • [10] Incorporating Cross-speaker Style Transfer for Multi-language Text-to-Speech
    Shang, Zengqiang
    Huang, Zhihua
    Zhang, Haozhe
    Zhang, Pengyuan
    Yan, Yonghong
    INTERSPEECH 2021, 2021: 1619-1623