Cross-Speaker Emotion Transfer Through Information Perturbation in Emotional Speech Synthesis

Cited by: 5
Authors
Lei, Yi [1]
Yang, Shan [2]
Zhu, Xinfa [1]
Xie, Lei [1]
Su, Dan [2]
Affiliations
[1] Northwestern Polytechnical University, Xi'an 710129, China
[2] Tencent AI Lab, Beijing 100086, China
Funding
National Key Research and Development Program of China;
Keywords
Timbre; Spectrogram; Perturbation methods; Generators; Speech synthesis; Adaptation models; Acoustics; Cross-speaker emotion transfer; emotional TTS; information perturbation; speech synthesis; RECOGNITION;
DOI
10.1109/LSP.2022.3203888
Chinese Library Classification (CLC)
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology];
Discipline classification codes
0808; 0809;
Abstract
By borrowing emotional expressions from an emotional source speaker, cross-speaker emotion transfer is an effective way to produce emotional speech for target speakers who have no emotional training data. Since the emotion and timbre of the source speaker are heavily entangled in speech, existing approaches often struggle to balance speaker similarity against emotional expressiveness in the target speaker's synthetic speech. In this letter, we propose to disentangle timbre and emotion through information perturbation for cross-speaker emotion transfer, which effectively learns the emotional expression of the source speaker while maintaining the timbre of the target speaker. Specifically, we separately perturb the timbre- and emotion-related features (e.g., formant and pitch) of the source speech to obtain and model timbre- and emotion-independent signals, based on which the proposed model can deliver the emotional expression to target speakers. Experimental results demonstrate that the proposed approach significantly outperforms the baselines in terms of naturalness and similarity, indicating the effectiveness of information perturbation for cross-speaker emotion transfer.
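To illustrate the kind of feature perturbation the abstract describes, the following is a minimal Python sketch that randomly rescales formants (a timbre cue) and the median pitch (an emotion/prosody cue) with praat-parselmouth. The perturbation operators, ranges, and the perturb_formants_and_pitch helper are assumptions made for illustration only; they are not the authors' published configuration.

import math
import random
import parselmouth
from parselmouth.praat import call

def perturb_formants_and_pitch(wav_path,
                               formant_shift_range=(0.85, 1.15),
                               pitch_shift_range=(0.75, 1.25)):
    """Randomly rescale formants and median pitch so that downstream
    encoders see audio with the corresponding factor perturbed."""
    snd = parselmouth.Sound(wav_path)

    # Estimate the median F0 so the pitch perturbation is relative to the utterance.
    pitch = call(snd, "To Pitch", 0.0, 75, 600)
    median_f0 = call(pitch, "Get quantile", 0, 0, 0.5, "Hertz")

    formant_ratio = random.uniform(*formant_shift_range)
    pitch_ratio = random.uniform(*pitch_shift_range)
    # In Praat, a new pitch median of 0 leaves the median unchanged;
    # used here as a fallback when F0 is undefined (e.g., unvoiced input).
    new_median = 0.0 if math.isnan(median_f0) else median_f0 * pitch_ratio

    # Praat's "Change gender" jointly rescales the formants and the pitch median.
    perturbed = call(snd, "Change gender", 75, 600,
                     formant_ratio,  # formant shift ratio (timbre perturbation)
                     new_median,     # new pitch median in Hz (pitch perturbation)
                     1.0,            # pitch range factor (unchanged)
                     1.0)            # duration factor (unchanged)
    return perturbed.values.T.squeeze(), snd.sampling_frequency

One plausible use of such perturbed signals is to feed formant-perturbed audio to an emotion/prosody encoder and pitch-perturbed audio to a timbre encoder so that each branch becomes invariant to the perturbed factor; the actual model architecture and perturbation settings are those described in the letter.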
Pages: 1948 - 1952
Number of pages: 5