Evaluation of Expressive Speech Synthesis With Voice Conversion and Copy Resynthesis Techniques

Cited by: 32

Authors:

Turk, Oytun [1]

Schroeder, Marc [2]

Affiliations:

[1] Sensory Inc, Portland, OR 97209 USA

[2] DFKI GmbH Language Technol Lab, Speech Grp, D-66123 Saarbrucken, Germany

Source:

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2010 / Vol. 18 / No. 05

Keywords:

Expressive speech synthesis; prosody; voice conversion; voice quality transformation;

DOI:

10.1109/TASL.2010.2041113

Chinese Library Classification:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

Generating expressive synthetic voices requires carefully designed databases that contain a sufficient amount of expressive speech material. This paper investigates voice conversion and modification techniques to reduce database collection and processing efforts while maintaining acceptable quality and naturalness. In a factorial design, we study the relative contributions of voice quality and prosody as well as the amount of distortion introduced by the respective signal manipulation steps. The unit selection engine in our open-source, modular text-to-speech (TTS) framework MARY is extended with voice quality transformation using either GMM-based prediction or vocal tract copy resynthesis. These algorithms are then cross-combined with various prosody copy resynthesis methods. The overall expressive speech generation process functions as a postprocessing step on TTS outputs to transform neutral synthetic speech into aggressive, cheerful, or depressed speech. Cross-combinations of voice quality and prosody transformation algorithms are compared in listening tests for perceived expressive style and quality. The results show that there is a tradeoff between identification and naturalness. Combined modeling of both voice quality and prosody leads to the best identification scores at the expense of the lowest naturalness ratings. The fine detail of both voice quality and prosody, as preserved by copy synthesis, contributed to better identification than the approximate models.

Pages: 965-973

Page count: 9

