On the Role of Spectral Dynamics in Unit Selection Speech Synthesis

Cited by: 0

Authors:

Kirkpatrick, Barry [1]

O'Brien, Darragh [1]

Scaife, Ronan [1]

Errity, Andrew [1]

Affiliations:

[1] Dublin City Univ, Fac Engn & Comp, Res Inst Networks & Commun Engn, Dublin 9, Ireland

Source:

INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4 | 2007

Keywords:

speech synthesis; join costs; auditory perception; spectral dynamics; feature extraction

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Cost functions employed in unit selection significantly influence the quality of speech output. Although unit selection can produce very natural-sounding speech, its quality can be inconsistent and is difficult to guarantee because of discontinuities between incompatible units. The join cost, which measures the suitability of concatenating speech units, typically consists of sub-costs representing the fundamental frequency and spectrum at the boundaries of each unit. In this study, the role of spectral dynamics as a join cost in unit selection synthesis is explored. A number of spectral dynamic measures are tested for the task of detecting discontinuities. Results indicate that spectral dynamic measures correlate with human perception of discontinuity if the features are extracted appropriately. Spectral dynamic mismatch is found to be a source of discontinuity, although results suggest it is likely to occur simultaneously with static spectral mismatch.
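
To make the abstract's terminology concrete, the following is a minimal sketch of a join cost that combines a fundamental frequency sub-cost, a static spectral sub-cost, and a spectral dynamic sub-cost. It is an illustration only, not the measures evaluated in the paper: the function names, the equal default weights, the use of Euclidean distance, and the two-frame slope as a stand-in for spectral dynamics are all assumptions.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def join_cost(left, right, f0_left, f0_right,
              w_f0=1.0, w_static=1.0, w_dynamic=1.0):
    """Hypothetical join cost for concatenating unit `left` before unit `right`.

    `left` and `right` are lists of spectral feature vectors (one per frame),
    each with at least two frames. `f0_left` and `f0_right` are the
    fundamental frequencies (Hz) at the boundary frames.
    """
    # Static spectral mismatch: distance between the two boundary frames.
    static = euclidean(left[-1], right[0])

    # Spectral dynamics (assumed here to be a simple frame-to-frame slope):
    # compare how the spectrum is moving just before and just after the join.
    slope_left = [b - a for a, b in zip(left[-2], left[-1])]
    slope_right = [b - a for a, b in zip(right[0], right[1])]
    dynamic = euclidean(slope_left, slope_right)

    # Fundamental frequency mismatch on a log scale.
    f0 = abs(math.log(f0_left) - math.log(f0_right))

    return w_f0 * f0 + w_static * static + w_dynamic * dynamic


# The boundary frames match in both examples, so the static term is zero.
left = [[0.0, 0.0], [1.0, 1.0]]
smooth = [[1.0, 1.0], [2.0, 2.0]]    # trajectory continues in the same direction
reversed_ = [[1.0, 1.0], [0.0, 0.0]]  # trajectory reverses at the join

cost_smooth = join_cost(left, smooth, 100.0, 100.0)
cost_reversed = join_cost(left, reversed_, 100.0, 100.0)
```

In this toy case the two candidate joins have identical boundary frames, so a purely static spectral cost cannot tell them apart; only the dynamic term penalises the join whose spectral trajectory reverses direction.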

Citation

Pages: 2029 - 2032

Page count: 4
