On the Role of Spectral Dynamics in Unit Selection Speech Synthesis

Cited by: 0

Authors:

Kirkpatrick, Barry [1]

O'Brien, Darragh [1]

Scaife, Ronan [1]

Errity, Andrew [1]

Affiliations:

[1] Dublin City Univ, Fac Engn & Comp, Res Inst Networks & Commun Engn, Dublin 9, Ireland

Source:

INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4 | 2007

Keywords:

speech synthesis; join costs; auditory perception; spectral dynamics; feature extraction

DOI:

Not available

Chinese Library Classification:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Cost functions employed in unit selection significantly influence the quality of speech output. Although unit selection can produce very natural-sounding speech, the quality can be inconsistent and is difficult to guarantee due to discontinuities between incompatible units. The join cost employed in unit selection to measure the suitability of concatenating speech units typically consists of sub-costs representing the fundamental frequency and spectrum at the boundaries of each unit. In this study, the role of spectral dynamics as a join cost in unit selection synthesis is explored. A number of spectral dynamic measures are tested for the task of detecting discontinuities. Results indicate that spectral dynamic measures correlate with human perception of discontinuity if the features are extracted appropriately. Spectral dynamic mismatch is found to be a source of discontinuity, although results suggest this is likely to occur simultaneously with static spectral mismatch.
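
As an illustration only (this sketch is not taken from the paper), the kind of join cost the abstract describes could be written as a weighted sum of an F0 sub-cost, a static spectral sub-cost, and a spectral dynamic sub-cost. The function names, the regression-based delta estimate, the log-F0 distance, and the unit weights are all assumptions made for this example; the record does not state which features or measures the authors actually used.

```python
import math

def delta(frames, t, width=2):
    """Regression-style delta of a feature sequence at frame t.

    Frames beyond either end of the unit are clamped to the edge frame,
    so at a unit boundary the estimate is effectively one-sided.
    """
    n = len(frames)
    denom = 2 * sum(k * k for k in range(1, width + 1))
    result = []
    for d in range(len(frames[0])):
        num = 0.0
        for k in range(1, width + 1):
            ahead = frames[min(t + k, n - 1)][d]
            behind = frames[max(t - k, 0)][d]
            num += k * (ahead - behind)
        result.append(num / denom)
    return result

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def join_cost(left, right, f0_left, f0_right,
              w_f0=1.0, w_static=1.0, w_dynamic=1.0):
    """Hypothetical join cost between two units (lists of spectral frames).

    left / right: frames of the unit before / after the join.
    f0_left / f0_right: F0 in Hz at the two sides of the boundary.
    The weights are placeholders, not values from the paper.
    """
    # Static mismatch: last frame of the left unit vs first frame of the right.
    static = euclidean(left[-1], right[0])
    # Dynamic mismatch: difference in spectral slope on either side of the join.
    dynamic = euclidean(delta(left, len(left) - 1), delta(right, 0))
    # F0 mismatch on a log scale (an assumption for this sketch).
    f0 = abs(math.log(f0_left) - math.log(f0_right))
    return w_f0 * f0 + w_static * static + w_dynamic * dynamic

# One-dimensional toy "spectra": same boundary value, different trajectories.
rising = [[0.0], [1.0], [2.0]]
continues_rising = [[2.0], [3.0], [4.0]]
starts_falling = [[2.0], [1.0], [0.0]]
smooth_join = join_cost(rising, continues_rising, 100.0, 100.0)
abrupt_join = join_cost(rising, starts_falling, 100.0, 100.0)
```

In this toy case both joins have zero static mismatch, so only the dynamic term separates the unit that continues the trajectory from the one that reverses it. That is the type of discontinuity a purely static boundary measure would miss.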

Pages: 2029-2032

Page count: 4
