On the Role of Spectral Dynamics in Unit Selection Speech Synthesis

Cited by: 0
Authors
Kirkpatrick, Barry [1 ]
O'Brien, Darragh [1 ]
Scaife, Ronan [1 ]
Errity, Andrew [1 ]
Affiliation
[1] Dublin City Univ, Fac Engn & Comp, Res Inst Networks & Commun Engn, Dublin 9, Ireland
Source
INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4 | 2007
Keywords
speech synthesis; join costs; auditory perception; spectral dynamics; feature extraction;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Cost functions employed in unit selection significantly influence the quality of speech output. Although unit selection can produce very natural-sounding speech, the quality can be inconsistent and is difficult to guarantee because of discontinuities between incompatible units. The join cost, which measures how suitable two speech units are for concatenation, typically consists of sub-costs representing the fundamental frequency and the spectrum at the boundaries of each unit. This study explores the role of spectral dynamics as a join cost in unit selection synthesis. A number of spectral dynamic measures are tested on the task of detecting discontinuities. Results indicate that spectral dynamic measures correlate with human perception of discontinuity if the features are extracted appropriately. Spectral dynamic mismatch is found to be a source of discontinuity, although results suggest that it is likely to occur simultaneously with static spectral mismatch.
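As an illustration only, the kind of join cost the abstract describes can be sketched as a weighted sum of a static spectral sub-cost, a fundamental frequency sub-cost, and a spectral dynamic sub-cost. The function names, the least-squares slope used as the dynamic feature, the Euclidean distances, and the equal default weights are all assumptions made for this sketch; the abstract does not state which measures or weights the authors tested.

```python
import numpy as np

def delta(frames: np.ndarray) -> np.ndarray:
    """Per-coefficient slope of a least-squares line fitted over time.

    frames has shape (n_frames, n_coeffs); this slope is one common way to
    summarise spectral dynamics (an assumption, not the paper's measure).
    """
    n = frames.shape[0]
    if n < 2:
        raise ValueError("need at least two frames to estimate a slope")
    t = np.arange(n) - (n - 1) / 2.0          # centred time index
    return (t[:, None] * frames).sum(axis=0) / (t ** 2).sum()

def join_cost(left, right, f0_left, f0_right, weights=(1.0, 1.0, 1.0)):
    """Hypothetical join cost between two candidate units.

    left  : last few spectral frames of the left unit  (n_frames, n_coeffs)
    right : first few spectral frames of the right unit (n_frames, n_coeffs)
    f0_left, f0_right : fundamental frequency in Hz at the boundary
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    static = np.linalg.norm(left[-1] - right[0])           # boundary-frame mismatch
    dynamic = np.linalg.norm(delta(left) - delta(right))   # slope mismatch
    f0 = abs(np.log(f0_left) - np.log(f0_right))           # log-F0 mismatch
    w_static, w_dynamic, w_f0 = weights
    return w_static * static + w_dynamic * dynamic + w_f0 * f0

# Two units whose boundary frames match but whose trajectories reverse
# direction: the static sub-cost is zero, the dynamic sub-cost is not.
rising = np.array([[0.0, 0.0], [1.0, 2.0], [2.0, 4.0]])
falling = rising[::-1]
cost = join_cost(rising, falling, 120.0, 120.0)
```

In this toy case the static term alone would report a perfect join, which is the situation a spectral dynamic sub-cost is meant to catch. The abstract suggests that, in practice, such dynamic mismatch tends to occur together with static mismatch.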
Pages: 2029 - 2032
Number of pages: 4
Related papers
50 in total
  • [31] Maximum Likelihood Unit Selection for Corpus-based Speech Synthesis
    Gamboa Rosales, Abubeker
    Rosales, Hamurabi Gamboa
    Hoffmann, Ruediger
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 748 - +
  • [32] Learned dictionaries for sparse representation based unit selection speech synthesis
    Sharma, Pulkit
    Abrol, Vinayak
    Sao, Anil Kumar
    2016 TWENTY SECOND NATIONAL CONFERENCE ON COMMUNICATION (NCC), 2016,
  • [33] IMPROVED UNIT SELECTION SPEECH SYNTHESIS METHOD UTILIZING SUBJECTIVE EVALUATION RESULTS ON SYNTHETIC SPEECH
    Xia, Xian-Jun
    Ling, Zhen-Hua
    Yang, Chen-Yu
    Dai, Li-Rong
    2012 8TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING, 2012, : 160 - 164
  • [34] Evaluation of Finnish Unit Selection and HMM-based Speech Synthesis
    Silen, Hanna
    Helander, Elina
    Nurminen, Jani
    Gabbouj, Moncef
    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 1853 - +
  • [35] Learning and Modeling Unit Embeddings for Improving HMM-based Unit Selection Speech Synthesis
    Zhou, Xiao
    Ling, Zhen-Hua
    Zhou, Zhi-Ping
    Dai, Li-Rong
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 2509 - 2513
  • [36] A unit selection text-to-speech-and-singing synthesis framework from neutral speech: proof of concept
    Marc Freixes
    Francesc Alías
    Joan Claudi Socoró
    EURASIP Journal on Audio, Speech, and Music Processing, 2019
  • [37] A unit selection text-to-speech-and-singing synthesis framework from neutral speech: proof of concept
    Freixes, Marc
    Alias, Francesc
    Claudi Socoro, Joan
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2019, 2019 (01)
  • [38] A method for combining intonation modelling and speech unit selection in corpus-based speech synthesis systems
    Diaz, Francisco Campillo
    Rodriguez Banga, Eduardo
    SPEECH COMMUNICATION, 2006, 48 (08) : 941 - 956
  • [39] Admissible stopping in Viterbi beam search for unit selection in concatenative speech synthesis
    Sakai, Shinsuke
    Kawahara, Tatsuya
    Nakamura, Satoshi
    2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, : 4613 - 4616
  • [40] Multisyn: Open-domain unit selection for the Festival speech synthesis system
    Clark, Robert A. J.
    Richmond, Korin
    King, Simon
    SPEECH COMMUNICATION, 2007, 49 (04) : 317 - 330