A unit selection text-to-speech-and-singing synthesis framework from neutral speech: proof of concept

Cited by: 1
Authors
Freixes, Marc [1]
Alías, Francesc [1]
Socoró, Joan Claudi [1]
Affiliations
[1] La Salle Univ Ramon Llull, Grup Recerca Tecnol Media GTM, Quatre Camins 30, Barcelona 08022, Spain
Keywords
Text-to-speech; Unit selection; Speech synthesis; Singing synthesis; Speech-to-singing; VOICE SYNTHESIS SYSTEM; PLUS NOISE MODEL; QUALITY;
DOI
10.1186/s13636-019-0163-y
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Text-to-speech (TTS) synthesis systems have been widely used in general-purpose applications based on the generation of speech. Nonetheless, there are some domains, such as storytelling or voice output aid devices, which may also require singing. To enable a corpus-based TTS system to sing, a supplementary singing database should be recorded. This solution, however, might be too costly for eventual singing needs, or even unfeasible if the original speaker is unavailable or unable to sing properly. This work introduces a unit selection-based text-to-speech-and-singing (US-TTS&S) synthesis framework, which integrates speech-to-singing (STS) conversion to enable the generation of both speech and singing from an input text and a score, respectively, using the same neutral speech corpus. The viability of the proposal is evaluated considering three vocal ranges and two tempos on a proof-of-concept implementation using a 2.6-h Spanish neutral speech corpus. The experiments show that challenging STS transformation factors are required to sing beyond the corpus vocal range and/or with notes longer than 150 ms. While score-driven US configurations allow the reduction of pitch-scale factors, time-scale factors are not reduced due to the short length of the spoken vowels. Moreover, in the MUSHRA test, text-driven and score-driven US configurations obtain similar naturalness rates of around 40 for all the analysed scenarios. Although these naturalness scores are far from those of Vocaloid, the singing scores of around 60 which were obtained validate that the framework could reasonably address eventual singing needs.
Pages: 14
Related papers
50 in total
  • [1] A unit selection text-to-speech-and-singing synthesis framework from neutral speech: proof of concept
    Marc Freixes
    Francesc Alías
    Joan Claudi Socoró
    EURASIP Journal on Audio, Speech, and Music Processing, 2019
  • [2] A Dynamic Cost Weighting Framework for Unit Selection Text-to-Speech Synthesis
    Bellegarda, Jerome R.
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2010, 18 (06): : 1455 - 1463
  • [3] A global, boundary-centric framework for unit selection text-to-speech synthesis
    Bellegarda, JR
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2006, 14 (03): 990-997
  • [4] An Overview of the ILSP Unit Selection Text-to-Speech Synthesis System
    Tsiakoulis, Pirros
    Karabetsos, Sotiris
    Chalamandaris, Aimilios
    Raptis, Spyros
    ARTIFICIAL INTELLIGENCE: METHODS AND APPLICATIONS, 2014, 8445: 370-383
  • [5] Embedded Unit Selection Text-to-Speech Synthesis for Mobile Devices
    Karabetsos, Sotiris
    Tsiakoulis, Pirros
    Chalamandaris, Aimilios
    Raptis, Spyros
    IEEE TRANSACTIONS ON CONSUMER ELECTRONICS, 2009, 55 (02): 613-621
  • [6] A Unit Selection Text-to-Speech Synthesis System Optimized for Use with Screen Readers
    Chalamandaris, Aimilios
    Karabetsos, Sotiris
    Tsiakoulis, Pirros
    Raptis, Spyros
    IEEE TRANSACTIONS ON CONSUMER ELECTRONICS, 2010, 56 (03): 1890-1897
  • [7] A RESEARCH BED FOR UNIT SELECTION BASED TEXT TO SPEECH SYNTHESIS
    Sarathy, K. Partha
    Ramakrishnan, A. G.
    2008 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY: SLT 2008, PROCEEDINGS, 2008: 229-+
  • [8] Assessing a Speaker for Fast Speech in Unit Selection Speech Synthesis
    Moers, Donata
    Wagner, Petra
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009: 2015-+
  • [9] Trainable unit selection speech synthesis under statistical framework
    WANG RenHua
    Science Bulletin, 2009, (11): 1963-1969
  • [10] Trainable unit selection speech synthesis under statistical framework
    Wang RenHua
    Dai LiRong
    Ling ZhenHua
    Hu Yu
    CHINESE SCIENCE BULLETIN, 2009, 54 (11): 1963-1969