A unit selection text-to-speech-and-singing synthesis framework from neutral speech: proof of concept

Cited by: 1
Authors
Freixes, Marc [1]
Alias, Francesc [1]
Claudi Socoro, Joan [1]
Affiliations
[1] La Salle Univ Ramon Llull, Grup Recerca Tecnol Media GTM, Quatre Camins 30, Barcelona 08022, Spain
Keywords
Text-to-speech; Unit selection; Speech synthesis; Singing synthesis; Speech-to-singing; VOICE SYNTHESIS SYSTEM; PLUS NOISE MODEL; QUALITY;
DOI
10.1186/s13636-019-0163-y
Chinese Library Classification number
O42 [Acoustics];
Subject classification code
070206 ; 082403 ;
Abstract
Text-to-speech (TTS) synthesis systems have been widely used in general-purpose applications based on the generation of speech. Nonetheless, there are some domains, such as storytelling or voice output aid devices, which may also require singing. To enable a corpus-based TTS system to sing, a supplementary singing database should be recorded. This solution, however, might be too costly for eventual singing needs, or even unfeasible if the original speaker is unavailable or unable to sing properly. This work introduces a unit selection-based text-to-speech-and-singing (US-TTS&S) synthesis framework, which integrates speech-to-singing (STS) conversion to enable the generation of both speech and singing from an input text and a score, respectively, using the same neutral speech corpus. The viability of the proposal is evaluated considering three vocal ranges and two tempos on a proof-of-concept implementation using a 2.6-h Spanish neutral speech corpus. The experiments show that challenging STS transformation factors are required to sing beyond the corpus vocal range and/or with notes longer than 150 ms. While score-driven US configurations allow the reduction of pitch-scale factors, time-scale factors are not reduced due to the short length of the spoken vowels. Moreover, in the MUSHRA test, text-driven and score-driven US configurations obtain similar naturalness rates of around 40 for all the analysed scenarios. Although these naturalness scores are far from those of Vocaloid, the singing scores of around 60 which were obtained validate that the framework could reasonably address eventual singing needs.
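As a rough illustration of the transformation factors mentioned in the abstract, the sketch below computes a time-scale and a pitch-scale factor for one spoken vowel mapped onto one score note. It assumes the common ratio-based definitions (target duration over source duration, target F0 over source F0) and equal-tempered tuning with A4 = 440 Hz; the record does not state the paper's exact formulas, and the function names `midi_to_hz` and `sts_factors` are hypothetical.

```python
def midi_to_hz(midi_note: float) -> float:
    """Convert a MIDI note number to Hz (equal temperament, A4 = MIDI 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((midi_note - 69) / 12.0)


def sts_factors(vowel_dur_s: float, vowel_f0_hz: float,
                note_dur_s: float, note_midi: float) -> tuple[float, float]:
    """Return (time_scale, pitch_scale) for stretching a spoken vowel to a sung note.

    Assumed definitions (not taken from the paper):
      time_scale  = note duration / spoken vowel duration
      pitch_scale = note frequency / spoken vowel mean F0
    """
    if vowel_dur_s <= 0 or vowel_f0_hz <= 0:
        raise ValueError("vowel duration and F0 must be positive")
    time_scale = note_dur_s / vowel_dur_s
    pitch_scale = midi_to_hz(note_midi) / vowel_f0_hz
    return time_scale, pitch_scale


if __name__ == "__main__":
    # Illustrative values only: a 100 ms vowel spoken at 220 Hz, sung as a 300 ms A4.
    ts, ps = sts_factors(0.100, 220.0, 0.300, 69)
    print(f"time-scale factor: {ts:.2f}, pitch-scale factor: {ps:.2f}")
```

Under these assumed definitions, factors far from 1 correspond to the heavier modifications the abstract calls challenging: long notes sung from short spoken vowels raise the time-scale factor, and notes outside the corpus vocal range push the pitch-scale factor away from 1.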
Pages: 14
Related papers
50 in total
  • [21] OPTIMIZATION OF COST FUNCTION WEIGHTS FOR UNIT SELECTION SPEECH SYNTHESIS USING SPEECH RECOGNITION
    Pobar, Miran
    Martincic-Ipsic, Sanda
    Ipsic, Ivo
    NEURAL NETWORK WORLD, 2012, 22 (05) : 429 - 441
  • [22] Unit selection based speech synthesis for converting short text message into voice message in mobile phones
    Bharthi, B.
    Kavitha, S.
    Kotwal, Nekshan Percy
    Parasaram, Nivedita
    Piriyanga, J.
    2017 4TH INTERNATIONAL CONFERENCE ON ADVANCED COMPUTING AND COMMUNICATION SYSTEMS (ICACCS), 2017,
  • [23] An efficient unit-selection method for concatenative Text-to-speech synthesis systems
    Gros, Jerneja Zganec
    Zganec, Mario
    Journal of Computing and Information Technology, 2008, 16 (01) : 69 - 78
  • [24] PERCEPTUAL CLUSTERING BASED UNIT SELECTION OPTIMIZATION FOR CONCATENATIVE TEXT-TO-SPEECH SYNTHESIS
    Jiang, Tao
    Wu, Zhiyong
    Jia, Jia
    Cai, Lianhong
    2012 8TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING, 2012, : 64 - 68
  • [25] Unit Selection Model in Arabic Speech Synthesis
    Al-Saiyd, Nedhal A.
    Hijjawi, Mohammad
    INTERNATIONAL JOURNAL OF COMPUTER SCIENCE AND NETWORK SECURITY, 2018, 18 (04) : 126 - 131
  • [26] IMPROVED UNIT SELECTION SPEECH SYNTHESIS METHOD UTILIZING SUBJECTIVE EVALUATION RESULTS ON SYNTHETIC SPEECH
    Xia, Xian-Jun
    Ling, Zhen-Hua
    Yang, Chen-Yu
    Dai, Li-Rong
    2012 8TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING, 2012, : 160 - 164
  • [27] Unit-centric feature mapping for inventory pruning in unit selection text-to-speech synthesis
    Bellegarda, Jerome R.
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2008, 16 (01) : 74 - 82
  • [28] TEXT-TO-SPEECH SYSTEMS FOR FILIPINO USING UNIT SELECTION AND DEEP LEARNING
    Renovalles, Edsel Jedd
    Lucas, Crisron Rudolf
    de Leon, Franz
    Aquino, Angelina
    Jalandoni, Izza
    2021 24TH CONFERENCE OF THE ORIENTAL COCOSDA INTERNATIONAL COMMITTEE FOR THE CO-ORDINATION AND STANDARDISATION OF SPEECH DATABASES AND ASSESSMENT TECHNIQUES (O-COCOSDA), 2021, : 212 - 217
  • [29] Unifying Unit Selection and Hidden Markov Model Speech Synthesis
    Taylor, Paul
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 1758 - 1761
  • [30] Phone-Level Embeddings for Unit Selection Speech Synthesis
    Perquin, Antoine
    Lecorve, Gwenole
    Lolive, Damien
    Amsaleg, Laurent
    STATISTICAL LANGUAGE AND SPEECH PROCESSING, SLSP 2018, 2018, 11171 : 21 - 31