A Hybrid Text-to-Speech Synthesis using Vowel and Non Vowel like regions

Cited by: 0
Authors
Adiga, Nagaraj [1]
Prasanna, S. R. Mahadeva [1]
Affiliations
[1] Indian Inst Technol Guwahati, Dept Elect & Elect Engn, Gauhati, India
Source
2014 Annual IEEE India Conference (INDICON) | 2014
Keywords
speech synthesis; unit selection; hybrid TTS; HTS; VLRs and NVLRs; EPOCH EXTRACTION; SELECTION; SYSTEM;
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203; 0835;
Abstract
This paper presents a hybrid text-to-speech (TTS) synthesis approach that combines the advantages of hidden Markov model based speech synthesis (HTS) and unit selection speech synthesis (USS). In the hybrid TTS, speech sound units are classified into vowel-like regions (VLRs) and non-vowel-like regions (NVLRs) for unit selection. VLRs here refer to vowel, diphthong, semivowel and nasal sound units [1], which are modeled better by the HMM framework, so their waveform units are taken from HTS. The remaining sound units, such as stop consonants, fricatives and affricates, are not modeled properly by HMMs [2]; they are classified as NVLRs, and for these phonetic classes natural sound units are picked from USS. The VLR and NVLR evidence is obtained from both manual and automatic segmentation of the speech signal. Automatic detection is done by fusing source features obtained from the Hilbert envelope (HE) and the zero frequency filter (ZFF) of the speech signal. Speech synthesized by the manual and automatic hybrid TTS methods is compared with the HTS and USS voices using subjective and objective measures. Results show that the synthesis quality of the hybrid TTS with manual segmentation is better than that of the HTS voice, whereas automatic segmentation yields slightly inferior quality.
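
The routing rule described in the abstract (VLR units from HTS, NVLR units from USS) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class labels, the `Unit` type, and the function names `region_of`, `source_for` and `plan_synthesis` are hypothetical, and the sketch assumes each unit already carries a phonetic class from manual or automatic segmentation. Classes the abstract does not mention (for example silence) are rejected rather than guessed.

```python
from dataclasses import dataclass

# Hypothetical class labels; the abstract names the phonetic classes
# but does not define a label set.
VLR_CLASSES = {"vowel", "diphthong", "semivowel", "nasal"}
NVLR_CLASSES = {"stop", "fricative", "affricate"}


@dataclass(frozen=True)
class Unit:
    """A segmented speech sound unit with its phonetic class (assumed given)."""
    phone: str
    phone_class: str


def region_of(unit: Unit) -> str:
    """Map a unit to 'VLR' or 'NVLR' following the grouping in the abstract."""
    if unit.phone_class in VLR_CLASSES:
        return "VLR"
    if unit.phone_class in NVLR_CLASSES:
        return "NVLR"
    # The abstract does not say how other classes are handled.
    raise ValueError(f"unknown phonetic class: {unit.phone_class!r}")


def source_for(unit: Unit) -> str:
    """VLR units are taken from HTS, NVLR units from USS."""
    return "HTS" if region_of(unit) == "VLR" else "USS"


def plan_synthesis(units: list[Unit]) -> list[tuple[str, str, str]]:
    """Return (phone, region, source) for each unit, in utterance order."""
    return [(u.phone, region_of(u), source_for(u)) for u in units]


if __name__ == "__main__":
    utterance = [
        Unit("s", "fricative"),
        Unit("a", "vowel"),
        Unit("m", "nasal"),
    ]
    for phone, region, source in plan_synthesis(utterance):
        print(phone, region, source)
```

The sketch covers only the selection decision; how the HTS and USS waveform segments are joined at region boundaries is not described in the abstract and is left out.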
Pages: 5