A Hybrid Text-to-Speech Synthesis using Vowel and Non Vowel like regions

Cited by: 0

Authors:

Adiga, Nagaraj [1]

Prasanna, S. R. Mahadeva [1]

Affiliations:

[1] Indian Inst Technol Guwahati, Dept Elect & Elect Engn, Gauhati, India

Source:

2014 ANNUAL IEEE INDIA CONFERENCE (INDICON) | 2014

Keywords:

speech synthesis; unit selection; hybrid TTS; HTS; VLRs and NVLRs; EPOCH EXTRACTION; SELECTION; SYSTEM;

DOI:

Not available

Chinese Library Classification number:

TP39 [Computer applications];

Subject classification code:

081203 ; 0835 ;

Abstract:

This paper presents a hybrid text-to-speech (TTS) synthesis approach that combines the advantages of hidden Markov model based speech synthesis (HTS) and unit selection speech synthesis (USS). In the hybrid TTS, speech sound units are classified into vowel-like regions (VLRs) and non-vowel-like regions (NVLRs) for unit selection. VLRs here refer to vowel, diphthong, semivowel and nasal sound units [1], which can be modeled better within the HMM framework, so their waveform units are taken from HTS. The remaining sound units, such as stop consonants, fricatives and affricates, which are not modeled properly by HMMs [2], are classified as NVLRs, and for these phonetic classes natural sound units are picked from USS. The VLR and NVLR evidence is obtained from manual and automatic segmentation of the speech signal. Automatic detection is done by fusing source features obtained from the Hilbert envelope (HE) and the zero frequency filter (ZFF) of the speech signal. Speech synthesized by the manual and automatic hybrid TTS methods is compared with the HTS and USS voices using subjective and objective measures. Results show that the synthesis quality of the hybrid TTS with manual segmentation is better than that of the HTS voice, whereas automatic segmentation gives slightly inferior quality.
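The routing rule described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Unit`, `region_of`, `source_for` and `plan`, and the phonetic class labels, are hypothetical and chosen only for illustration. The sketch assumes each unit already carries a phonetic class label, which in the paper comes from manual or automatic segmentation.

```python
from dataclasses import dataclass

# Phonetic classes grouped as in the abstract. The label strings are assumptions.
VLR_CLASSES = {"vowel", "diphthong", "semivowel", "nasal"}
NVLR_CLASSES = {"stop", "fricative", "affricate"}


@dataclass(frozen=True)
class Unit:
    """Hypothetical container for one speech sound unit."""
    phone: str
    phone_class: str


def region_of(unit: Unit) -> str:
    """Return "VLR" or "NVLR" for a unit, based on its phonetic class."""
    if unit.phone_class in VLR_CLASSES:
        return "VLR"
    if unit.phone_class in NVLR_CLASSES:
        return "NVLR"
    raise ValueError(f"unknown phonetic class: {unit.phone_class!r}")


def source_for(unit: Unit) -> str:
    """VLR waveforms come from HTS; NVLR waveforms come from USS."""
    return "HTS" if region_of(unit) == "VLR" else "USS"


def plan(units):
    """List (phone, region, source) for each unit, in input order."""
    return [(u.phone, region_of(u), source_for(u)) for u in units]


if __name__ == "__main__":
    for row in plan([Unit("k", "stop"), Unit("a", "vowel"), Unit("m", "nasal")]):
        print(row)
```

The sketch covers only the choice of source per unit. Segmentation, waveform generation and concatenation of the selected units are outside its scope.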


Pages: 5

50 items in total

[1] SIGNIFICANCE OF VOWEL EPENTHESIS IN TELUGU TEXT-TO-SPEECH SYNTHESIS
Peddinti, Vijayaditya
Prahallad, Kishore
2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011, : 5348 - 5351
[2] A hybrid model for text-to-speech synthesis
Violaro, F
Boeffard, O
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 1998, 6 (05) : 426 - 434
[3] Emotion recognition from spontaneous speech using emotional vowel-like regions
Fahad, Md Shah
Singh, Shreya
Abhinav
Ranjan, Ashish
Deepak, Akshay
MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (10) : 14025 - 14043
[4] Towards a Vowel Formant Based Quality Metric for Text-to-Speech Systems: Measuring Monophthong Naturalness
Albrecht, Sven
Tamboli, Rewa
Taubert, Stefan
Eibl, Maximilian
Diaeresis, Gunter
Schmied, Josef
2022 IEEE INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND VIRTUAL ENVIRONMENTS FOR MEASUREMENT SYSTEMS AND APPLICATIONS (IEEE CIVEMSA 2022), 2022,
[5] Facial Expression Synthesis Using Vowel Recognition for Synthesized Speech
Asada, Taro
Adachi, Ruka
Takada, Syuhei
Yoshitomi, Yasunari
Tabuse, Masayoshi
PROCEEDINGS OF THE 2020 INTERNATIONAL CONFERENCE ON ARTIFICIAL LIFE AND ROBOTICS (ICAROB2020), 2020, : 398 - 401
[6] Design of a Yoruba Language Speech Corpus for the Purposes of Text-to-Speech (TTS) Synthesis
Dagba, Theophile K.
Aoga, John O. R.
Fanou, Codjo C.
INTELLIGENT INFORMATION AND DATABASE SYSTEMS, ACIIDS 2016, PT I, 2016, 9621 : 161 - 169
[7] FACTORIZED CONTEXT MODELLING FOR TEXT-TO-SPEECH SYNTHESIS
Lu, Heng
King, Simon
2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 7849 - 7853
[8] Paraphrase generation to improve Text-To-Speech Synthesis
Putois, Ghislain
Chevelu, Jonathan
Boidin, Cedric
11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2, 2010, : 198 - 201
[9] A Small Footprint Hybrid Statistical and Unit Selection Text-to-Speech Synthesis System for Turkish
Guner, Ekrem
Demiroglu, Cenk
COMPUTER AND INFORMATION SCIENCES II, 2012, : 85 - 91
[10] A Hybrid Text-to-Speech System That Combines Concatenative and Statistical Synthesis Units
Tiomkin, Stas
Malah, David
Shechtman, Slava
Kons, Zvi
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2011, 19 (05) : 1278 - 1288
