A Hybrid Text-to-Speech Synthesis using Vowel and Non Vowel like regions

Cited by: 0

Authors:

Adiga, Nagaraj [1]

Prasanna, S. R. Mahadeva [1]

Affiliations:

[1] Indian Inst Technol Guwahati, Dept Elect & Elect Engn, Gauhati, India

Source:

2014 ANNUAL IEEE INDIA CONFERENCE (INDICON) | 2014

Keywords:

speech synthesis; unit selection; hybrid TTS; HTS; VLRs and NVLRs; EPOCH EXTRACTION; SELECTION; SYSTEM;

DOI:

Not available

Chinese Library Classification number:

TP39 [Computer applications];

Discipline classification codes:

081203 ; 0835 ;

Abstract:

This paper presents a hybrid text-to-speech (TTS) synthesis approach that combines the advantages of hidden Markov model based speech synthesis (HTS) and unit selection speech synthesis (USS). In the hybrid TTS, speech sound units are classified into vowel-like regions (VLRs) and non-vowel-like regions (NVLRs) for unit selection. VLRs here refer to vowel, diphthong, semivowel and nasal sound units [1], which are modeled better by the HMM framework, and hence their waveform units are chosen from HTS. The remaining sound units, such as stop consonants, fricatives and affricates, which are not modeled properly using HMMs [2], are classified as NVLRs, and for these phonetic classes natural sound units are picked from USS. The VLR and NVLR evidence is obtained from manual and automatic segmentation of the speech signal. The automatic detection is done by fusing source features obtained from the Hilbert envelope (HE) and the zero frequency filter (ZFF) of the speech signal. Speech synthesized by the manual and automatic hybrid TTS methods is compared with HTS and USS voices using subjective and objective measures. Results show that the synthesis quality of the hybrid TTS with manual segmentation is better than that of the HTS voice, whereas automatic segmentation gives slightly inferior quality.
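The routing rule described in the abstract can be illustrated with a minimal sketch. It only encodes the class-to-region mapping stated above (vowel, diphthong, semivowel and nasal units as VLRs taken from HTS; stops, fricatives and affricates as NVLRs taken from USS). The names `Unit`, `region_of`, `source_for` and `plan_synthesis`, the class labels, and the example phone sequence are hypothetical and not taken from the paper; segmentation, waveform generation and concatenation are not modeled.

```python
from dataclasses import dataclass

# Phonetic classes as listed in the abstract (label strings are assumptions).
VLR_CLASSES = {"vowel", "diphthong", "semivowel", "nasal"}
NVLR_CLASSES = {"stop", "fricative", "affricate"}


@dataclass(frozen=True)
class Unit:
    """Hypothetical container for one segmented sound unit."""
    phone: str
    phone_class: str


def region_of(unit: Unit) -> str:
    """Label a unit as a vowel-like (VLR) or non-vowel-like (NVLR) region."""
    if unit.phone_class in VLR_CLASSES:
        return "VLR"
    if unit.phone_class in NVLR_CLASSES:
        return "NVLR"
    raise ValueError(f"unknown phonetic class: {unit.phone_class!r}")


def source_for(unit: Unit) -> str:
    """VLR waveforms come from the HTS voice, NVLR waveforms from USS."""
    return "HTS" if region_of(unit) == "VLR" else "USS"


def plan_synthesis(units):
    """Return (phone, region, source) for each unit, in utterance order."""
    return [(u.phone, region_of(u), source_for(u)) for u in units]


# Illustrative phone sequence: fricative, vowel, nasal.
example = [
    Unit("s", "fricative"),
    Unit("ah", "vowel"),
    Unit("n", "nasal"),
]
plan = plan_synthesis(example)
```

In this sketch the fricative would be drawn from the unit selection inventory while the vowel and nasal would be drawn from the HMM-generated speech. How the two sources are joined at region boundaries is left open here, since the abstract does not describe it.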


Pages: 5

