A Hybrid Text-to-Speech Synthesis using Vowel and Non Vowel like regions

Cited by: 0

Authors:

Adiga, Nagaraj [1]

Prasanna, S. R. Mahadeva [1]

Affiliations:

[1] Indian Inst Technol Guwahati, Dept Elect & Elect Engn, Gauhati, India

Source:

2014 ANNUAL IEEE INDIA CONFERENCE (INDICON) | 2014

Keywords:

speech synthesis; unit selection; hybrid TTS; HTS; VLRs and NVLRs; EPOCH EXTRACTION; SELECTION; SYSTEM;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP39 [Computer applications];

Subject classification code:

081203 ; 0835 ;

Abstract:

This paper presents a hybrid text-to-speech (TTS) synthesis approach that combines the advantages of hidden Markov model based speech synthesis (HTS) and unit selection speech synthesis (USS). In the hybrid TTS, speech sound units are classified into vowel-like regions (VLRs) and non-vowel-like regions (NVLRs) for unit selection. VLRs here refer to vowel, diphthong, semivowel and nasal sound units [1], which are modeled well by the HMM framework, and hence their waveform units are taken from HTS. The remaining sound units, such as stop consonants, fricatives and affricates, which are not modeled properly by HMMs [2], are classified as NVLRs, and for these phonetic classes natural sound units are picked from USS. The VLR and NVLR evidence is obtained from manual and automatic segmentation of the speech signal. The automatic detection is done by fusing source features obtained from the Hilbert envelope (HE) and the zero frequency filter (ZFF) of the speech signal. Speech synthesized by the manual and automatic hybrid TTS methods is compared with the HTS and USS voices using subjective and objective measures. Results show that the synthesis quality of the hybrid TTS with manual segmentation is better than that of the HTS voice, whereas with automatic segmentation the quality is slightly inferior.
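The routing rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the phone labels, the `Unit` type, and the function names are hypothetical and chosen only for illustration. The sketch assumes the phonetic class of each unit is already known (for example from manual segmentation); only the class-to-region mapping is taken from the abstract.

```python
from dataclasses import dataclass

# Phonetic classes the abstract lists as vowel-like regions (VLRs).
VLR_CLASSES = {"vowel", "diphthong", "semivowel", "nasal"}
# Phonetic classes the abstract lists as non-vowel-like regions (NVLRs).
NVLR_CLASSES = {"stop", "fricative", "affricate"}


@dataclass
class Unit:
    """A sound unit; both fields are hypothetical labels for illustration."""
    phone: str        # phone label, e.g. "s"
    phone_class: str  # broad phonetic class, e.g. "fricative"


def region_of(unit: Unit) -> str:
    """Map a unit's phonetic class to 'VLR' or 'NVLR'."""
    if unit.phone_class in VLR_CLASSES:
        return "VLR"
    if unit.phone_class in NVLR_CLASSES:
        return "NVLR"
    raise ValueError(f"unknown phonetic class: {unit.phone_class!r}")


def source_for(unit: Unit) -> str:
    """VLR waveforms come from HTS; NVLR waveforms come from USS."""
    return "HTS" if region_of(unit) == "VLR" else "USS"


def plan_utterance(units: list[Unit]) -> list[tuple[str, str, str]]:
    """Return (phone, region, source) for each unit in order."""
    return [(u.phone, region_of(u), source_for(u)) for u in units]


if __name__ == "__main__":
    utterance = [
        Unit("s", "fricative"),
        Unit("a", "vowel"),
        Unit("m", "nasal"),
    ]
    for phone, region, source in plan_utterance(utterance):
        print(phone, region, source)
```

In the automatic variant described in the abstract, the region label would come from the HE and ZFF based detector rather than from a known phonetic class; that detector is not sketched here.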

Pages: 5
