A Hybrid Text-to-Speech Synthesis using Vowel and Non Vowel like regions

Cited by: 0
Authors
Adiga, Nagaraj [1]
Prasanna, S. R. Mahadeva [1]
Affiliation
[1] Indian Inst Technol Guwahati, Dept Elect & Elect Engn, Gauhati, India
Source
2014 ANNUAL IEEE INDIA CONFERENCE (INDICON) | 2014
Keywords
speech synthesis; unit selection; hybrid TTS; HTS; VLRs and NVLRs; EPOCH EXTRACTION; SELECTION; SYSTEM;
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203; 0835;
Abstract
This paper presents a hybrid text-to-speech (TTS) synthesis approach that combines the advantages of hidden Markov model based speech synthesis (HTS) and unit selection speech synthesis (USS). In the hybrid TTS, speech sound units are classified into vowel-like regions (VLRs) and non-vowel-like regions (NVLRs) for unit selection. VLRs here refer to vowel, diphthong, semivowel and nasal sound units [1], which are better modeled by the HMM framework; their waveform units are therefore taken from HTS. The remaining sound units, such as stop consonants, fricatives and affricates, which are not modeled properly using HMMs [2], are classified as NVLRs, and for these phonetic classes natural sound units are picked from USS. The VLR and NVLR evidence is obtained from both manual and automatic segmentation of the speech signal. Automatic detection is done by fusing source features obtained from the Hilbert envelope (HE) and the zero frequency filter (ZFF) of the speech signal. Speech synthesized by the manual and automatic hybrid TTS methods is compared with the HTS and USS voices using subjective and objective measures. Results show that the synthesis quality of the hybrid TTS with manual segmentation is better than that of the HTS voice, whereas automatic segmentation yields slightly inferior quality.
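The routing rule described in the abstract (VLR units from HTS, NVLR units from USS) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class labels, the `Unit` type, and the function names `region_of`, `choose_source` and `plan` are hypothetical and introduced only for illustration, and the sketch assumes each unit already carries a phonetic class label from manual or automatic segmentation.

```python
from dataclasses import dataclass

# Hypothetical label sets: the abstract names these phonetic classes,
# but the exact label strings are an assumption of this sketch.
VLR_CLASSES = {"vowel", "diphthong", "semivowel", "nasal"}
NVLR_CLASSES = {"stop", "fricative", "affricate"}


@dataclass
class Unit:
    phone: str        # phone symbol, e.g. "a"
    phone_class: str  # one of the labels above


def region_of(unit: Unit) -> str:
    """Map a unit's phonetic class to 'VLR' or 'NVLR'."""
    if unit.phone_class in VLR_CLASSES:
        return "VLR"
    if unit.phone_class in NVLR_CLASSES:
        return "NVLR"
    raise ValueError(f"unknown phonetic class: {unit.phone_class!r}")


def choose_source(unit: Unit) -> str:
    """VLR units come from the HTS voice, NVLR units from the USS voice."""
    return "HTS" if region_of(unit) == "VLR" else "USS"


def plan(units):
    """Return (phone, region, source) for each unit in an utterance."""
    return [(u.phone, region_of(u), choose_source(u)) for u in units]


if __name__ == "__main__":
    utterance = [
        Unit("s", "fricative"),
        Unit("a", "vowel"),
        Unit("m", "nasal"),
    ]
    for phone, region, source in plan(utterance):
        print(phone, region, source)
```

The sketch covers only the selection decision; how the chosen HTS and USS waveform segments are joined is not described in the abstract and is left out here.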
Pages: 5
Related papers
50 in total
  • [31] Vowel generation for children with cerebral palsy using myocontrol of a speech synthesizer
    Niu, Chuanxin M.
    Lee, Kangwoo
    Houde, John F.
    Sanger, Terence D.
    [J]. FRONTIERS IN HUMAN NEUROSCIENCE, 2015, 8
  • [32] SMALL FOOTPRINT HYBRID STATISTICAL/UNIT SELECTION TEXT-TO-SPEECH SYNTHESIS SYSTEM FOR AGGLUTINATIVE LANGUAGES
    Guner, Ekrem
    Demiroglu, Cenk
    [J]. 2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2012, : 4537 - 4540
  • [33] Robust Speaker-Adaptive HMM-Based Text-to-Speech Synthesis
    Yamagishi, Junichi
    Nose, Takashi
    Zen, Heiga
    Ling, Zhen-Hua
    Toda, Tomoki
    Tokuda, Keiichi
    King, Simon
    Renals, Steve
    [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2009, 17 (06): : 1208 - 1230
  • [34] A Unit Selection Text-to-Speech Synthesis System Optimized for Use with Screen Readers
    Chalamandaris, Aimilios
    Karabetsos, Sotiris
    Tsiakoulis, Pirros
    Raptis, Spyros
    [J]. IEEE TRANSACTIONS ON CONSUMER ELECTRONICS, 2010, 56 (03) : 1890 - 1897
  • [35] Indonesian Text-To-Speech System Using Syllable Concatenation: Speech Optimization
    Mengko, Richard
    Ayuningtyas, Aulia
    [J]. PROCEEDINGS OF 2013 3RD INTERNATIONAL CONFERENCE ON INSTRUMENTATION, COMMUNICATIONS, INFORMATION TECHNOLOGY, AND BIOMEDICAL ENGINEERING (ICICI-BME), 2013, : 412 - 415
  • [36] Vowel Creation by Articulatory Control in HMM-based Parametric Speech Synthesis
    Ling, Zhen-Hua
    Richmond, Korin
    Yamagishi, Junichi
    [J]. 13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 990 - 993
  • [37] Vowel Onset Point based Waveform Concatenation Technique for Intelligible Speech Synthesis
    Panda, Soumya Priyadarsini
    Nayak, Ajit Kumar
    [J]. 2017 INTERNATIONAL CONFERENCE ON COMPUTING METHODOLOGIES AND COMMUNICATION (ICCMC), 2017, : 622 - 626
  • [38] Creation of HMM-based Speech Model for Estonian Text-to-Speech Synthesis
    Nurk, Tonis
    [J]. HUMAN LANGUAGE TECHNOLOGIES: THE BALTIC PERSPECTIVE, 2012, 247 : 162 - 168
  • [39] Meta Learning Text-to-Speech Synthesis in over 7000 Languages
    Lux, Florian
    Meyer, Sarina
    Behringer, Lyonel
    Zalkow, Frank
    Do, Phat
    Coler, Matt
    Habets, Emanuel A. P.
    Ngoc Thang Vu
    [J]. INTERSPEECH 2024, 2024, : 4958 - 4962
  • [40] Embedded Unit Selection Text-to-Speech Synthesis for Mobile Devices
    Karabetsos, Sotiris
    Tsiakoulis, Pirros
    Chalamandaris, Aimilios
    Raptis, Spyros
    [J]. IEEE TRANSACTIONS ON CONSUMER ELECTRONICS, 2009, 55 (02) : 613 - 621