Measuring the gap between HMM-based ASR and TTS

Times cited: 0
Authors
Dines, John [1]
Yamagishi, Junichi [2]
King, Simon [2]
Affiliations
[1] Idiap Res Inst, CH-1920 Martigny, Switzerland
[2] Univ Edinburgh, CSTR, Edinburgh EH8 9AB, Midlothian, Scotland
Source
INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5 | 2009
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK
Keywords
speech synthesis; speech recognition; unified models
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The EMIME European project is conducting research into the development of technologies for mobile, personalised speech-to-speech translation systems. The hidden Markov model (HMM) is being used as the underlying technology in both the automatic speech recognition (ASR) and text-to-speech synthesis (TTS) components; thus, the investigation of unified statistical modelling approaches has become an implicit goal of our research. As one of the first steps towards this goal, we have been investigating commonalities and differences between HMM-based ASR and TTS. In this paper we present results and analysis of a series of experiments conducted on English ASR and TTS systems, measuring their performance with respect to phone set and lexicon, acoustic feature type and dimensionality, and HMM topology. Our results show that, although the fundamental statistical model may be essentially the same, optimal ASR and TTS performance often demands diametrically opposed system designs. This represents a major challenge to be addressed in the investigation of such unified modelling approaches.
Pages: 1411 / +
Page count: 2