Measuring the gap between HMM-based ASR and TTS

Cited by: 0
Authors
Dines, John [1]
Yamagishi, Junichi [2]
King, Simon [2]
Affiliations
[1] Idiap Res Inst, CH-1920 Martigny, Switzerland
[2] Univ Edinburgh, CSTR, Edinburgh EH8 9AB, Midlothian, Scotland
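Source
INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5 | 2009
Funding
UK Engineering and Physical Sciences Research Council (EPSRC)
Keywords
speech synthesis; speech recognition; unified models
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract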
The EMIME European project is conducting research into the development of technologies for mobile, personalised speech-to-speech translation systems. The hidden Markov model is being used as the underlying technology in both the automatic speech recognition (ASR) and text-to-speech synthesis (TTS) components; thus, the investigation of unified statistical modelling approaches has become an implicit goal of our research. As one of the first steps towards this goal, we have been investigating commonalities and differences between HMM-based ASR and TTS. In this paper we present results and analysis of a series of experiments conducted on English ASR and TTS systems, measuring their performance with respect to phone set and lexicon, acoustic feature type and dimensionality, and HMM topology. Our results show that, although the fundamental statistical model may be essentially the same, optimal ASR and TTS performance often demands diametrically opposed system designs. This represents a major challenge to be addressed in the investigation of such unified modelling approaches.
Pages: 1411 / +
Page count: 2
Related papers
50 in total
  • [31] Parameterization of Vocal Fry in HMM-Based Speech Synthesis
    Silen, Hanna
    Helander, Elina
    Nurminen, Jani
    Gabbouj, Moncef
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 1735 - +
  • [32] Semi-Supervision in ASR: Sequential MixMatch and Factorized TTS-Based Augmentation
    Chen, Zhehuai
    Rosenberg, Andrew
    Zhang, Yu
    Zen, Heiga
    Ghodsi, Mohammadreza
    Huang, Yinghui
    Emond, Jesse
    Wang, Gary
    Ramabhadran, Bhuvana
    Moreno, Pedro J.
    INTERSPEECH 2021, 2021, : 736 - 740
  • [33] HMM-based Tibetan Lhasa Speech Synthesis System
    Wu Zhiqiang
    Yu Hongzhi
    Li Guanyu
    Wan Shuhui
    2013 3RD INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND NETWORK TECHNOLOGY (ICCSNT), 2013, : 92 - 95
  • [34] Continuous Control of the Degree of Articulation in HMM-based Speech Synthesis
    Picart, Benjamin
    Drugman, Thomas
    Dutoit, Thierry
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 1808 - 1811
  • [35] QUALITY CONTROL OF AUTOMATIC LABELLING USING HMM-BASED SYNTHESIS
    Pammi, Sathish
    Charfuelan, Marcela
    Schroeder, Marc
    2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-8, PROCEEDINGS, 2009, : 4277 - +
  • [36] FPGA Architecture of HMM-based Decoder Module in Speech Recognizer
    Trang Hoang
    Viet Vo Quoc
    Truong Nguyen Ly Thien
    2012 INTERNATIONAL CONFERENCE ON CONTROL, AUTOMATION AND INFORMATION SCIENCES (ICCAIS), 2012, : 354 - 358
  • [37] Formant-controlled HMM-based Speech Synthesis
    Lei, Ming
    Yamagishi, Junichi
    Richmond, Korin
    Ling, Zhen-Hua
    King, Simon
    Dai, Li-Rong
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 2788 - +
  • [38] Automatic Variation of the Degree of Articulation in New HMM-Based Voices
    Picart, Benjamin
    Drugman, Thomas
    Dutoit, Thierry
    IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2014, 8 (02) : 307 - 322
  • [39] A Covariance-Tying Technique for HMM-Based Speech Synthesis
    Oura, Keiichiro
    Zen, Heiga
    Nankaku, Yoshihiko
    Lee, Akinobu
    Tokuda, Keiichi
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2010, E93D (03) : 595 - 601
  • [40] Data Selection and Adaptation for Naturalness in HMM-based Speech Synthesis
    Cooper, Erica
    Chang, Alison
    Levitan, Yocheved
    Hirschberg, Julia
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 357 - +