Measuring the gap between HMM-based ASR and TTS

Cited by: 0

Authors:

Dines, John [1]

Yamagishi, Junichi [2]

King, Simon [2]

Affiliations:

[1] Idiap Res Inst, CH-1920 Martigny, Switzerland

[2] Univ Edinburgh, CSTR, Edinburgh EH8 9AB, Midlothian, Scotland

Source:

INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5 | 2009

Funding:

UK Engineering and Physical Sciences Research Council (EPSRC);

Keywords:

speech synthesis; speech recognition; unified models;

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The EMIME European project is conducting research in the development of technologies for mobile, personalised speech-to-speech translation systems. The hidden Markov model is being used as the underlying technology in both automatic speech recognition (ASR) and text-to-speech synthesis (TTS) components; thus, the investigation of unified statistical modelling approaches has become an implicit goal of our research. As one of the first steps towards this goal, we have been investigating commonalities and differences between HMM-based ASR and TTS. In this paper we present results and analysis of a series of experiments that have been conducted on English ASR and TTS systems, measuring their performance with respect to phone set and lexicon, acoustic feature type and dimensionality, and HMM topology. Our results show that, although the fundamental statistical model may be essentially the same, optimal ASR and TTS performance often demands diametrically opposed system designs. This represents a major challenge to be addressed in the investigation of such unified modelling approaches.

Citation

Pages: 1411 / +

Page count: 2

