Phonetic alignment: speech synthesis-based vs. Viterbi-based

Cited by: 34

Authors:

Malfrère, F

Deroo, O

Dutoit, T

Ris, C

Affiliations:

[1] Fac Polytech Mons, TCTS, B-7000 Mons, Belgium

[2] Babel Technol SA, B-7000 Mons, Belgium

Source:

SPEECH COMMUNICATION | 2003 / Volume 40 / Issue 04

Keywords:

speech segmentation; hidden Markov models; hybrid HMM/ANN systems; speech synthesis; large speech corpora;

DOI:

10.1016/S0167-6393(02)00131-0

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

In this paper we compare two methods for automatic phonetic labeling of a continuous speech database, as usually required for designing a speech recognition or speech synthesis system. The first method is based on temporal alignment of speech with a synthetic speech pattern; the second uses either continuous density hidden Markov models (HMMs) or a hybrid HMM/ANN (artificial neural network) system in forced alignment mode. Both systems have been evaluated on read utterances that were not part of the training set of the HMM systems, and compared to manual segmentation. This study outlines the advantages and drawbacks of both methods. The speech synthesis-based system has the great advantage that no training stage (hence no large labeled database) is needed, while HMM systems easily handle multiple phonetic transcriptions (phonetic lattice). We deduce a method for the automatic creation of large phonetically labeled speech databases, based on using the synthetic speech segmentation tool to bootstrap the training process of either an HMM or a hybrid HMM/ANN system. Such segmentation tools are key to the development of improved multilingual speech synthesis and recognition systems. (C) 2002 Elsevier Science B.V. All rights reserved.


Pages: 503-515

Number of pages: 13

