AUTOMATIC SEGMENTATION AND LABELING OF SPEECH BASED ON HIDDEN MARKOV MODELS

Cited by: 97
Authors
BRUGNARA, F
FALAVIGNA, D
OMOLOGO, M
Affiliation
[1] Istituto per la Ricerca Scientifica e Tecnologica
Keywords
SPEECH SEGMENTATION AND LABELING; HMM (HIDDEN MARKOV MODELS); SPEECH DATABASES; ACOUSTIC PHONETIC UNITS;
DOI
10.1016/0167-6393(93)90083-W
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Accurate documentation of a speech database at the phonetic level is very important for speech research; however, manual segmentation and labeling is a time-consuming and error-prone task. This article describes an automatic procedure for the segmentation of speech: given either the linguistic or the phonetic content of a speech utterance, the system provides phone boundaries. The technique is based on an acoustic-phonetic unit Hidden Markov Model (HMM) recognizer; both the recognizer and the segmentation system were designed using the DARPA-TIMIT acoustic-phonetic continuous speech database of American English. Segmentation and labeling experiments were conducted under different conditions to check the reliability of the resulting system. Satisfactory results were obtained, especially when the system is trained with some manually presegmented material. The size of this material is a crucial factor, and system performance has been evaluated with respect to it. The system locates 88.3% of boundaries correctly, given a tolerance of 20 ms, when only 256 phonetically balanced sentences are used for training.
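The "correct boundary location within a tolerance" figure quoted in the abstract can be illustrated with a small sketch. This is not the authors' evaluation code: the function name `boundary_accuracy`, the use of integer milliseconds, and the one-to-one pairing of reference and automatic boundaries (reasonable only when both segmentations share the same phone sequence) are assumptions made for illustration.

```python
def boundary_accuracy(reference_ms, hypothesis_ms, tolerance_ms=20):
    """Return the fraction of boundaries placed within the tolerance.

    Hypothetical helper: assumes both lists hold boundary times in
    integer milliseconds for the same phone sequence, so the i-th
    automatic boundary corresponds to the i-th manual boundary.
    """
    if len(reference_ms) != len(hypothesis_ms):
        raise ValueError("boundary lists must have the same length")
    if not reference_ms:
        raise ValueError("at least one boundary is required")
    # A boundary counts as correct when it lies no more than
    # tolerance_ms away from its manually placed counterpart.
    hits = sum(
        1
        for ref, hyp in zip(reference_ms, hypothesis_ms)
        if abs(ref - hyp) <= tolerance_ms
    )
    return hits / len(reference_ms)


# Made-up boundary times, not data from the paper.
manual = [100, 250, 400, 520]
automatic = [110, 245, 430, 521]
print(boundary_accuracy(manual, automatic))
```

In this made-up example three of the four automatic boundaries fall within 20 ms of the manual ones; the third is 30 ms off and counts as an error.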
Pages: 357-370
Page count: 14