Using prosody in fixed stress languages for improvement of speech recognition

Times cited: 0
Authors
Szaszak, Gyoergy [1]
Vicsi, Klara [1]
Affiliation
[1] Budapest Univ Technol & Econ, Dept Telecommun & Media Informat, Budapest, Hungary
Source
VERBAL AND NONVERBAL COMMUNICATION BEHAVIOURS | 2007 / Vol. 4775
Keywords
speech recognition; prosody; agglutinating languages; fixed stress; lattice rescoring
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this chapter we examine the use of prosodic features in speech recognition, paying special attention to agglutinating and fixed-stress languages. The introduction reviews current knowledge on exploiting speech prosody. The prosodic features used, the acoustic-prosodic pre-processing, and the segmentation into prosodic units are presented in detail. We use the term "prosodic unit" to distinguish these units from prosodic phrases, which are usually longer. We trained an HMM-based prosodic segmenter that relies on the fundamental frequency and intensity of speech. The output of this prosodic segmenter is used for N-best lattice rescoring, in parallel with a simplified bigram language model, in a continuous speech recognizer in order to improve recognition performance. Experiments on Hungarian show a WER reduction of about 4% with a simple lattice rescoring scheme. The performance of the prosodic segmenter is also compared with that obtained in our earlier experiments.
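The segmentation step can be illustrated with a minimal Viterbi decoder. This is a sketch under stated assumptions, not the authors' segmenter: the three states (`start`, `body`, `pause`), the hand-set Gaussian means, shared variance and transition probabilities, and the per-frame (normalized F0, normalized intensity) features are all hypothetical, whereas the paper trains its HMMs on data. Because Hungarian stress falls on the first syllable of the word, a prosodic unit is assumed here to open in a high-F0, high-intensity `start` state.

```python
import math

# Hypothetical 3-state model; the paper's trained HMM topology is not reproduced here.
STATES = ["start", "body", "pause"]
# Hypothetical emission means over (normalized F0, normalized intensity).
MEANS = {"start": (2.0, 2.0), "body": (0.0, 0.0), "pause": (-2.0, -2.0)}
VAR = 0.5  # shared diagonal variance (hypothetical)


def safe_log(p):
    return math.log(p) if p > 0 else -math.inf


# Hand-set (hypothetical) probabilities: a unit opens in "start", continues in
# "body", and may end in a "pause"; "body" cannot follow "pause" directly.
INIT = {"start": safe_log(0.5), "body": safe_log(0.0), "pause": safe_log(0.5)}
TRANS = {
    "start": {"start": safe_log(0.4), "body": safe_log(0.6), "pause": safe_log(0.0)},
    "body": {"start": safe_log(0.1), "body": safe_log(0.8), "pause": safe_log(0.1)},
    "pause": {"start": safe_log(0.3), "body": safe_log(0.0), "pause": safe_log(0.7)},
}


def emission_loglik(x, state):
    """Diagonal-Gaussian log-likelihood of one feature frame in a state."""
    return sum(
        -0.5 * (math.log(2 * math.pi * VAR) + (xi - mi) ** 2 / VAR)
        for xi, mi in zip(x, MEANS[state])
    )


def viterbi(frames):
    """Most likely state sequence for a list of (F0, intensity) frames."""
    delta = {s: INIT[s] + emission_loglik(frames[0], s) for s in STATES}
    backptrs = []
    for x in frames[1:]:
        new_delta, ptr = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: delta[p] + TRANS[p][s])
            ptr[s] = prev
            new_delta[s] = delta[prev] + TRANS[prev][s] + emission_loglik(x, s)
        backptrs.append(ptr)
        delta = new_delta
    # Backtrack from the best final state.
    state = max(STATES, key=lambda s: delta[s])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    return path[::-1]


def unit_boundaries(path):
    """Frame indices where a new prosodic unit begins."""
    return [t for t, s in enumerate(path)
            if s == "start" and (t == 0 or path[t - 1] != "start")]


# Toy input: pause, unit, unit, pause.
frames = ([(-2.0, -2.0)] * 3 + [(2.0, 2.0)] * 2 + [(0.0, 0.0)] * 4
          + [(2.0, 2.0)] * 2 + [(0.0, 0.0)] * 3 + [(-2.0, -2.0)] * 2)
print(unit_boundaries(viterbi(frames)))  # [3, 9]
```

Each detected unit start is a candidate word-boundary position that a recognizer can then compare against its own hypotheses.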
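The rescoring step can be sketched as reranking an N-best list, a simplification of rescoring a full lattice. The combination below, a weighted sum of acoustic log-likelihood, bigram log-probability and a prosodic agreement term, is an assumption for illustration: the `Hypothesis` fields, the `prosodic_score` penalty (one point per detected unit start with no hypothesized word start within `tolerance` frames), and the weights `lm_weight` and `prosody_weight` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    # Hypothetical N-best entry; field names are illustrative only.
    words: list[str]        # recognized word sequence
    word_starts: list[int]  # start frame of each word
    acoustic: float         # acoustic log-likelihood
    lm: float               # bigram language-model log-probability


def prosodic_score(word_starts, unit_starts, tolerance=2):
    """Assumed agreement measure: -1 for every detected prosodic-unit start
    that has no hypothesized word start within `tolerance` frames."""
    misses = sum(
        1 for u in unit_starts
        if not any(abs(u - w) <= tolerance for w in word_starts)
    )
    return -float(misses)


def rescore(nbest, unit_starts, lm_weight=1.0, prosody_weight=1.0):
    """Rerank hypotheses by a weighted sum of log-domain scores, best first."""
    def total(h):
        return (h.acoustic
                + lm_weight * h.lm
                + prosody_weight * prosodic_score(h.word_starts, unit_starts))
    return sorted(nbest, key=total, reverse=True)


# Toy N-best list with placeholder words.
nbest = [
    Hypothesis(["w1", "w2"], [0, 25], acoustic=-100.0, lm=-10.0),
    Hypothesis(["v1", "v2"], [0, 41], acoustic=-100.5, lm=-10.0),
]
unit_starts = [0, 40]  # e.g. unit starts reported by a prosodic segmenter
for h in rescore(nbest, unit_starts):
    print(h.words)
```

In this toy list the first hypothesis has the better acoustic score, but its second word starts at frame 25, far from the unit start detected at frame 40, so the prosodic penalty moves it to second place; with `prosody_weight=0.0` the original order is kept.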
Pages: 138 / +
Number of pages: 2