Large vocabulary speech recognition in French

Cited by: 1
Authors
Adda-Decker, M [1]
Adda, G [1]
Gauvain, JL [1]
Lamel, L [1]
Affiliations
[1] LIMSI, CNRS, Spoken Language Proc Grp, F-91403 Orsay, France
Source
ICASSP '99: 1999 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS VOLS I-VI | 1999
DOI
10.1109/ICASSP.1999.758058
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
In this contribution, we present some design considerations concerning our large vocabulary continuous speech recognition system in French. The impact of the epoch of the text training material on lexical coverage, language model perplexity and recognition performance on newspaper texts is demonstrated. The effectiveness of larger vocabulary sizes and larger text training corpora for language modeling is investigated. French is a highly inflected language, which produces large lexical variety and a high homophone rate. About 30% of recognition errors are shown to be due to substitutions between inflected forms of a given root form. When word error rates are analysed as a function of word frequency, a significant increase in the error rate can be measured for frequency ranks above 5000.
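As an illustration of the lexical coverage measure mentioned in the abstract, the following minimal sketch builds a vocabulary from the N most frequent word forms of a training text and reports the fraction of test tokens it covers (one minus the out-of-vocabulary rate). The function names and the toy French sentences are assumptions made for this example; they are not taken from the paper, whose experiments use large newspaper corpora.

```python
from collections import Counter


def top_n_vocabulary(train_tokens, n):
    """Return the set of the n most frequent word forms in the training tokens."""
    counts = Counter(train_tokens)
    return {word for word, _ in counts.most_common(n)}


def lexical_coverage(vocabulary, test_tokens):
    """Fraction of test tokens present in the vocabulary (1 - OOV rate)."""
    if not test_tokens:
        raise ValueError("test_tokens must not be empty")
    covered = sum(1 for token in test_tokens if token in vocabulary)
    return covered / len(test_tokens)


# Toy data (hypothetical): whitespace tokenisation, no normalisation.
train = "le chat mange la souris le chien mange le chat".split()
test = "le chien mange la pomme".split()

# Coverage grows with vocabulary size but never reaches 1.0 here,
# because "pomme" does not occur in the training text at all.
for size in (1, 3, 6):
    vocab = top_n_vocabulary(train, size)
    print(size, lexical_coverage(vocab, test))
```

In a highly inflected language, each inflected form counts as a separate vocabulary entry in such a measure, which is one reason a larger vocabulary is needed to reach a given coverage.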
Pages: 45-48
Page count: 4