Integrating imperfect transcripts into speech recognition systems for building high-quality corpora

Cited by: 9
Authors
Lecouteux, Benjamin [1]
Linares, Georges [1]
Oger, Stanislas [1]
Affiliations
[1] Univ Avignon, LIA, F-84911 Avignon 9, France
Keywords
Speech processing; Acoustic model training; Text-to-speech alignment; LANGUAGE; MODELS;
DOI
10.1016/j.csl.2011.06.001
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The training of state-of-the-art automatic speech recognition (ASR) systems requires huge relevant training corpora. The cost of such databases is high and remains a major limitation for the development of speech-enabled applications in particular contexts (e.g. low-density languages or specialized domains). On the other hand, a large amount of data can be found in news prompts, movie subtitles or scripts, etc. The use of such data as a training corpus could provide a low-cost solution to the acoustic model estimation problem. Unfortunately, prior transcripts are seldom exact with respect to the content of the speech signal, and suffer from a lack of temporal information. This paper tackles the issue of prompt-based speech corpora improvement by addressing the problems mentioned above. We propose a method that locates accurate transcript segments in speech signals and automatically corrects erroneous or missing transcript text surrounding these segments. This method relies on a new decoding strategy where the search algorithm is driven by the imperfect transcription of the input utterances. The experiments are conducted on the French language, using the ESTER database and a set of recordings (and associated prompts) from RTBF (Radio Télévision Belge Francophone). The results demonstrate the effectiveness of the proposed approach, in terms of both error correction and text-to-speech alignment. © 2011 Elsevier Ltd. All rights reserved.
Pages: 67-89
Page count: 23