Using Unsupervised Feature-Based Speaker Adaptation for Improved Transcription of Spoken Archives

Times cited: 0
Authors
Cerva, Petr [1]
Palecek, Karel [1]
Silovsky, Jan [1]
Nouza, Jan [1]
Affiliation
[1] Tech Univ Liberec, Fac Mechatron, Inst Informat Technol & Elect, CZ-46117 Liberec, Czech Republic
Source
12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5 | 2011
Keywords
unsupervised speaker adaptation; VTLN; CMLLR; SAT; spoken data transcription; RETRIEVAL; SPEECH;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper deals with unsupervised feature-based speaker adaptation techniques. The goal is to design an optimal adaptation approach for improving the recognition accuracy of a large-vocabulary continuous speech recognition (LVCSR) system developed for the automatic transcription of large archives of spoken Czech (e.g. the archive of parliament talks and the historical archives of Czech broadcast stations). For this purpose, several modifications of the vocal tract length normalization (VTLN) and constrained maximum likelihood linear regression (CMLLR) techniques were investigated and combined. Our study focuses on applying the adaptation methods both in the recognition process and in building a normalized acoustic model within the speaker adaptive training (SAT) scheme. The methods were evaluated experimentally on a large and varied data set containing 93k words in total. The resulting two-step adaptation scheme yields a significant WER reduction from 17.8 % to 14.8 %.
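The abstract names VTLN and CMLLR without describing them. The following Python sketch illustrates generic textbook forms of both, plus the arithmetic behind the reported WER figures. It is not the authors' implementation: the piecewise-linear warping convention, the cutoff frequency, the warping-factor grid, the toy scoring function and all function names are hypothetical choices for illustration. CMLLR estimation is omitted; only the application of an already-estimated transform (A, b) is shown. When features are scored after a CMLLR transform, the log-likelihood also gains a log|det A| Jacobian term, which is not computed here.

```python
"""Hypothetical sketch of feature-domain adaptation building blocks.

Assumptions (not from the paper): a piecewise-linear VTLN warping
convention, an illustrative warping-factor grid, a toy scoring function,
and a CMLLR transform (A, b) that has already been estimated.
"""
from typing import Callable, List, Sequence, Tuple


def vtln_warp(freq: float, alpha: float, f_cut: float, f_max: float) -> float:
    """Warp one frequency with a piecewise-linear VTLN function.

    Below f_cut the axis is scaled by alpha; above f_cut a second linear
    segment maps f_max onto itself, so the warped axis stays in range.
    """
    if freq <= f_cut:
        return alpha * freq
    slope = (f_max - alpha * f_cut) / (f_max - f_cut)
    return alpha * f_cut + slope * (freq - f_cut)


def select_warp_factor(score: Callable[[float], float],
                       grid: Sequence[float]) -> float:
    """Grid search: return the warping factor with the highest score.

    In practice the score would be the likelihood of a speaker's data
    under the acoustic model after warping; here it is any callable.
    """
    return max(grid, key=score)


def cmllr_apply(A: List[List[float]], b: List[float],
                x: List[float]) -> List[float]:
    """Apply an already-estimated CMLLR transform: x_hat = A x + b."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(A, b)]


def wer_reduction(baseline: float, adapted: float) -> Tuple[float, float]:
    """Return (absolute reduction in WER points, relative reduction in %)."""
    absolute = baseline - adapted
    return absolute, 100.0 * absolute / baseline


if __name__ == "__main__":
    # Illustrative grid 0.88, 0.90, ..., 1.12 (an assumption, not the paper's).
    grid = [round(0.88 + 0.02 * i, 2) for i in range(13)]
    # Toy score peaking at 0.96, standing in for a per-speaker likelihood.
    alpha = select_warp_factor(lambda a: -(a - 0.96) ** 2, grid)
    print("selected alpha:", alpha)
    print("warped 1000 Hz:", vtln_warp(1000.0, alpha, f_cut=6800.0, f_max=8000.0))
    # Identity A with a bias shift, as a trivial CMLLR example.
    print("cmllr:", cmllr_apply([[1.0, 0.0], [0.0, 1.0]], [0.5, -0.5], [1.0, 2.0]))
    absolute, relative = wer_reduction(17.8, 14.8)
    print(f"WER reduction: {absolute:.1f} points, {relative:.1f} % relative")
```

For the figures in the abstract, wer_reduction(17.8, 14.8) gives an absolute reduction of 3.0 points, which is about 16.9 % relative to the 17.8 % baseline. The warping function is continuous at f_cut and leaves f_max fixed, which is why a second linear segment is used instead of scaling the whole axis by alpha.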
Pages: 2576-2579
Number of pages: 4
Related papers (9 in total)
[1] Anastasakos T, 1996, ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, P1137, DOI 10.1109/ICSLP.1996.607807
[2] Byrne W, Doermann D, Franz MT, Gustman S, Hajic J, Oard D, Picheny M, Psutka J, Ramabhadran B, Soergel D, Ward T, Zhu WJ, 2004, Automatic recognition of spontaneous speech for access to multilingual oral history archives, IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, V12(4), P420-435
[3] Gales MJF, 1998, Maximum likelihood linear transformations for HMM-based speech recognition, COMPUTER SPEECH AND LANGUAGE, V12(2), P75-98
[4] Hansen JH, Huang RQ, Zhou B, Seadle M, Deller JR, Gurijala AR, Kurimo M, Angkititrakul P, 2005, SpeechFind: Advances in spoken document retrieval for a National Gallery of the Spoken Word, IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, V13(5), P712-730
[5] Huang C, 2002, INTERSPEECH
[6] Molau S, 2000, P ESSV00, P209
[7] Nouza J, 2006, LECT NOTES ARTIF INT, V4188, P485
[8] Vandecatseye A, 2004, P LREC 2004, P873
[9] Welling L, Ney H, Kanthak S, 2002, Speaker adaptive modeling by vocal tract normalization, IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, V10(6), P415-426