Advanced Rich Transcription System for Estonian Speech

Cited by: 22
Authors
Alumae, Tanel [1 ]
Tilk, Ottokar [1 ]
Asadullah [1 ]
Affiliations
[1] Tallinn Univ Technol, Lab Language Technol, Tallinn, Estonia
Source
HUMAN LANGUAGE TECHNOLOGIES - THE BALTIC PERSPECTIVE, BALTIC HLT 2018 | 2018 / Vol. 307
Keywords
Speech recognition; Estonian; punctuation recovery; speaker identification;
DOI
10.3233/978-1-61499-912-6-1
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper describes the current TTU speech transcription system for Estonian. The system is designed to handle semi-spontaneous speech, such as broadcast conversations, lecture recordings, and interviews recorded in diverse acoustic conditions. The system is built on the Kaldi toolkit. Multi-condition training with background noise profiles extracted automatically from untranscribed data improves the robustness of the system. Out-of-vocabulary words are recovered using a phoneme n-gram based decoding subgraph and an FST-based phoneme-to-grapheme model. The system achieves a word error rate of 8.1% on a test set of broadcast conversations. It also performs punctuation recovery and speaker identification; speaker identification models are trained with a recently proposed weakly supervised training method.
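The phoneme-to-grapheme step mentioned in the abstract can be illustrated with a toy best-path search. This is a minimal, hypothetical sketch, not the authors' model: the mapping table, its weights, and the brute-force search below are invented for illustration, whereas the described system uses a weighted FST trained for Estonian.

```python
from itertools import product

# Each phoneme may map to several graphemes, each with a cost (lower is
# better), loosely mimicking a weighted FST arc set. The table and weights
# here are invented for illustration only.
P2G = {
    "t": [("t", 0.0), ("tt", 1.5)],
    "a": [("a", 0.0), ("aa", 1.2)],
    "l": [("l", 0.0), ("ll", 1.0)],
    "i": [("i", 0.0)],
    "n": [("n", 0.0), ("nn", 0.8)],
}

def best_spelling(phonemes):
    """Return the lowest-cost grapheme sequence for a phoneme sequence,
    found by brute-force enumeration of all arc combinations."""
    best, best_cost = None, float("inf")
    for choice in product(*(P2G[p] for p in phonemes)):
        graphemes = "".join(g for g, _ in choice)
        cost = sum(w for _, w in choice)
        if cost < best_cost:
            best, best_cost = graphemes, cost
    return best

print(best_spelling(["t", "a", "l", "i", "n"]))  # prints "talin"
```

A real implementation would encode the same idea as FST composition followed by a shortest-path search, which avoids the exponential enumeration used in this sketch.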
Pages: 1-8 (8 pages)