Advanced Rich Transcription System for Estonian Speech

Cited by: 22
Authors
Alumae, Tanel [1]
Tilk, Ottokar [1]
Asadullah [1]
Affiliations
[1] Tallinn Univ Technol, Lab Language Technol, Tallinn, Estonia
Source
HUMAN LANGUAGE TECHNOLOGIES - THE BALTIC PERSPECTIVE, BALTIC HLT 2018 | 2018 / Vol. 307
Keywords
Speech recognition; Estonian; punctuation recovery; speaker identification
DOI
10.3233/978-1-61499-912-6-1
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper describes the current TTU speech transcription system for Estonian speech. The system is designed to handle semi-spontaneous speech, such as broadcast conversations, lecture recordings and interviews recorded in diverse acoustic conditions. The system is based on the Kaldi toolkit. Multi-condition training using background noise profiles extracted automatically from untranscribed data is used to improve the robustness of the system. Out-of-vocabulary words are recovered using a phoneme n-gram based decoding subgraph and an FST-based phoneme-to-grapheme model. The system achieves a word error rate of 8.1% on a test set of broadcast conversations. The system also performs punctuation recovery and speaker identification. Speaker identification models are trained using a recently proposed weakly supervised training method.
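For readers unfamiliar with the metric quoted in the abstract, word error rate is conventionally the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the recognizer output, divided by the number of reference words. The following is a minimal illustrative sketch of that standard definition; it is not the authors' scoring setup, and the function name `word_error_rate` and the example sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative only: tokenization is plain whitespace splitting, with no
    text normalization of the kind a real evaluation would apply.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical example: one substitution and one deletion over four words.
    print(word_error_rate("a b c d", "a x c"))
```

Under this definition, a word error rate of 8.1% corresponds to roughly 8 word-level errors per 100 reference words.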
Pages: 1-8
Page count: 8