Advanced Rich Transcription System for Estonian Speech

Times cited: 22

Authors:

Alumae, Tanel [1]

Tilk, Ottokar [1]

Asadullah [1]

Affiliation:

[1] Tallinn University of Technology, Laboratory of Language Technology, Tallinn, Estonia

Source:

HUMAN LANGUAGE TECHNOLOGIES - THE BALTIC PERSPECTIVE, BALTIC HLT 2018 | 2018, Vol. 307

Keywords:

Speech recognition; Estonian; punctuation recovery; speaker identification

DOI:

10.3233/978-1-61499-912-6-1

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

This paper describes the current TTU (Tallinn University of Technology) speech transcription system for Estonian. The system is designed to handle semi-spontaneous speech, such as broadcast conversations, lecture recordings and interviews recorded in diverse acoustic conditions, and is built on the Kaldi toolkit. To improve robustness, it uses multi-condition training with background noise profiles extracted automatically from untranscribed data. Out-of-vocabulary words are recovered using a phoneme n-gram-based decoding subgraph and an FST-based phoneme-to-grapheme model. The system achieves a word error rate of 8.1% on a test set of broadcast conversations. It also performs punctuation recovery and speaker identification; the speaker identification models are trained with a recently proposed weakly supervised training method.
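The 8.1% result is a word error rate (WER). As a reminder of how that metric is conventionally defined, WER = (S + D + I) / N, where S, D and I are the substitutions, deletions and insertions in a minimum-edit alignment of the hypothesis against a reference of N words. The following is a minimal Python sketch of that standard definition, not the scoring pipeline used in the paper; the function name `word_error_rate` and the Estonian example phrases are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (S + D + I) / N via a word-level Levenshtein alignment.

    Hypothetical helper illustrating the standard WER definition;
    it is not the scoring tool used by the system described above.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # dist[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution (or match if cost == 0)
            )

    # The minimum edit count equals S + D + I for the best alignment.
    return dist[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    # One substitution out of three reference words.
    print(f"WER = {word_error_rate('tere tulemast saatesse', 'tere tulemast saade'):.3f}")
    # prints "WER = 0.333"
```

Because N counts only reference words, WER can exceed 100% when a hypothesis contains many insertions.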


Pages: 1-8

Number of pages: 8

References (22 in total):

[1] Alumae T., 2007, NODALIDA

[2] Alumae T., 2012, BALTIC HLT

[3] Alumae T., 2014, SPOKEN LANGUAGE TECH

[4] [Anonymous], INTERSPEECH 2015

[5] Asadullah, 2018, 21 INT C TEXT SPEECH

[6] Cho K., 2014, ARXIV, DOI 10.3115/v1/w14-4012

[7] Eek A., 1999, P LP 98, V98, P529

[8] Gorman K., 2016, Proceedings of the SIGFSM Workshop on Statistical NLP and Weighted Automata, P75, DOI 10.18653/v1/W16-2409

[9] Eisenstein J., 2017, P 2017 C EMPIRICAL M

[10] Kaalep H.-J., 2005, BALTIC HLT
