Advanced Rich Transcription System for Estonian Speech

Cited by: 22
Authors
Alumae, Tanel [1 ]
Tilk, Ottokar [1 ]
Asadullah [1 ]
Affiliations
[1] Tallinn Univ Technol, Lab Language Technol, Tallinn, Estonia
Source
HUMAN LANGUAGE TECHNOLOGIES - THE BALTIC PERSPECTIVE, BALTIC HLT 2018 | 2018 / Vol. 307
Keywords
Speech recognition; Estonian; punctuation recovery; speaker identification;
DOI
10.3233/978-1-61499-912-6-1
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper describes the current TTU speech transcription system for Estonian. The system is designed to handle semi-spontaneous speech, such as broadcast conversations, lecture recordings, and interviews recorded in diverse acoustic conditions. The system is built on the Kaldi toolkit. Multi-condition training with background noise profiles extracted automatically from untranscribed data improves the robustness of the system. Out-of-vocabulary words are recovered using a phoneme n-gram based decoding subgraph and an FST-based phoneme-to-grapheme model. The system achieves a word error rate of 8.1% on a test set of broadcast conversations. It also performs punctuation recovery and speaker identification; speaker identification models are trained with a recently proposed weakly supervised training method.
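The phoneme-to-grapheme step mentioned in the abstract can be illustrated with a toy best-path search. This is a minimal, hypothetical sketch, not the authors' model: the mapping table, its weights, and the brute-force search below are invented for illustration, whereas the described system uses a weighted FST trained for Estonian.

```python
from itertools import product

# Each phoneme may map to several graphemes, each with a cost (lower is
# better), loosely mimicking a weighted FST arc set. The table and weights
# here are invented for illustration only.
P2G = {
    "t": [("t", 0.0), ("tt", 1.5)],
    "a": [("a", 0.0), ("aa", 1.2)],
    "l": [("l", 0.0), ("ll", 1.0)],
    "i": [("i", 0.0)],
    "n": [("n", 0.0), ("nn", 0.8)],
}

def best_spelling(phonemes):
    """Return the lowest-cost grapheme sequence for a phoneme sequence,
    found by brute-force enumeration of all arc combinations."""
    best, best_cost = None, float("inf")
    for choice in product(*(P2G[p] for p in phonemes)):
        graphemes = "".join(g for g, _ in choice)
        cost = sum(w for _, w in choice)
        if cost < best_cost:
            best, best_cost = graphemes, cost
    return best

print(best_spelling(["t", "a", "l", "i", "n"]))  # prints "talin"
```

A real implementation would encode the same idea as FST composition followed by a shortest-path search, which avoids the exponential enumeration used in this sketch.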
Pages: 1-8 (8 pages)