Evaluating Open-source Toolkits for Automatic Speech Recognition of South African Languages

Cited by: 0

Authors:

Naidoo, Ashentha [1]

Tsoeu, Mohohlo [1]

Affiliations:

[1] Univ Cape Town, Dept Elect Engn, Cape Town, South Africa

Source:

2019 SOUTHERN AFRICAN UNIVERSITIES POWER ENGINEERING CONFERENCE/ROBOTICS AND MECHATRONICS/PATTERN RECOGNITION ASSOCIATION OF SOUTH AFRICA (SAUPEC/ROBMECH/PRASA) | 2019

Funding:

National Research Foundation, Singapore;

Keywords:

automatic speech recognition; under-resourced; evaluation; languages; isiXhosa; English;

DOI:

10.1109/robomech.2019.8704774

Chinese Library Classification (CLC) codes:

TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

Automatic speech recognition is a critical component of human language technologies. It concerns the conversion of speech into textual data that can be processed by computers, and thus offers a link that allows humans to interact with machines on a completely natural level. A variety of open-source toolkits exist for the development of these systems. These toolkits have been successfully implemented and tested on well-resourced languages. However, the same level of testing has not been performed for South African languages. This investigation sets out to evaluate popular open-source tools for South African languages and to identify the optimal configuration for each language and toolkit. The NCHLT corpora were used to set up automatic speech recognition systems for English and isiXhosa using Kaldi, CMU Sphinx, and HTK. The word error rates achieved showed that the best configurations from this investigation outperformed those reported by the developers of the NCHLT corpus.
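The abstract compares systems by word error rate (WER) but does not define it. As a minimal sketch, assuming the conventional definition (word-level edit distance between reference and hypothesis, divided by the number of reference words), the metric could be computed as below. The function name `word_error_rate` and the whitespace tokenization are illustrative assumptions, not the scoring procedure used in the paper or in any of the named toolkits.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Illustrative WER: (substitutions + deletions + insertions) / reference length.

    Assumption: words are separated by whitespace and compared case-sensitively.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion (reference word missed)
                curr[j - 1] + 1,                  # insertion (extra hypothesis word)
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One deleted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions count as errors, this value can exceed 1.0 when the hypothesis is much longer than the reference.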


Pages: 160-165

Page count: 6

References: 13 in total

[1]

Acero Alejandro., 2012, Acoustical and environmental robustness in automatic speech recognition, V201

[2]

Barnard E., 2014, P SLTU, P194

[3]

Brown P. F., 1992, Computational Linguistics, V18, P467

[4]

Gaida C., 2014, COMP OPEN SOURCE SPE

[5] Gales, Mark; Young, Steve. The Application of Hidden Markov Models in Speech Recognition [J]. Foundations and Trends in Signal Processing, 2007, 1 (03): 195-304

[6]

Henselmans D., 2013, P 24 ANN S PATT REC

[7]

Hori T., 2003, 8 EUR C SPEECH COMM

[8] Lee, K. F.; Hon, H. W.; Reddy, R. An Overview of the SPHINX Speech Recognition System [J]. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1990, 38 (01): 35-45

[9]

Niesler T., 2004, South African Computer Journal, P3

[10]

Povey D., 2011, IEEE 2011 WORKSHOP A, P1
