Enhancing ASR Systems for Under-Resourced Languages through a Novel Unsupervised Acoustic Model Training Technique

Cited by: 5
Authors
Cucu, Horia [1 ]
Buzo, Andi [1 ]
Besacier, Laurent [2 ]
Burileanu, Corneliu [1 ]
Affiliations
[1] University Politehnica of Bucharest, Speech & Dialogue Research Laboratory, Bucharest, Romania
[2] Université Grenoble 1, Laboratoire d'Informatique de Grenoble, Grenoble, France
Keywords
speech recognition; under-resourced languages; unsupervised acoustic modeling; unsupervised training
DOI
10.4316/AECE.2015.01009
CLC Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Statistical speech and language processing techniques, which require large amounts of training data, are currently the state of the art in automatic speech recognition. For high-resourced, international languages such data is widely available, whereas for under-resourced languages its scarcity poses serious problems. Unsupervised acoustic modeling offers a cost- and time-effective way of building a solid acoustic model for any under-resourced language. This study describes a novel unsupervised acoustic model training method and evaluates it on speech data in an under-resourced language: Romanian. The key novelty of the method is the use of two complementary seed ASR systems to produce high-quality transcriptions, with a Character Error Rate (ChER) below 5%, for initially untranscribed speech data. The methodology leads to a relative Word Error Rate (WER) improvement of more than 10% when 100 hours of untranscribed speech are used.
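The abstract only outlines the method, but its central step, keeping untranscribed utterances for which two complementary seed ASR systems produce closely agreeing hypotheses, can be illustrated with a minimal Python sketch. It assumes the two systems' hypotheses have already been decoded and uses the cross-system ChER (character-level edit distance divided by reference length) as a proxy for transcription quality. The function names (char_edit_distance, select_transcriptions), the 5% threshold, and the toy Romanian strings are hypothetical illustrations, not the authors' actual implementation.

    # Hypothetical sketch of agreement-based data selection for
    # unsupervised acoustic model training. The two "seed ASR"
    # decoders are stand-ins; only the selection logic is shown.

    def char_edit_distance(a: str, b: str) -> int:
        """Levenshtein distance at the character level."""
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            curr = [i]
            for j, cb in enumerate(b, 1):
                curr.append(min(prev[j] + 1,                 # deletion
                                curr[j - 1] + 1,             # insertion
                                prev[j - 1] + (ca != cb)))   # substitution
            prev = curr
        return prev[-1]

    def char_error_rate(hyp: str, ref: str) -> float:
        """ChER of hyp, treating ref as the reference string."""
        return char_edit_distance(hyp, ref) / max(len(ref), 1)

    def select_transcriptions(hyp_pairs, threshold=0.05):
        """Keep utterances where the two seed systems agree closely.

        hyp_pairs: list of (utterance_id, hyp_system_a, hyp_system_b).
        The cross-system ChER serves as a confidence filter; for
        utterances below the threshold, system A's hypothesis is
        kept as the pseudo-transcript.
        """
        selected = []
        for utt_id, hyp_a, hyp_b in hyp_pairs:
            if char_error_rate(hyp_a, hyp_b) < threshold:
                selected.append((utt_id, hyp_a))
        return selected

    # Toy usage: identical hypotheses pass, a divergent pair does not.
    pairs = [
        ("utt001", "buna ziua tuturor", "buna ziua tuturor"),
        ("utt002", "acesta este un test", "aceasta era un text"),
    ]
    print(select_transcriptions(pairs, threshold=0.05))

Run as-is, the toy example keeps utt001 (identical hypotheses, ChER = 0) and rejects utt002, whose hypotheses diverge by roughly 26% ChER. In the setting the abstract describes, the selected utterances and their pseudo-transcripts would then be added to the acoustic model training pool.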
Pages: 63-68 (6 pages)