Enhancing ASR Systems for Under-Resourced Languages through a Novel Unsupervised Acoustic Model Training Technique

Cited by: 5
Authors
Cucu, Horia [1 ]
Buzo, Andi [1 ]
Besacier, Laurent [2 ]
Burileanu, Corneliu [1 ]
Affiliations
[1] University Politehnica of Bucharest, Speech & Dialogue Research Laboratory, Bucharest, Romania
[2] Université Grenoble 1, Laboratoire d'Informatique de Grenoble, Grenoble, France
Keywords
speech recognition; under-resourced languages; unsupervised acoustic modeling; unsupervised training
DOI
10.4316/AECE.2015.01009
CLC Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Statistical speech and language processing techniques, which require large amounts of training data, are currently the state of the art in automatic speech recognition. For high-resourced, international languages such data is widely available, whereas for under-resourced languages its scarcity poses serious problems. Unsupervised acoustic modeling offers a cost- and time-effective way of building a solid acoustic model for any under-resourced language. This study describes a novel unsupervised acoustic model training method and evaluates it on speech data in an under-resourced language: Romanian. The key novelty of the method is the use of two complementary seed ASR systems to produce high-quality transcriptions, with a Character Error Rate (ChER) below 5%, for initially untranscribed speech data. The methodology leads to a relative Word Error Rate (WER) improvement of more than 10% when 100 hours of untranscribed speech are used.
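The abstract only outlines the method, but its central step, keeping untranscribed utterances for which two complementary seed ASR systems produce closely agreeing hypotheses, can be illustrated with a minimal Python sketch. It assumes the two systems' hypotheses have already been decoded and uses the cross-system ChER (character-level edit distance divided by reference length) as a proxy for transcription quality. The function names (char_edit_distance, select_transcriptions), the 5% threshold, and the toy Romanian strings are hypothetical illustrations, not the authors' actual implementation.

    # Hypothetical sketch of agreement-based data selection for
    # unsupervised acoustic model training. The two "seed ASR"
    # decoders are stand-ins; only the selection logic is shown.

    def char_edit_distance(a: str, b: str) -> int:
        """Levenshtein distance at the character level."""
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            curr = [i]
            for j, cb in enumerate(b, 1):
                curr.append(min(prev[j] + 1,                 # deletion
                                curr[j - 1] + 1,             # insertion
                                prev[j - 1] + (ca != cb)))   # substitution
            prev = curr
        return prev[-1]

    def char_error_rate(hyp: str, ref: str) -> float:
        """ChER of hyp, treating ref as the reference string."""
        return char_edit_distance(hyp, ref) / max(len(ref), 1)

    def select_transcriptions(hyp_pairs, threshold=0.05):
        """Keep utterances where the two seed systems agree closely.

        hyp_pairs: list of (utterance_id, hyp_system_a, hyp_system_b).
        The cross-system ChER serves as a confidence filter; for
        utterances below the threshold, system A's hypothesis is
        kept as the pseudo-transcript.
        """
        selected = []
        for utt_id, hyp_a, hyp_b in hyp_pairs:
            if char_error_rate(hyp_a, hyp_b) < threshold:
                selected.append((utt_id, hyp_a))
        return selected

    # Toy usage: identical hypotheses pass, a divergent pair does not.
    pairs = [
        ("utt001", "buna ziua tuturor", "buna ziua tuturor"),
        ("utt002", "acesta este un test", "aceasta era un text"),
    ]
    print(select_transcriptions(pairs, threshold=0.05))

Run as-is, the toy example keeps utt001 (identical hypotheses, ChER = 0) and rejects utt002, whose hypotheses diverge by roughly 26% ChER. In the setting the abstract describes, the selected utterances and their pseudo-transcripts would then be added to the acoustic model training pool.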
Pages: 63-68 (6 pages)