Synthesising isiZulu-English code-switch bigrams using word embeddings

Cited by: 11
Authors
van der Westhuizen, Ewald [1]
Niesler, Thomas [1]
Affiliations
[1] Stellenbosch Univ, Dept Elect & Elect Engn, Stellenbosch, South Africa
Source
18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION | 2017
Keywords
code-switching; word vectors; word embeddings; Zulu; isiZulu; spontaneous
DOI
10.21437/Interspeech.2017-1437
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Code-switching is prevalent among South African speakers, and presents a challenge to automatic speech recognition systems. It is predominantly a spoken phenomenon, and generally does not occur in textual form. Therefore, a particularly serious challenge is the extreme lack of training material for language modelling. We investigate the use of word embeddings to synthesise isiZulu-to-English code-switch bigrams with which to augment such sparse language model training data. A variety of word embeddings are trained on a monolingual English web text corpus, and subsequently queried to synthesise code-switch bigrams. Our evaluation is performed on language models trained on a new, although small, English-isiZulu code-switch corpus compiled from South African soap operas. This data is characterised by fast, spontaneously spoken speech containing frequent code-switching. We show that the augmentation of the training data with code-switched bigrams synthesised in this way leads to a reduction in perplexity.
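The abstract does not spell out how the embeddings are queried, so the following is only a minimal sketch of one plausible reading: for each observed bigram whose second word is English, the English word is replaced by its nearest neighbours in the embedding space. The toy vectors, the placeholder isiZulu tokens (`zu_a`, `zu_b`), and the function names `nearest_neighbours` and `synthesise_bigrams` are all assumptions made for illustration and are not taken from the paper.

```python
import math

# Toy English embeddings; the values are invented purely for illustration.
EMBEDDINGS = {
    "phone": [0.9, 0.1, 0.0],
    "cellphone": [0.85, 0.15, 0.05],
    "laptop": [0.7, 0.3, 0.1],
    "happy": [0.0, 0.2, 0.9],
    "glad": [0.05, 0.25, 0.85],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_neighbours(word, embeddings, k=2):
    """Return the k words most similar to `word`, excluding the word itself."""
    query = embeddings[word]
    scored = [
        (cosine(query, vec), other)
        for other, vec in embeddings.items()
        if other != word
    ]
    scored.sort(reverse=True)
    return [other for _, other in scored[:k]]


def synthesise_bigrams(seed_bigrams, embeddings, k=2):
    """Create new (isiZulu, English) bigrams by swapping in English neighbours.

    Assumption: the first element of each seed bigram is the isiZulu word and
    the second is the English word at the code-switch point.
    """
    synthesised = set()
    for zulu_word, english_word in seed_bigrams:
        if english_word not in embeddings:
            continue  # no vector available, so nothing to query
        for neighbour in nearest_neighbours(english_word, embeddings, k):
            candidate = (zulu_word, neighbour)
            if candidate not in seed_bigrams:
                synthesised.add(candidate)
    return synthesised


# Placeholder tokens stand in for real isiZulu words.
seed = {("zu_a", "phone"), ("zu_b", "happy")}
new_bigrams = synthesise_bigrams(seed, EMBEDDINGS, k=1)
```

In a setting like the one the abstract describes, the synthesised bigrams would be added to the language model training data alongside the observed ones; how many neighbours to keep and how to weight the synthetic counts are design choices this sketch leaves open.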
Pages: 72-76
Number of pages: 5