A COMPARISON OF METHODS FOR OOV-WORD RECOGNITION ON A NEW PUBLIC DATASET

Cited by: 1
Authors
Braun, Rudolf A. [1]
Madikeri, Srikanth [1]
Motlicek, Petr [1]
Affiliations
[1] Idiap Res Inst, Martigny, Switzerland
Source
2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021) | 2021
Funding
EU Horizon 2020
Keywords
speech recognition; OOV-word recognition; speech dataset; finite-state transducers
DOI
10.1109/ICASSP39728.2021.9415124
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
A common problem for automatic speech recognition systems is how to recognize words that they did not see during training. Currently there is no established method of evaluating different techniques for tackling this problem. We propose using the CommonVoice dataset to create test sets for multiple languages which have a high out-of-vocabulary (OOV) ratio relative to a training set and release a new tool for calculating relevant performance metrics. We then evaluate, within the context of a hybrid ASR system, how much better subword models are at recognizing OOVs, and how much benefit one can get from incorporating OOV-word information into an existing system by modifying WFSTs. Additionally, we propose a new method for modifying a subword-based language model so as to better recognize OOV-words. We showcase very large improvements in OOV-word recognition and make both the data and code available.
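The OOV ratio of a test set relative to a training set, as mentioned in the abstract, can be illustrated with a minimal sketch. This is not the tool released by the authors: the function name `oov_stats`, the whitespace tokenization, and the lowercasing are all assumptions made for illustration only.

```python
from collections import Counter


def oov_stats(train_sentences, test_sentences):
    """Return OOV statistics of a test set relative to a training set.

    Hypothetical helper: tokenization is plain whitespace splitting
    after lowercasing, which is an assumption of this sketch.
    """
    # Vocabulary = every distinct word seen in the training text.
    train_vocab = {w for s in train_sentences for w in s.lower().split()}

    # Count every word occurrence (token) in the test text.
    test_counts = Counter(w for s in test_sentences for w in s.lower().split())

    # OOV types = distinct test words absent from the training vocabulary.
    oov_types = {w for w in test_counts if w not in train_vocab}
    oov_tokens = sum(test_counts[w] for w in oov_types)
    total_tokens = sum(test_counts.values())

    return {
        "oov_types": oov_types,
        "oov_token_ratio": oov_tokens / total_tokens if total_tokens else 0.0,
        "oov_type_ratio": len(oov_types) / len(test_counts) if test_counts else 0.0,
    }


if __name__ == "__main__":
    train = ["the cat sat", "a dog ran"]
    test = ["the cat ran", "the zebra sat quietly"]
    stats = oov_stats(train, test)
    print(sorted(stats["oov_types"]))
    print(stats["oov_token_ratio"], stats["oov_type_ratio"])
```

The token-level ratio weights each OOV word by how often it occurs, while the type-level ratio counts each distinct OOV word once. Which of the two a given evaluation reports is a design choice that this sketch leaves open.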
Pages: 5979-5983
Number of pages: 5