RNN Language Model Estimation for Out-of-Vocabulary Words

Authors
Illina, Irina [1]
Fohr, Dominique [1]
Affiliation
[1] Univ Lorraine, CNRS, INRIA, LORIA, MultiSpeech Team, F-54000 Nancy, France
Source
HUMAN LANGUAGE TECHNOLOGY. CHALLENGES FOR COMPUTER SCIENCE AND LINGUISTICS, LTC 2017 | 2020 / Vol. 12598
Keywords
Speech recognition; Neural networks; Vocabulary extension; Out-of-vocabulary words; Proper names
DOI
10.1007/978-3-030-66527-2_15
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Out-of-vocabulary (OOV) words are an important issue for speech recognition systems. These words, often proper nouns or new words, are essential for transcribing documents correctly, so they must be integrated into the language model (LM) and the lexicon of the speech recognition system. This article proposes new approaches to estimating the probabilities of OOV proper nouns using a Recurrent Neural Network Language Model (RNNLM). The proposed approaches rely on the notion of the closest in-vocabulary (IV) words to a given OOV proper noun, called its list of brothers. The RNNLM probabilities of these words are used to estimate the probabilities of the OOV proper noun. Three methods for retrieving the relevant list of brothers are studied. The main advantages of the proposed approaches are that the RNNLM is not retrained and its architecture is kept intact. Experiments on real text data from the website of the Euronews channel show relative perplexity reductions of about 14% compared to the baseline RNNLM.
Pages: 199-211
Page count: 13
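The abstract's core idea, estimating an OOV proper noun's probability from the RNNLM probabilities of its in-vocabulary "brothers" without retraining the network, can be illustrated with a minimal sketch. The abstract does not give the actual estimation formula, so the choices below are assumptions made only for illustration: the function name `estimate_oov_probs`, the use of the mean brother probability as the raw OOV score, the scaling factor `alpha`, and the final renormalisation are all hypothetical and are not taken from the paper.

```python
import numpy as np


def estimate_oov_probs(p_iv, brothers, alpha=1.0):
    """Extend one RNNLM output distribution with OOV words.

    p_iv     : probabilities over the in-vocabulary (IV) words for a
               given history, as produced by an unmodified RNNLM.
    brothers : dict mapping each OOV word to the indices of its
               closest IV words (its "list of brothers").
    alpha    : hypothetical scaling factor for the OOV scores.

    ASSUMPTION: the raw OOV score is the mean probability of the
    brothers; the paper may combine them differently.
    """
    p_iv = np.asarray(p_iv, dtype=float)
    raw = {}
    for word, idx in brothers.items():
        if len(idx) == 0:
            raise ValueError(f"no brothers given for {word!r}")
        raw[word] = alpha * p_iv[list(idx)].mean()

    # Renormalise so IV and OOV probabilities together sum to 1.
    z = 1.0 + sum(raw.values())
    return p_iv / z, {w: s / z for w, s in raw.items()}


# Toy distribution over three IV words and one OOV proper noun
# whose brothers are IV words 0 and 1.
p_iv = np.array([0.5, 0.3, 0.2])
brothers = {"Macron": [0, 1]}
p_iv_new, p_oov = estimate_oov_probs(p_iv, brothers)
total = p_iv_new.sum() + sum(p_oov.values())
```

Because the sketch only post-processes the output distribution, the network weights and architecture are untouched, which matches the advantage the abstract highlights.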