RNN Language Model Estimation for Out-of-Vocabulary Words

Cited by: 0
Authors
Illina, Irina [1]
Fohr, Dominique [1]
Affiliation
[1] Univ Lorraine, CNRS, INRIA, LORIA,MultiSpeech Team, F-54000 Nancy, France
Source
HUMAN LANGUAGE TECHNOLOGY. CHALLENGES FOR COMPUTER SCIENCE AND LINGUISTICS, LTC 2017 | 2020, vol. 12598
Keywords
Speech recognition; Neural networks; Vocabulary extension; Out-of-vocabulary words; Proper names
DOI
10.1007/978-3-030-66527-2_15
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Out-of-vocabulary (OOV) words are an important issue for speech recognition systems. These words, often proper nouns or new words, are essential for transcribing documents correctly, so they must be integrated into the language model (LM) and the lexicon of the speech recognition system. This article proposes new approaches to estimating the probabilities of OOV proper nouns using a Recurrent Neural Network Language Model (RNNLM). The approaches are based on the notion of the closest in-vocabulary (IV) words (the "list of brothers") to a given OOV proper noun: the RNNLM probabilities of these IV words are used to estimate the probabilities of the OOV proper noun. Three methods for retrieving the relevant list of brothers are studied. The main advantages of the proposed approaches are that the RNNLM is not retrained and its architecture is kept intact. Experiments on real text data from the website of the Euronews channel show relative perplexity reductions of about 14% compared to the baseline RNNLM.
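The abstract does not give the estimation formula, so the following is only a minimal sketch of the general idea, under stated assumptions: the OOV word receives a scaled average of its brothers' RNNLM probabilities, and the IV probabilities are rescaled so the distribution still sums to one. The function names, the `alpha` scaling factor, and the averaging rule are hypothetical and are not taken from the paper. The second helper shows how perplexity, the metric reported in the abstract, is computed from per-word probabilities.

```python
import math


def extend_with_oov(iv_probs, brothers, oov_word, alpha=0.5):
    """Add one OOV word to an RNNLM output distribution without retraining.

    iv_probs : dict mapping each IV word to P(word | history), summing to 1.
    brothers : list of IV words considered closest to the OOV word.
    alpha    : hypothetical scaling factor in (0, 1]; an assumption of this
               sketch, not a value from the paper.
    """
    if not brothers:
        raise ValueError("the list of brothers must not be empty")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")

    # Assumed rule: OOV probability is a scaled mean of the brothers' probabilities.
    mean_brother_prob = sum(iv_probs[b] for b in brothers) / len(brothers)
    oov_prob = alpha * mean_brother_prob

    # Rescale the IV probabilities so that the extended distribution sums to 1.
    scale = 1.0 - oov_prob
    extended = {word: p * scale for word, p in iv_probs.items()}
    extended[oov_word] = oov_prob
    return extended


def perplexity(word_probs):
    """Perplexity of a word sequence given the probability assigned to each word."""
    avg_log_prob = sum(math.log(p) for p in word_probs) / len(word_probs)
    return math.exp(-avg_log_prob)


def relative_reduction(baseline_ppl, new_ppl):
    """Relative perplexity reduction with respect to the baseline."""
    return (baseline_ppl - new_ppl) / baseline_ppl


# Toy usage with made-up numbers (not data from the paper).
toy_iv = {"paris": 0.5, "london": 0.3, "the": 0.2}
toy_extended = extend_with_oov(toy_iv, ["paris", "london"], "nancy", alpha=0.5)
```

Because the RNNLM output is only post-processed, this kind of scheme leaves the network weights and architecture untouched, which matches the advantage claimed in the abstract; how the paper actually selects brothers and combines their probabilities may differ.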
Pages: 199-211
Page count: 13