Dirichlet Class Language Models for Speech Recognition

Cited by: 46
Authors
Chien, Jen-Tzung [1]
Chueh, Chuang-Hua [1]
Affiliations
[1] Natl Cheng Kung Univ, Dept Comp Sci & Informat Engn, Tainan 70101, Taiwan
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2011 / Vol. 19 / Issue 03
Keywords
Bayes procedure; clustering method; natural language; smoothing method; speech recognition;
DOI
10.1109/TASL.2010.2050717
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Latent Dirichlet allocation (LDA) was successfully developed for document modeling because latent topic modeling lets it generalize to unseen documents. LDA calculates the probability of a document based on the bag-of-words scheme without considering the order of words. Accordingly, LDA cannot be directly adopted to predict words in speech recognition systems. This work presents a new Dirichlet class language model (DCLM), which projects the sequence of history words onto a latent class space and calculates a marginal likelihood over the uncertainties of classes, which are expressed by Dirichlet priors. A Bayesian class-based language model is established and a variational Bayesian procedure is presented for estimating DCLM parameters. Furthermore, the long-distance class information is continuously updated using the large-span history words and is dynamically incorporated into class mixtures for a cache DCLM. Different language models are experimentally evaluated using the Wall Street Journal (WSJ) corpus, and the effects of training data size and vocabulary size are examined. We find that the cache DCLM effectively characterizes the unseen n-gram events and stores the class information for long-distance language modeling. This approach outperforms the other class-based and topic-based language models in terms of perplexity and recognition accuracy. The DCLM and cache DCLM achieved a relative improvement in word error rate of 3% to 5% over the LDA topic-based language model with different sizes of training data.
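The abstract's idea of marginalizing a word probability over Dirichlet-distributed class weights can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy vocabulary, and the numbers are hypothetical, and the Dirichlet parameters are simply given here rather than derived from history words as in the paper. The sketch relies only on the standard fact that the mean of a Dirichlet distribution with parameters alpha is alpha_c / sum_j alpha_j, so the marginal probability of a single word is a class mixture weighted by that mean.

```python
# Illustrative sketch only: names and toy numbers are hypothetical,
# not taken from the paper.

def dirichlet_mean(alpha):
    """Expected class weights E[theta_c] = alpha_c / sum_j alpha_j."""
    total = sum(alpha)
    return [a / total for a in alpha]

def class_mixture_prob(word, alpha, word_given_class):
    """Marginal p(word | history) = sum_c p(word | c) * E[theta_c]."""
    weights = dirichlet_mean(alpha)
    return sum(w * p_c[word] for w, p_c in zip(weights, word_given_class))

# Toy setup: 2 latent classes over a 3-word vocabulary.
word_given_class = [
    {"stock": 0.6, "market": 0.3, "game": 0.1},  # class 0
    {"stock": 0.1, "market": 0.2, "game": 0.7},  # class 1
]

# Assumed Dirichlet parameters for one particular history of words.
alpha = [3.0, 1.0]

probs = {w: class_mixture_prob(w, alpha, word_given_class)
         for w in word_given_class[0]}
```

Because every class-conditional distribution sums to one and the expected class weights also sum to one, the resulting `probs` form a valid distribution over the vocabulary.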
Pages: 482 - 495
Number of pages: 14
Related papers
50 in total
  • [1] Factored Language Model Adaptation Using Dirichlet Class Language Model for Speech Recognition
    Hatami, Ali
    Akbari, Ahmad
    Nasersharif, Babak
    2013 5TH CONFERENCE ON INFORMATION AND KNOWLEDGE TECHNOLOGY (IKT), 2013, : 438 - 442
  • [2] TOPIC CACHE LANGUAGE MODEL FOR SPEECH RECOGNITION
    Chueh, Chuang-Hua
    Chien, Jen-Tzung
    2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 5194 - 5197
  • [3] N-gram Adaptation Using Dirichlet Class Language Model Based on Part-of-Speech for Speech Recognition
    Hatami, Ali
    Akbari, Ahmad
    Nasersharif, Babak
    2013 21ST IRANIAN CONFERENCE ON ELECTRICAL ENGINEERING (ICEE), 2013,
  • [4] Improving language models for radiology speech recognition
    Paulett, John M.
    Langlotz, Curtis P.
    JOURNAL OF BIOMEDICAL INFORMATICS, 2009, 42 (01) : 53 - 58
  • [5] GEOGRAPHIC LANGUAGE MODELS FOR AUTOMATIC SPEECH RECOGNITION
    Xiao, Xiaoqiang
    Chen, Hong
    Zylak, Mark
    Sosa, Daniela
    Desu, Suma
    Krishnamoorthy, Mahesh
    Liu, Daben
    Paulik, Matthias
    Zhang, Yuchen
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 6124 - 6128
  • [6] BAYESIAN TRANSFORMER LANGUAGE MODELS FOR SPEECH RECOGNITION
    Xue, Boyang
    Yu, Jianwei
    Xu, Junhao
    Liu, Shansong
    Hu, Shoukang
    Ye, Zi
    Geng, Mengzhe
    Liu, Xunying
    Meng, Helen
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 7378 - 7382
  • [7] PROMPTING LARGE LANGUAGE MODELS WITH SPEECH RECOGNITION ABILITIES
    Fathullah, Yassir
    Wu, Chunyang
    Lakomkin, Egor
    Jia, Junteng
    Shangguan, Yuan
    Li, Ke
    Guo, Jinxi
    Xiong, Wenhan
    Mahadeokar, Jay
    Kalinli, Ozlem
    Fuegen, Christian
    Seltzer, Mike
    2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2024), 2024, : 13351 - 13355
  • [8] Dual Language Models for Code Switched Speech Recognition
    Garg, Saurabh
    Parekh, Tanmay
    Jyothi, Preethi
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 2598 - 2602
  • [9] DOCUMENT-BASED DIRICHLET CLASS LANGUAGE MODEL FOR SPEECH RECOGNITION USING DOCUMENT-BASED N-GRAM EVENTS
    Haidar, Md. Akmal
    O'Shaughnessy, Douglas
    2014 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY SLT 2014, 2014, : 42 - 47
  • [10] Automatic Speech Recognition for Irish: testing lexicons and language models
    Qian, Mengjie
    Berthelsen, Harald
    Lonergan, Liam
    Murphy, Andy
    O'Neill, Claire
    Chiarain, Neasa Ni
    Gobl, Christer
    Chasaide, Ailbhe Ni
    2022 33RD IRISH SIGNALS AND SYSTEMS CONFERENCE (ISSC), 2022,