Integration of WFST Language Model in Pre-trained Korean E2E ASR Model

Cited by: 0
Authors
Oh, Junseok [1 ]
Cho, Eunsoo [2 ]
Kim, Ji-Hwan [1 ]
Affiliations
[1] Sogang Univ, Dept Comp Sci & Engn, 35 Baekbeom Ro, Seoul 04107, South Korea
[2] SELVAS AI, Speech Recognit Lab, 20F,19 Gasan Digital 1-Ro, Seoul 08594, South Korea
Source
KSII TRANSACTIONS ON INTERNET AND INFORMATION SYSTEMS | 2024, Vol. 18, No. 6
Keywords
Connectionist Temporal Classification; Shallow Fusion; External Language Model; End-to-end Automatic Speech Recognition; Weighted Finite-State Transducer;
DOI
10.3837/tiis.2024.06.015
CLC Number
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
In this paper, we present a method that integrates a Grammar Transducer as an external language model to enhance the accuracy of a pre-trained Korean End-to-end (E2E) Automatic Speech Recognition (ASR) model. The E2E ASR model uses the Connectionist Temporal Classification (CTC) loss function to derive hypothesis sentences from input audio. However, this method reveals a limitation inherent in the CTC approach: it fails to capture language information directly from the transcript data. To overcome this limitation, we propose a fusion approach that combines a clause-level n-gram language model, transformed into a Weighted Finite-State Transducer (WFST), with the E2E ASR model. This approach enhances the model's accuracy and allows for domain adaptation using only additional text data, avoiding the need for further intensive training of the large pre-trained ASR model. This is particularly advantageous for Korean, a low-resource language that faces a significant challenge due to limited speech data and few available ASR models. We first validate the efficacy of training the n-gram model at the clause level by contrasting its inference accuracy with that of the E2E ASR model merged with language models trained on smaller lexical units. We then demonstrate that our approach achieves higher domain adaptation accuracy than Shallow Fusion, a previously devised method for merging an external language model with an E2E ASR model without requiring additional training.
Pages: 1693-1706
Number of pages: 14
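The abstract contrasts the proposed WFST-based integration with Shallow Fusion, which combines the E2E model's hypothesis scores with an external language model at decode time and needs no retraining. The sketch below is a minimal, self-contained illustration of that baseline idea only, not of the paper's grammar-transducer composition: an n-best list from a CTC decoder is rescored as log P_asr(y|x) + lambda * log P_lm(y) with a toy bigram LM. The function names (train_bigram, shallow_fusion), the toy corpus, the n-best list, and the weight lambda = 0.3 are all hypothetical values chosen for illustration.

# Minimal sketch of shallow fusion with an external n-gram LM.
# Hypothesis selection: argmax over log P_asr(y|x) + lam * log P_lm(y).
# All data and parameter values below are illustrative, not from the paper.
import math
from collections import defaultdict

def train_bigram(sentences):
    """Count-based bigram LM with add-one smoothing over a toy text corpus."""
    unigram, bigram = defaultdict(int), defaultdict(int)
    vocab = set()
    for sent in sentences:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        vocab.update(tokens)
        for prev, cur in zip(tokens, tokens[1:]):
            unigram[prev] += 1
            bigram[(prev, cur)] += 1
    V = len(vocab)

    def logprob(sent):
        tokens = ["<s>"] + sent.split() + ["</s>"]
        return sum(
            math.log((bigram[(p, c)] + 1) / (unigram[p] + V))
            for p, c in zip(tokens, tokens[1:])
        )
    return logprob

def shallow_fusion(nbest, lm_logprob, lam=0.3):
    """Pick the hypothesis maximizing acoustic score + lam * LM log-probability."""
    return max(nbest, key=lambda h: h[1] + lam * lm_logprob(h[0]))

if __name__ == "__main__":
    # Hypothetical in-domain text used to adapt the external LM.
    lm_logprob = train_bigram(["turn the volume up", "turn the lights off"])
    # Hypothetical n-best list from a CTC decoder: (text, acoustic log-score).
    nbest = [("turn the volume cup", -4.1), ("turn the volume up", -4.3)]
    # The LM shifts the decision to the in-domain hypothesis despite its
    # lower acoustic score, which is the effect Shallow Fusion exploits.
    print(shallow_fusion(nbest, lm_logprob))

In the paper itself, the external model is a clause-level n-gram compiled into a WFST and composed with the CTC output; the rescoring above only conveys the score-interpolation idea that both fusion schemes share.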