Integration of WFST Language Model in Pre-trained Korean E2E ASR Model

Cited by: 0
Authors
Oh, Junseok [1 ]
Cho, Eunsoo [2 ]
Kim, Ji-Hwan [1 ]
Affiliations
[1] Sogang Univ, Dept Comp Sci & Engn, 35 Baekbeom Ro, Seoul 04107, South Korea
[2] SELVAS AI, Speech Recognit Lab, 20F,19 Gasan Digital 1-Ro, Seoul 08594, South Korea
Source
KSII TRANSACTIONS ON INTERNET AND INFORMATION SYSTEMS | 2024, Vol. 18, No. 6
Keywords
Connectionist Temporal Classification; Shallow Fusion; External Language Model; End-to-end Automatic Speech Recognition; Weighted Finite-State Transducer;
DOI
10.3837/tiis.2024.06.015
Chinese Library Classification (CLC)
TP [Automation Technology; Computer Technology]
Discipline Classification Code
0812
Abstract
In this paper, we present a method that integrates a Grammar Transducer as an external language model to enhance the accuracy of a pre-trained Korean End-to-end (E2E) Automatic Speech Recognition (ASR) model. The E2E ASR model uses the Connectionist Temporal Classification (CTC) loss function to derive hypothesis sentences from input audio. However, this approach exposes a limitation inherent in CTC: it fails to capture language information directly from the transcript data. To overcome this limitation, we propose a fusion approach that combines a clause-level n-gram language model, converted into a Weighted Finite-State Transducer (WFST), with the E2E ASR model. This approach improves the model's accuracy and allows for domain adaptation using only additional text data, avoiding further intensive training of the large pre-trained ASR model. This is particularly advantageous for Korean, a low-resource language that faces a significant challenge due to limited speech data and available ASR models. We first validate the efficacy of training the n-gram model at the clause level by comparing its inference accuracy with that of the E2E ASR model merged with language models trained on smaller lexical units. We then demonstrate that our approach achieves better domain adaptation accuracy than Shallow Fusion, a previously devised method for merging an external language model with an E2E ASR model without requiring additional training.
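Illustrative sketch (not taken from the paper): Shallow Fusion, the baseline the abstract compares against, rescores each hypothesis with a weighted sum of the CTC acoustic log-probability and an external language-model log-probability, score(y) = log P_CTC(y|x) + lambda * log P_LM(y). The Python sketch below shows only this rescoring step under stated assumptions; the lm_logprob callback, lm_weight, and length_bonus parameters are hypothetical stand-ins for the WFST-backed clause-level n-gram model and the tuned fusion weights described in the paper.

    # Minimal sketch, assuming beam-search hypotheses with CTC log-probabilities
    # are already available and an external LM exposes a log-probability lookup.
    import math
    from typing import Callable, List, Tuple

    def shallow_fusion_rescore(
        hypotheses: List[Tuple[str, float]],   # (text, CTC log-probability) pairs
        lm_logprob: Callable[[str], float],    # external LM: text -> log P_LM(text)
        lm_weight: float = 0.3,                # fusion weight lambda (tuned on a dev set)
        length_bonus: float = 0.0,             # optional word-insertion bonus
    ) -> List[Tuple[str, float]]:
        """Re-rank hypotheses by log P_CTC + lambda * log P_LM + bonus * word count."""
        rescored = []
        for text, ctc_logp in hypotheses:
            fused = ctc_logp + lm_weight * lm_logprob(text) + length_bonus * len(text.split())
            rescored.append((text, fused))
        return sorted(rescored, key=lambda pair: pair[1], reverse=True)

    if __name__ == "__main__":
        # Toy usage with a placeholder LM; a real system would query the n-gram WFST.
        def toy_lm(text: str) -> float:
            return -2.0 * len(text.split())
        beams = [("오늘 날씨 어때", math.log(0.40)), ("오늘 날 씨 어때", math.log(0.45))]
        print(shallow_fusion_rescore(beams, toy_lm, lm_weight=0.5))

In contrast to this hypothesis-level rescoring, the paper's WFST integration composes the n-gram grammar transducer with the decoding graph, so domain adaptation only requires rebuilding the transducer from new text rather than retraining the acoustic model.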
Pages: 1693-1706
Number of pages: 14