Integration of WFST Language Model in Pre-trained Korean E2E ASR Model

Cited by: 0
Authors
Oh, Junseok [1 ]
Cho, Eunsoo [2 ]
Kim, Ji-Hwan [1 ]
Affiliations
[1] Sogang Univ, Dept Comp Sci & Engn, 35 Baekbeom Ro, Seoul 04107, South Korea
[2] SELVAS AI, Speech Recognit Lab, 20F,19 Gasan Digital 1-Ro, Seoul 08594, South Korea
Source
KSII TRANSACTIONS ON INTERNET AND INFORMATION SYSTEMS | 2024, Vol. 18, No. 6
Keywords
Connectionist Temporal Classification; Shallow Fusion; External Language Model; End-to-end Automatic Speech Recognition; Weighted Finite-State Transducer;
DOI
10.3837/tiis.2024.06.015
Chinese Library Classification (CLC)
TP [Automation Technology; Computer Technology]
Discipline Classification Code
0812
Abstract
In this paper, we present a method that integrates a Grammar Transducer as an external language model to enhance the accuracy of a pre-trained Korean End-to-end (E2E) Automatic Speech Recognition (ASR) model. The E2E ASR model uses the Connectionist Temporal Classification (CTC) loss function to derive hypothesis sentences from input audio. However, this approach exposes a limitation inherent in CTC: it fails to capture language information directly from the transcript data. To overcome this limitation, we propose a fusion approach that combines a clause-level n-gram language model, converted into a Weighted Finite-State Transducer (WFST), with the E2E ASR model. This approach improves the model's accuracy and allows for domain adaptation using only additional text data, avoiding further intensive training of the large pre-trained ASR model. This is particularly advantageous for Korean, a low-resource language that faces a significant challenge due to limited speech data and available ASR models. We first validate the efficacy of training the n-gram model at the clause level by comparing its inference accuracy with that of the E2E ASR model merged with language models trained on smaller lexical units. We then demonstrate that our approach achieves better domain adaptation accuracy than Shallow Fusion, a previously devised method for merging an external language model with an E2E ASR model without requiring additional training.
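Illustrative sketch (not taken from the paper): Shallow Fusion, the baseline the abstract compares against, rescores each hypothesis with a weighted sum of the CTC acoustic log-probability and an external language-model log-probability, score(y) = log P_CTC(y|x) + lambda * log P_LM(y). The Python sketch below shows only this rescoring step under stated assumptions; the lm_logprob callback, lm_weight, and length_bonus parameters are hypothetical stand-ins for the WFST-backed clause-level n-gram model and the tuned fusion weights described in the paper.

    # Minimal sketch, assuming beam-search hypotheses with CTC log-probabilities
    # are already available and an external LM exposes a log-probability lookup.
    import math
    from typing import Callable, List, Tuple

    def shallow_fusion_rescore(
        hypotheses: List[Tuple[str, float]],   # (text, CTC log-probability) pairs
        lm_logprob: Callable[[str], float],    # external LM: text -> log P_LM(text)
        lm_weight: float = 0.3,                # fusion weight lambda (tuned on a dev set)
        length_bonus: float = 0.0,             # optional word-insertion bonus
    ) -> List[Tuple[str, float]]:
        """Re-rank hypotheses by log P_CTC + lambda * log P_LM + bonus * word count."""
        rescored = []
        for text, ctc_logp in hypotheses:
            fused = ctc_logp + lm_weight * lm_logprob(text) + length_bonus * len(text.split())
            rescored.append((text, fused))
        return sorted(rescored, key=lambda pair: pair[1], reverse=True)

    if __name__ == "__main__":
        # Toy usage with a placeholder LM; a real system would query the n-gram WFST.
        def toy_lm(text: str) -> float:
            return -2.0 * len(text.split())
        beams = [("오늘 날씨 어때", math.log(0.40)), ("오늘 날 씨 어때", math.log(0.45))]
        print(shallow_fusion_rescore(beams, toy_lm, lm_weight=0.5))

In contrast to this hypothesis-level rescoring, the paper's WFST integration composes the n-gram grammar transducer with the decoding graph, so domain adaptation only requires rebuilding the transducer from new text rather than retraining the acoustic model.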
Pages: 1693-1706
Number of pages: 14