Integration of WFST Language Model in Pre-trained Korean E2E ASR Model

Cited by: 0
Authors
Oh, Junseok [1 ]
Cho, Eunsoo [2 ]
Kim, Ji-Hwan [1 ]
Affiliations
[1] Sogang Univ, Dept Comp Sci & Engn, 35 Baekbeom Ro, Seoul 04107, South Korea
[2] SELVAS AI, Speech Recognit Lab, 20F,19 Gasan Digital 1-Ro, Seoul 08594, South Korea
Source
KSII TRANSACTIONS ON INTERNET AND INFORMATION SYSTEMS | 2024, Vol. 18, No. 6
Keywords
Connectionist Temporal Classification; Shallow Fusion; External Language Model; End-to-end Automatic Speech Recognition; Weighted Finite-State Transducer;
DOI
10.3837/tiis.2024.06.015
CLC Number
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
In this paper, we present a method that integrates a Grammar Transducer as an external language model to enhance the accuracy of a pre-trained Korean End-to-end (E2E) Automatic Speech Recognition (ASR) model. The E2E ASR model uses the Connectionist Temporal Classification (CTC) loss function to derive hypothesis sentences from input audio. However, this method reveals a limitation inherent in the CTC approach: it fails to capture language information directly from the transcript data. To overcome this limitation, we propose a fusion approach that combines a clause-level n-gram language model, transformed into a Weighted Finite-State Transducer (WFST), with the E2E ASR model. This approach enhances the model's accuracy and allows for domain adaptation using only additional text data, avoiding the need for further intensive training of the large pre-trained ASR model. This is particularly advantageous for Korean, a low-resource language that faces a significant challenge due to limited speech data and few available ASR models. We first validate the efficacy of training the n-gram model at the clause level by contrasting its inference accuracy with that of the E2E ASR model merged with language models trained on smaller lexical units. We then demonstrate that our approach achieves higher domain adaptation accuracy than Shallow Fusion, a previously devised method for merging an external language model with an E2E ASR model without requiring additional training.
Pages: 1693-1706
Number of pages: 14
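The abstract contrasts the proposed WFST-based integration with Shallow Fusion, which combines the E2E model's hypothesis scores with an external language model at decode time and needs no retraining. The sketch below is a minimal, self-contained illustration of that baseline idea only, not of the paper's grammar-transducer composition: an n-best list from a CTC decoder is rescored as log P_asr(y|x) + lambda * log P_lm(y) with a toy bigram LM. The function names (train_bigram, shallow_fusion), the toy corpus, the n-best list, and the weight lambda = 0.3 are all hypothetical values chosen for illustration.

# Minimal sketch of shallow fusion with an external n-gram LM.
# Hypothesis selection: argmax over log P_asr(y|x) + lam * log P_lm(y).
# All data and parameter values below are illustrative, not from the paper.
import math
from collections import defaultdict

def train_bigram(sentences):
    """Count-based bigram LM with add-one smoothing over a toy text corpus."""
    unigram, bigram = defaultdict(int), defaultdict(int)
    vocab = set()
    for sent in sentences:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        vocab.update(tokens)
        for prev, cur in zip(tokens, tokens[1:]):
            unigram[prev] += 1
            bigram[(prev, cur)] += 1
    V = len(vocab)

    def logprob(sent):
        tokens = ["<s>"] + sent.split() + ["</s>"]
        return sum(
            math.log((bigram[(p, c)] + 1) / (unigram[p] + V))
            for p, c in zip(tokens, tokens[1:])
        )
    return logprob

def shallow_fusion(nbest, lm_logprob, lam=0.3):
    """Pick the hypothesis maximizing acoustic score + lam * LM log-probability."""
    return max(nbest, key=lambda h: h[1] + lam * lm_logprob(h[0]))

if __name__ == "__main__":
    # Hypothetical in-domain text used to adapt the external LM.
    lm_logprob = train_bigram(["turn the volume up", "turn the lights off"])
    # Hypothetical n-best list from a CTC decoder: (text, acoustic log-score).
    nbest = [("turn the volume cup", -4.1), ("turn the volume up", -4.3)]
    # The LM shifts the decision to the in-domain hypothesis despite its
    # lower acoustic score, which is the effect Shallow Fusion exploits.
    print(shallow_fusion(nbest, lm_logprob))

In the paper itself, the external model is a clause-level n-gram compiled into a WFST and composed with the CTC output; the rescoring above only conveys the score-interpolation idea that both fusion schemes share.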