Integration of WFST Language Model in Pre-trained Korean E2E ASR Model

Cited: 0
Authors
Oh, Junseok [1 ]
Cho, Eunsoo [2 ]
Kim, Ji-Hwan [1 ]
Affiliations
[1] Sogang Univ, Dept Comp Sci & Engn, 35 Baekbeom Ro, Seoul 04107, South Korea
[2] SELVAS AI, Speech Recognit Lab, 20F,19 Gasan Digital 1-Ro, Seoul 08594, South Korea
Source
KSII TRANSACTIONS ON INTERNET AND INFORMATION SYSTEMS, 2024, Vol. 18, No. 6
Keywords
Connectionist Temporal Classification; Shallow Fusion; External Language Model; End-to-end Automatic Speech Recognition; Weighted Finite-State Transducer;
DOI
10.3837/tiis.2024.06.015
CLC Number
TP [Automation technology; computer technology]
Discipline Code
0812
Abstract
In this paper, we present a method that integrates a Grammar Transducer as an external language model to enhance the accuracy of a pre-trained Korean End-to-end (E2E) Automatic Speech Recognition (ASR) model. The E2E ASR model uses the Connectionist Temporal Classification (CTC) loss function to derive hypothesis sentences from input audio. However, this method reveals a limitation inherent in the CTC approach: it fails to capture language information directly from the transcript data. To overcome this limitation, we propose a fusion approach that combines a clause-level n-gram language model, transformed into a Weighted Finite-State Transducer (WFST), with the E2E ASR model. This approach enhances the model's accuracy and allows domain adaptation using only additional text data, avoiding further intensive training of the large pre-trained ASR model. This is particularly advantageous for Korean, a low-resource language that faces a significant shortage of speech data and available ASR models. We first validate the efficacy of training the n-gram model at the clause level by comparing its inference accuracy with that of the E2E ASR model merged with language models trained on smaller lexical units. We then demonstrate that our approach achieves higher domain-adaptation accuracy than Shallow Fusion, a previously devised method for merging an external language model with an E2E ASR model without requiring additional training.
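As background for the Shallow Fusion baseline discussed above, the sketch below illustrates the general idea of log-linear fusion: during beam search, each hypothesis is scored by the CTC acoustic log-probability plus a weighted external n-gram language-model log-probability. This is a minimal illustration under simplifying assumptions, not the paper's clause-level WFST integration; the toy vocabulary, bigram table, and LM_WEIGHT value are hypothetical, and CTC repeat/blank collapsing is omitted for brevity.

```python
# Minimal sketch of Shallow Fusion: log-linear combination of CTC acoustic
# scores with an external n-gram language model during beam search.
# The vocabulary, bigram table, and LM_WEIGHT below are hypothetical, and
# CTC repeat/blank collapsing is omitted for brevity.
import math
from collections import defaultdict

VOCAB = ["<blank>", "나", "는", "학교", "에", "간다"]  # toy token set
LM_WEIGHT = 0.3                                        # fusion weight (lambda)

# Toy bigram LM stored as log P(token | previous token); unseen pairs get a floor.
BIGRAM_LOGP = defaultdict(lambda: math.log(1e-4), {
    ("<s>", "나"): math.log(0.6),
    ("나", "는"): math.log(0.7),
    ("는", "학교"): math.log(0.5),
    ("학교", "에"): math.log(0.6),
    ("에", "간다"): math.log(0.7),
})


def lm_logp(prev: str, token: str) -> float:
    """External LM score for appending `token` after `prev`."""
    return BIGRAM_LOGP[(prev, token)]


def shallow_fusion_beam_search(ctc_frames, beam_size=3):
    """ctc_frames: list of per-frame dicts {token: log P_ctc(token | frame)}.
    Each hypothesis is scored as log P_ctc + LM_WEIGHT * log P_lm."""
    beams = [((), "<s>", 0.0)]  # (emitted tokens, last emitted token, fused score)
    for frame in ctc_frames:
        candidates = []
        for tokens, prev, score in beams:
            for token, ac_lp in frame.items():
                if token == "<blank>":
                    # Blank emits nothing; only the acoustic score accumulates.
                    candidates.append((tokens, prev, score + ac_lp))
                else:
                    fused = score + ac_lp + LM_WEIGHT * lm_logp(prev, token)
                    candidates.append((tokens + (token,), token, fused))
        # Prune to the best `beam_size` partial hypotheses.
        beams = sorted(candidates, key=lambda b: b[2], reverse=True)[:beam_size]
    return beams[0]


if __name__ == "__main__":
    # Fake per-frame CTC posteriors (log-probabilities) for a 3-frame utterance.
    frames = [
        {"나": math.log(0.8), "<blank>": math.log(0.2)},
        {"는": math.log(0.6), "에": math.log(0.3), "<blank>": math.log(0.1)},
        {"학교": math.log(0.7), "<blank>": math.log(0.3)},
    ]
    tokens, _, score = shallow_fusion_beam_search(frames)
    print(" ".join(tokens), "score:", round(score, 3))
```

The paper's contribution, as the abstract describes, instead compiles the clause-level n-gram model into a WFST Grammar Transducer and fuses it with the E2E model, so that domain adaptation requires only additional text data rather than retraining the pre-trained acoustic model.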
Pages: 1693-1706
Page count: 14