A Language Model Optimization Method for Turkish Automatic Speech Recognition System

Cited by: 2
Authors
Oyucu, Saadin [1 ]
Polat, Huseyin [2 ]
Affiliations
[1] Adiyaman Univ, Fac Engn, Dept Comp Engn, Adiyaman, Turkey
[2] Gazi Univ, Fac Technol, Dept Comp Engn, Ankara, Turkey
Source
JOURNAL OF POLYTECHNIC-POLITEKNIK DERGISI | 2023 / Vol. 26 / No. 03
Keywords
Turkish automatic speech recognition; Turkish language model; Turkish language model score optimization; Turkish corpus; neural networks
DOI
10.2339/politeknik.1085512
CLC Number
T [Industrial Technology];
Subject Classification Code
08;
Abstract
The current Automatic Speech Recognition (ASR) modeling strategy still suffers from severe performance degradation when faced with low-resource languages such as Turkish. In particular, when the Language Model (LM) does not support the Acoustic Model (AM) sufficiently, the Word Error Rate (WER) increases. A robust LM therefore contributes strongly to ASR performance by capturing word relations from the available corpus. However, developing a robust language model is a challenging task due to the agglutinative nature of Turkish. Within the scope of this study, a sentence-level LM optimization method is therefore proposed to improve the WER performance of Turkish ASR. In the proposed method, instead of scoring a fixed word sequence obtained under the Markov assumption, the probability that the word sequence forms a sentence is calculated. A method combining n-gram and skip-gram properties is presented to obtain this word-sequence probability. The proposed method was tested on both statistical and Artificial Neural Network (ANN) based LMs. Experiments were carried out at both the word and sub-word levels using two Turkish corpora (METU and Bogazici) distributed via the Linguistic Data Consortium (LDC) and a separate corpus, named HS, that we created specifically for this study. With the statistical LM, a 0.5% WER increase was observed for the METU corpus, while WER decreases of 1.6% and 2.5% were observed for the Bogazici and HS corpora, respectively. With the Feedforward Neural Network (FNN) based LM, WER decreases of 0.2%, 0.8%, and 1.6% were observed for the METU, Bogazici, and HS corpora, respectively. With the Recurrent Neural Network (RNN)-Long Short-Term Memory (LSTM) based LM, WER decreases of 0.6%, 1.1%, and 1.5% were observed for the METU, Bogazici, and HS corpora, respectively. As a result, applying the proposed method to the LMs required for ASR reduced the WER and increased the overall performance of the ASR system.
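The abstract describes replacing a fixed Markov-order word-sequence score with the probability that an entire word sequence forms a sentence, obtained from a model with both n-gram and skip-gram properties. The Python sketch below is a minimal illustration of that general idea, not the authors' exact formulation: it scores a whole tokenized sentence by interpolating an adjacent bigram, a distance-2 skip-bigram, and a unigram estimate. The interpolation weights, the add-one smoothing, and all function names are assumptions made for this example.

import math
from collections import Counter

def train(sentences):
    # Collect unigram, adjacent-bigram, and distance-2 skip-bigram counts
    # from tokenized sentences, with sentence-boundary padding.
    uni, bi, skip2 = Counter(), Counter(), Counter()
    for toks in sentences:
        padded = ["<s>", "<s>"] + toks + ["</s>"]
        for i, w in enumerate(padded):
            uni[w] += 1
            if i >= 1:
                bi[(padded[i - 1], w)] += 1
            if i >= 2:
                skip2[(padded[i - 2], w)] += 1
    return uni, bi, skip2

def sentence_logprob(toks, uni, bi, skip2, lam=(0.6, 0.3, 0.1)):
    # Log-probability that the whole token sequence forms a sentence:
    # each word is scored by interpolating the adjacent-bigram estimate,
    # the skip-bigram estimate (one word skipped), and the unigram
    # estimate, each with add-one smoothing over the vocabulary.
    vocab = len(uni)
    total = sum(uni.values())
    padded = ["<s>", "<s>"] + toks + ["</s>"]
    logp = 0.0
    for i in range(2, len(padded)):
        w = padded[i]
        p_bi = (bi[(padded[i - 1], w)] + 1) / (uni[padded[i - 1]] + vocab)
        p_sk = (skip2[(padded[i - 2], w)] + 1) / (uni[padded[i - 2]] + vocab)
        p_un = (uni[w] + 1) / (total + vocab)
        logp += math.log(lam[0] * p_bi + lam[1] * p_sk + lam[2] * p_un)
    return logp

# Tiny usage example with made-up tokens: train on a toy corpus,
# then score a new sentence as a whole.
corpus = [["bugun", "hava", "cok", "guzel"], ["hava", "bugun", "guzel"]]
uni, bi, skip2 = train(corpus)
print(sentence_logprob(["bugun", "hava", "guzel"], uni, bi, skip2))

In an ASR pipeline such a sentence-level score would be used to rerank competing hypothesis transcriptions; the paper applies the same idea to statistical, FNN, and RNN-LSTM LMs, whereas this sketch covers only a count-based case.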
Pages: 1167-1178
Page count: 14