Speech Recognition for Turkic Languages Using Cross-Lingual Transfer Learning from Kazakh

Times cited: 0
Authors
Orel, Daniil [1]
Yeshpanov, Rustem [1]
Varol, Huseyin Atakan [2]
Affiliations
[1] Nazarbayev University, Institute of Smart Systems and Artificial Intelligence, Astana, Kazakhstan
[2] Nazarbayev University, School of Engineering and Digital Sciences, Institute of Smart Systems and Artificial Intelligence, Astana, Kazakhstan
Source
2023 IEEE International Conference on Big Data and Smart Computing (BigComp) | 2023
Keywords
automatic speech recognition; cross-lingual transfer learning; deep learning; Turkic languages
DOI
10.1109/BigComp57234.2023.00037
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
This paper investigates the effectiveness of transfer learning in building automatic speech recognition (ASR) models for nine Turkic languages (Azerbaijani, Bashkir, Chuvash, Kyrgyz, Sakha, Tatar, Turkish, Uyghur, and Uzbek) by leveraging large-scale training data from the Kazakh language. The performance of these models was compared with that of models for three non-Turkic languages (Indonesian, Japanese, and Swedish) to which transfer learning from Kazakh was also applied. We also compared the performance of the models with the results of models trained on English data. A total of 64 models were created. Most of the models built using transfer learning from Kazakh outperformed the monolingual baselines. The most notable improvement was observed for the Sakha model, which achieved reductions of 45.5% in word error rate (WER) and 22.8% in character error rate (CER) on the test set. The datasets and code used to train the models are available for download from https://github.com/IS2AI/CLTL Turkic ASR.
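The abstract reports its results as reductions in word error rate (WER) and character error rate (CER). As a rough illustration of these metrics, the sketch below computes WER and CER as the Levenshtein edit distance between hypothesis and reference, divided by the reference length. It also computes a relative reduction as (baseline - new) / baseline * 100. This assumes the reported 45.5% and 22.8% figures are relative reductions against the monolingual baseline, which the abstract does not state explicitly. The function names (`edit_distance`, `wer`, `cer`, `relative_reduction`) are hypothetical. This is not the authors' evaluation code; their code is in the linked repository.

```python
def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences (unit-cost sub/del/ins)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference token
                curr[j - 1] + 1,         # insertion of a hypothesis token
                prev[j - 1] + (r != h),  # substitution (cost 0 on a match)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate; assumption: spaces count as characters here."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


def relative_reduction(baseline: float, new: float) -> float:
    """Relative error reduction in percent, assuming baseline > 0."""
    return (baseline - new) / baseline * 100


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    hyp = "the cat sit on mat"
    print(f"WER: {wer(ref, hyp):.3f}")
    print(f"CER: {cer(ref, hyp):.3f}")
    # Hypothetical numbers, not taken from the paper.
    print(f"Relative WER reduction: {relative_reduction(40.0, 20.0):.1f}%")
```

In the example, "sat" becomes "sit" (one substitution) and the second "the" is dropped (one deletion). That gives a word-level distance of 2 over 6 reference words. Under this convention, a model whose WER falls from a hypothetical 40.0% to 20.0% would show a 50% relative reduction.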
Pages: 174-182
Number of pages: 9
Number of references: 47