Cascade Speech Translation for the Kazakh Language

Cited by: 4
Authors
Kozhirbayev, Zhanibek [1]
Islamgozhayev, Talgat [1]
Affiliations
[1] Nazarbayev Univ, Natl Lab Astana, Astana 010000, Kazakhstan
Source
APPLIED SCIENCES-BASEL | 2023 / Vol. 13 / Issue 15
Keywords
cascade speech translation; Kazakh language; Russian language; automatic speech recognition; machine translation; cross-lingual communication; RECOGNITION;
DOI
10.3390/app13158900
Chinese Library Classification number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
Speech translation systems have become indispensable in facilitating seamless communication across language barriers. This paper presents a cascade speech translation system tailored specifically for translating speech from the Kazakh language to Russian. The system aims to enable effective cross-lingual communication between Kazakh and Russian speakers, addressing the unique challenges posed by these languages. To develop the cascade speech translation system, we first created a dedicated speech translation dataset ST-kk-ru based on the ISSAI Corpus. The ST-kk-ru dataset comprises a large collection of Kazakh speech recordings along with their corresponding Russian translations. The automatic speech recognition (ASR) module of the system utilizes deep learning techniques to convert spoken Kazakh input into text. The machine translation (MT) module employs state-of-the-art neural machine translation methods, leveraging the parallel Kazakh-Russian translations available in the dataset to generate accurate translations. By conducting extensive experiments and evaluations, we have thoroughly assessed the performance of the cascade speech translation system on the ST-kk-ru dataset. The outcomes of our evaluation highlight the effectiveness of incorporating additional datasets for both the ASR and MT modules. This augmentation leads to a significant improvement in the performance of the cascade speech translation system, increasing the BLEU score by approximately 2 points when translating from Kazakh to Russian. These findings underscore the importance of leveraging supplementary data to enhance the capabilities of speech translation systems.
Pages: 17