Neurorecognition visualization in multitask end-to-end speech

Cited by: 0
Authors
Mamyrbayev, Orken [1 ]
Pavlov, Sergii [2 ]
Bekarystankyzy, Akbayan [3 ,4 ]
Oralbekova, Dina [5 ]
Zhumazhanov, Bagashar [1 ]
Azarova, Larysa
Mussayeva, Dinara [6 ]
Koval, Tetiana [7 ]
Gromaszek, Konrad [8 ]
Issimov, Nurdaulet [9 ]
Shiyapov, Kadrzhan [10 ]
Affiliations
[1] Inst Informat & Computat Technol, 28 Shevchenko St, Alma Ata 050010, Kazakhstan
[2] Vinnytsia Natl Tech Univ, Khmelnytske Hwy,95, UA-21000 Vinnytsia, Ukraine
[3] Satbayev Univ, Satpaev St 22, Alma Ata 050000, Kazakhstan
[4] Narxoz Univ, Zhandossov St 55, Alma Ata 050035, Kazakhstan
[5] Almaty Univ Power Engn & Telecommun, Baytursynuli St 126-1, Alma Ata 050013, Kazakhstan
[6] Inst Econ CS MES RK, Kurmangazy St 29,A25K1B0, Alma Ata, Kazakhstan
[7] Vinnytsia Mykhailo Kotsiubynskyi State Pedag Univ, Ostrozkoho St 32, UA-21100 Vinnytsia, Ukraine
[8] Lublin Univ Technol, Ul Nadbystrzycka 38D, PL-20618 Lublin, Poland
[9] Turan Univ, Satpayeva St 16a, Alma Ata 050013, Kazakhstan
[10] Abai Kazakh Natl Pedag Univ, Dostyk Ave 13, Alma Ata 050010, Kazakhstan
Source
OPTICAL FIBERS AND THEIR APPLICATIONS 2023 | 2024, Vol. 12985
Keywords
end-to-end; multitask training; speech recognition; speaker identification; dialect identification; neural networks
DOI
10.1117/12.3022727
Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Subject classification codes
0808; 0809
Abstract
Speech-processing technologies for different languages are now widely used in mobile and stationary devices. Kazakh is a low-resource language, which poses various challenges for conventional speech recognition methods. This paper presents a model that handles speech recognition, dialect identification, and speaker identification concurrently within a single end-to-end framework. The multitask model allows all three tasks to be trained jointly in one network. A multitask recognition system is built on the WaveNet-CTC model. Experiments show that, on these tasks, the end-to-end multitask model outperforms the other models considered.
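The abstract describes a WaveNet-CTC based multitask model that shares one network across speech recognition, dialect identification, and speaker identification. The sketch below illustrates that shared-encoder, multi-head idea in PyTorch; the convolutional encoder, head sizes, and loss weights are illustrative assumptions, not the authors' exact WaveNet-CTC configuration.

```python
import torch
import torch.nn as nn

class MultitaskASR(nn.Module):
    """Shared acoustic encoder with three heads: CTC for speech recognition,
    plus utterance-level classifiers for dialect and speaker identification.
    A minimal sketch, not the paper's exact WaveNet-CTC architecture."""

    def __init__(self, n_mels=80, hidden=256, vocab=64, n_dialects=3, n_speakers=200):
        super().__init__()
        # Stand-in encoder: dilated 1-D convolutions (WaveNet-like in spirit only).
        self.encoder = nn.Sequential(
            nn.Conv1d(n_mels, hidden, kernel_size=3, padding=1, dilation=1), nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=3, padding=2, dilation=2), nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=3, padding=4, dilation=4), nn.ReLU(),
        )
        self.ctc_head = nn.Linear(hidden, vocab)          # frame-wise character logits
        self.dialect_head = nn.Linear(hidden, n_dialects)  # utterance-level dialect logits
        self.speaker_head = nn.Linear(hidden, n_speakers)  # utterance-level speaker logits
        self.ctc_loss = nn.CTCLoss(blank=0, zero_infinity=True)
        self.ce_loss = nn.CrossEntropyLoss()

    def forward(self, feats):                               # feats: (batch, n_mels, time)
        h = self.encoder(feats).transpose(1, 2)             # (batch, time, hidden)
        pooled = h.mean(dim=1)                              # average pooling over time
        return self.ctc_head(h), self.dialect_head(pooled), self.speaker_head(pooled)

    def loss(self, feats, text, text_lens, feat_lens, dialect, speaker,
             w_asr=1.0, w_dialect=0.3, w_speaker=0.3):
        """Weighted sum of the three task losses (weights are assumed values)."""
        asr_logits, dial_logits, spk_logits = self(feats)
        log_probs = asr_logits.log_softmax(-1).transpose(0, 1)  # (time, batch, vocab) for CTC
        return (w_asr * self.ctc_loss(log_probs, text, feat_lens, text_lens)
                + w_dialect * self.ce_loss(dial_logits, dialect)
                + w_speaker * self.ce_loss(spk_logits, speaker))
```

Joint training of this kind lets the auxiliary dialect and speaker objectives regularize the shared encoder, which is the usual motivation for multitask setups in low-resource speech recognition.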
Pages: 8