NICT-TIB1: A PUBLIC SPEECH CORPUS OF LHASA DIALECT FOR BENCHMARKING TIBETAN LANGUAGE SPEECH RECOGNITION SYSTEMS

Cited by: 1
Authors
Soky, Kak [1 ,3 ]
Gong, Zhuo [2 ,3 ]
Li, Sheng [3 ]
Affiliations
[1] Kyoto Univ, Kyoto, Japan
[2] Univ Tokyo, Tokyo, Japan
[3] Natl Inst Informat & Commun Technol NICT, Kyoto, Japan
Source
2022 25TH CONFERENCE OF THE ORIENTAL COCOSDA INTERNATIONAL COMMITTEE FOR THE CO-ORDINATION AND STANDARDISATION OF SPEECH DATABASES AND ASSESSMENT TECHNIQUES (O-COCOSDA 2022) | 2022
Keywords
Speech recognition; Tibetan language; Lhasa dialect; low-resource data
DOI
10.1109/O-COCOSDA202257103.2022.9997917
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
The Lhasa dialect is the primary Tibetan dialect, with the most speakers in Tibet and the most extensive written record over its long history. Studying speech recognition methods for the Lhasa dialect helps preserve Tibet's distinctive linguistic variety. Previous research on Tibetan speech recognition focused on academic questions using non-public datasets, e.g., selecting phone-level acoustic modeling units and incorporating tonal information, but contributed little open data to the community. To address the low-resource data problem, we introduce the NICT-Tib1 (phase 1) database, a new open-source database for the Lhasa dialect. We further provide updated benchmark systems under monolingual and multilingual settings, respectively. Experimental results show that the performance of these models is consistent with previous work. We believe our work will advance existing speech recognition research on the Tibetan language and other low-resource languages.
Pages: 5