Building a Speech Dataset and Recognition Model for the Minority Tu Language

Cited by: 0
Authors
Kong, Shasha [1]
Li, Chunmei [1]
Fang, Chengwu [1]
Yang, Peng [1]
Affiliations
[1] Qinghai Univ, Coll Comp Technol & Applicat, Xining 810016, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2024 / Vol. 14 / Issue 15
Funding
National Natural Science Foundation of China
Keywords
automatic speech recognition; low-resource language; conformer; Tu language
DOI
10.3390/app14156795
Chinese Library Classification
O6 [Chemistry]
Discipline classification code
0703
Abstract
Speech recognition technology has many applications in daily life. However, for many low-resource languages without written forms, acquiring sufficient training data remains a significant challenge for building accurate automatic speech recognition (ASR) models. The Tu language, spoken by an ethnic minority group in Qinghai Province, China, is one such example. Due to the lack of written records and the great diversity in regional pronunciations, there has been little previous research on Tu-language speech recognition. This work seeks to address this research gap by creating the first speech dataset for the Tu language spoken in Huzhu County, Qinghai. We first formulated the relevant pronunciation rules for the Tu language based on linguistic analysis. Then, we constructed a new speech corpus, named HZ-TuDs, through targeted data collection and annotation. Based on the HZ-TuDs dataset, we designed several baseline sequence-to-sequence deep neural models for end-to-end Tu-language speech recognition. Additionally, we proposed a novel SA-conformer model, which combines convolutional and channel attention modules to better extract speech features. Experiments showed that our proposed SA-conformer model can significantly reduce the character error rate from 23% to 12%, effectively improving the accuracy of Tu-language recognition compared to previous approaches. This demonstrates the effectiveness of our dataset construction and model design efforts in advancing speech recognition technology for this low-resource minority language.
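For readers unfamiliar with the metric quoted in the abstract, the following is a minimal sketch of how a character error rate (CER) is commonly computed: the Levenshtein edit distance between the reference and hypothesis transcripts, divided by the reference length. The abstract does not describe the authors' scoring procedure, so this standard definition is an assumption, and the function names `edit_distance`, `cer`, and `relative_reduction` are hypothetical. The last helper shows that a drop from 23% to 12% is 11 percentage points in absolute terms, or roughly 48% in relative terms.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences, using a rolling DP row."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref to hyp[:j]
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hyp
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference character
                curr[j - 1] + 1,     # insert a hypothesis character
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate: edit distance divided by reference length.

    Assumption: the standard definition; the paper's exact scoring
    setup is not stated in the abstract.
    """
    if not ref:
        raise ValueError("reference transcript must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(before, after):
    """Relative error reduction between two error rates."""
    return (before - after) / before


if __name__ == "__main__":
    # Toy strings, not data from the HZ-TuDs corpus.
    print(cer("abcd", "abxd"))
    # CER values reported in the abstract: 23% and 12%.
    print(round(relative_reduction(0.23, 0.12), 3))
```

Because CER is normalized by the reference length, it can exceed 1.0 when the hypothesis contains many insertions.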
Pages: 11