An End-To-End Speech Recognition Model for the North Shaanxi Dialect: Design and Evaluation

被引:0
|
作者
Qin, Yi [1 ]
Yu, Feifan [2 ]
机构
[1] Xian Univ Sci & Technol, Coll Comp Sci & Technol, Xian 710054, Peoples R China
[2] SHCCIG Yubei Coal Ind Co Ltd, Xian 710900, Peoples R China
关键词
dialect speech recognition; coal mining industry; end to end; Conformer model; Transformer model; Connectionist Temporal Classification (CTC);
D O I
10.3390/s25020341
中图分类号
O65 [分析化学];
学科分类号
070302 ; 081704 ;
摘要
The coal mining industry in Northern Shaanxi is robust, with a prevalent use of the local dialect, known as "Shapu", characterized by a distinct Northern Shaanxi accent. This study addresses the practical need for speech recognition in this dialect. We propose an end-to-end speech recognition model for the North Shaanxi dialect, leveraging the Conformer architecture. To tailor the model to the coal mining context, we developed a specialized corpus reflecting the phonetic characteristics of the dialect and its usage in the industry. We investigated feature extraction techniques suitable for the North Shaanxi dialect, focusing on the unique pronunciation of initial consonants and vowels. A preprocessing module was designed to accommodate the dialect's rapid speech tempo and polyphonic nature, enhancing recognition performance. To enhance the decoder's text generation capability, we replaced the Conformer decoder with a Transformer architecture. Additionally, to mitigate the computational demands of the model, we incorporated Connectionist Temporal Classification (CTC) joint training for optimization. The experimental results on our self-established voice dataset for the Northern Shaanxi coal mining industry demonstrate that the proposed Conformer-Transformer-CTC model achieves a 9.2% and 10.3% reduction in the word error rate compared to the standalone Conformer and Transformer models, respectively, confirming the advancement of our method. The next step will involve researching how to improve the performance of dialect speech recognition by integrating external language models and extracting pronunciation features of different dialects, thereby achieving better recognition results.
引用
收藏
页数:16
相关论文
共 50 条
  • [21] EXTENDING PARROTRON: AN END-TO-END, SPEECH CONVERSION AND SPEECH RECOGNITION MODEL FOR ATYPICAL SPEECH
    Doshi, Rohan
    Chen, Youzheng
    Jiang, Liyang
    Zhang, Xia
    Biadsy, Fadi
    Ramabhadran, Bhuvana
    Chu, Fang
    Rosenberg, Andrew
    Moreno, Pedro J.
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6988 - 6992
  • [22] IMPROVING UNSUPERVISED STYLE TRANSFER IN END-TO-END SPEECH SYNTHESIS WITH END-TO-END SPEECH RECOGNITION
    Liu, Da-Rong
    Yang, Chi-Yu
    Wu, Szu-Lin
    Lee, Hung-Yi
    2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), 2018, : 640 - 647
  • [23] End-to-End Deep Learning Speech Recognition Model for Silent Speech Challenge
    Kimura, Naoki
    Su, Zixiong
    Saeki, Takaaki
    INTERSPEECH 2020, 2020, : 1025 - 1026
  • [24] End-to-end Tibetan Ando dialect speech recognition based on hybrid CTC/attention architecture
    Sun, Jingwen
    Zhou, Gang
    Yang, Hongwu
    Wang, Man
    2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2019, : 628 - 632
  • [25] END-TO-END TRAINING OF A LARGE VOCABULARY END-TO-END SPEECH RECOGNITION SYSTEM
    Kim, Chanwoo
    Kim, Sungsoo
    Kim, Kwangyoun
    Kumar, Mehul
    Kim, Jiyeon
    Lee, Kyungmin
    Han, Changwoo
    Garg, Abhinav
    Kim, Eunhyang
    Shin, Minkyoo
    Singh, Shatrughan
    Heck, Larry
    Gowda, Dhananjaya
    2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 562 - 569
  • [26] SYNCHRONOUS TRANSFORMERS FOR END-TO-END SPEECH RECOGNITION
    Tian, Zhengkun
    Yi, Jiangyan
    Bai, Ye
    Tao, Jianhua
    Zhang, Shuai
    Wen, Zhengqi
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 7884 - 7888
  • [27] End-to-End Speech Recognition For Arabic Dialects
    Seham Nasr
    Rehab Duwairi
    Muhannad Quwaider
    Arabian Journal for Science and Engineering, 2023, 48 : 10617 - 10633
  • [28] End-to-End Speech Recognition of Tamil Language
    Changrampadi, Mohamed Hashim
    Shahina, A.
    Narayanan, M. Badri
    Khan, A. Nayeemulla
    INTELLIGENT AUTOMATION AND SOFT COMPUTING, 2022, 32 (02): : 1309 - 1323
  • [29] PARAMETER UNCERTAINTY FOR END-TO-END SPEECH RECOGNITION
    Braun, Stefan
    Liu, Shih-Chii
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 5636 - 5640
  • [30] END-TO-END VISUAL SPEECH RECOGNITION WITH LSTMS
    Petridis, Stavros
    Li, Zuwei
    Pantic, Maja
    2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 2592 - 2596