An End-To-End Speech Recognition Model for the North Shaanxi Dialect: Design and Evaluation

被引:0
|
作者
Qin, Yi [1 ]
Yu, Feifan [2 ]
机构
[1] Xian Univ Sci & Technol, Coll Comp Sci & Technol, Xian 710054, Peoples R China
[2] SHCCIG Yubei Coal Ind Co Ltd, Xian 710900, Peoples R China
关键词
dialect speech recognition; coal mining industry; end to end; Conformer model; Transformer model; Connectionist Temporal Classification (CTC);
D O I
10.3390/s25020341
中图分类号
O65 [分析化学];
学科分类号
070302 ; 081704 ;
摘要
The coal mining industry in Northern Shaanxi is robust, with a prevalent use of the local dialect, known as "Shapu", characterized by a distinct Northern Shaanxi accent. This study addresses the practical need for speech recognition in this dialect. We propose an end-to-end speech recognition model for the North Shaanxi dialect, leveraging the Conformer architecture. To tailor the model to the coal mining context, we developed a specialized corpus reflecting the phonetic characteristics of the dialect and its usage in the industry. We investigated feature extraction techniques suitable for the North Shaanxi dialect, focusing on the unique pronunciation of initial consonants and vowels. A preprocessing module was designed to accommodate the dialect's rapid speech tempo and polyphonic nature, enhancing recognition performance. To enhance the decoder's text generation capability, we replaced the Conformer decoder with a Transformer architecture. Additionally, to mitigate the computational demands of the model, we incorporated Connectionist Temporal Classification (CTC) joint training for optimization. The experimental results on our self-established voice dataset for the Northern Shaanxi coal mining industry demonstrate that the proposed Conformer-Transformer-CTC model achieves a 9.2% and 10.3% reduction in the word error rate compared to the standalone Conformer and Transformer models, respectively, confirming the advancement of our method. The next step will involve researching how to improve the performance of dialect speech recognition by integrating external language models and extracting pronunciation features of different dialects, thereby achieving better recognition results.
引用
收藏
页数:16
相关论文
共 50 条
  • [1] Multi-Task End-to-End Model for Telugu Dialect and Speech Recognition
    Yadavalli, Aditya
    Mirishkar, Ganesh S.
    Vuppala, Anil Kumar
    INTERSPEECH 2022, 2022, : 1387 - 1391
  • [2] Dialect-Aware Modeling for End-to-End Japanese Dialect Speech Recognition
    Imaizumi, Ryo
    Masumura, Ryo
    Shiota, Sayaka
    Kiya, Hitoshi
    2020 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2020, : 297 - 301
  • [3] Domain Expansion for End-to-End Speech Recognition: Applications for Accent/Dialect Speech
    Ghorbani, Shahram
    Hansen, John H. L.
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2023, 31 : 762 - 774
  • [4] An End-to-End model for Vietnamese speech recognition
    Van Huy Nguyen
    2019 IEEE - RIVF INTERNATIONAL CONFERENCE ON COMPUTING AND COMMUNICATION TECHNOLOGIES (RIVF), 2019, : 307 - 312
  • [5] An End-to-End Dialect Identification System with Transfer Learning from a Multilingual Automatic Speech Recognition Model
    Wang, Ding
    Ye, Shuaishuai
    Hu, Xinhui
    Li, Sheng
    Xu, Xinkang
    INTERSPEECH 2021, 2021, : 3266 - 3270
  • [6] Hybrid end-to-end model for Kazakh speech recognition
    Mamyrbayev O.Z.
    Oralbekova D.O.
    Alimhan K.
    Nuranbayeva B.M.
    International Journal of Speech Technology, 2023, 26 (02) : 261 - 270
  • [7] Residual Language Model for End-to-end Speech Recognition
    Tsunoo, Emiru
    Kashiwagi, Yosuke
    Narisetty, Chaitanya
    Watanabe, Shinji
    INTERSPEECH 2022, 2022, : 3899 - 3903
  • [8] A SPELLING CORRECTION MODEL FOR END-TO-END SPEECH RECOGNITION
    Guo, Jinxi
    Sainath, Tara N.
    Weiss, Ron J.
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 5651 - 5655
  • [9] MULTILINGUAL SPEECH RECOGNITION WITH A SINGLE END-TO-END MODEL
    Toshniwal, Shubham
    Sainath, Tara N.
    Weiss, Ron J.
    Li, Bo
    Moreno, Pedro
    Weinstein, Eugene
    Rao, Kanishka
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 4904 - 4908
  • [10] End-to-end Speech Synthesis for Tibetan Lhasa Dialect
    Luo, Lisai
    Li, Guanyu
    Gong, Chunwei
    Ding, Hailan
    2018 INTERNATIONAL SYMPOSIUM ON POWER ELECTRONICS AND CONTROL ENGINEERING (ISPECE 2018), 2019, 1187