A lightweight speech recognition method with target-swap knowledge distillation for Mandarin air traffic control communications

Cited by: 0
Authors
Ren J. [1 ,2 ]
Yang S. [1 ]
Shi Y. [3 ]
Yang J. [1 ]
Affiliations
[1] Institute of Applied Artificial Intelligence of the Guangdong-Hong Kong-Macao Greater Bay Area, Shenzhen Polytechnic University, Shenzhen, Guangdong
[2] Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong
[3] Industrial Training Centre, Shenzhen Polytechnic University, Shenzhen, Guangdong
Funding
China Postdoctoral Science Foundation
Keywords
Air traffic control communications; Algorithms and Analysis of Algorithms; Artificial Intelligence; Automatic speech recognition; Knowledge distillation; Lightweight ASR; Mandarin ASR; Model compression; Natural Language and Speech
DOI
10.7717/PEERJ-CS.1650
Abstract
Miscommunications between air traffic controllers (ATCOs) and pilots in air traffic control (ATC) may lead to catastrophic aviation accidents. Thanks to advances in speech and language processing, automatic speech recognition (ASR) is an appealing approach to preventing such misunderstandings. To give ATCOs and pilots enough time to respond promptly and effectively, ASR systems for ATC must combine strong recognition performance with low transcription latency. However, most existing ASR work for ATC focuses on recognition performance and pays little attention to recognition speed, which motivates the research in this article. To address this issue, this article introduces knowledge distillation into ASR for Mandarin ATC communications to enhance the generalization performance of the lightweight model. Specifically, we propose a simple yet effective lightweight strategy, named Target-Swap Knowledge Distillation (TSKD), which swaps the logit outputs of the teacher and student models for the target class. The swap can mitigate the potential overconfidence of the teacher model regarding the target class and enable the student model to concentrate on distilling knowledge from non-target classes. Extensive experiments demonstrate the effectiveness of the proposed TSKD in both homogeneous and heterogeneous architectures. The experimental results show that the resulting lightweight ASR model achieves a balance between recognition accuracy and transcription latency. © 2023 Ren et al. All Rights Reserved.
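The target-swap idea in the abstract can be illustrated with a minimal sketch. The abstract states only that the teacher and student logits for the target class are swapped before distillation; everything else here is an assumption made for illustration: the function names (`softmax`, `swap_target_logits`, `tskd_loss`), the temperature-scaled KL divergence as the distillation loss, the default temperature, and the use of a single classification step rather than a full ASR output sequence. It should not be read as the authors' implementation.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def swap_target_logits(teacher_logits, student_logits, target):
    """Return copies of both logit vectors with the target-class entry exchanged."""
    t = list(teacher_logits)
    s = list(student_logits)
    t[target], s[target] = s[target], t[target]
    return t, s

def tskd_loss(teacher_logits, student_logits, target, temperature=2.0):
    """Hypothetical distillation loss: KL(teacher || student) on swapped logits.

    The KL form and the temperature**2 scaling follow common knowledge
    distillation practice; they are assumptions, not details from the abstract.
    """
    t, s = swap_target_logits(teacher_logits, student_logits, target)
    p = softmax(t, temperature)  # teacher distribution after the swap
    q = softmax(s, temperature)  # student distribution after the swap
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl

# Example: the teacher is very confident about class 0, the student less so.
teacher = [5.0, 1.0, 0.0]
student = [2.0, 0.5, 0.1]
loss = tskd_loss(teacher, student, target=0)
```

After the swap, the teacher's distribution carries the student's (less confident) target logit, so the non-target classes receive relatively more probability mass in the distillation signal, which matches the motivation given in the abstract. In a real ASR setting this term would presumably be combined with the usual supervised loss and applied per output step; those details are not given in the abstract.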
Related papers
48 in total
  • [1] A lightweight speech recognition method with target-swap knowledge distillation for Mandarin air traffic control communications
    Ren, Jin
    Yang, Shunzhi
    Shi, Yihua
    Yang, Jinfeng
    PEERJ COMPUTER SCIENCE, 2023, 9
  • [2] Automatic Speech Recognition for Air Traffic Control Communications
    Badrinath, Sandeep
    Balakrishnan, Hamsa
    TRANSPORTATION RESEARCH RECORD, 2022, 2676 (01) : 798 - 810
  • [3] Speech GAU: A Single Head Attention for Mandarin Speech Recognition for Air Traffic Control
    Zhang, Shiyu
    Kong, Jianguo
    Chen, Chao
    Li, Yabin
    Liang, Haijun
    AEROSPACE, 2022, 9 (08)
  • [4] A Robust Conformer-Based Speech Recognition Model for Mandarin Air Traffic Control
    Jiang, Peiyuan
    Pan, Weijun
    Zhang, Jian
    Wang, Teng
    Huang, Junxiang
    CMC-COMPUTERS MATERIALS & CONTINUA, 2023, 77 (01) : 911 - 940
  • [5] Audio-Text Multimodal Speech Recognition via Dual-Tower Architecture for Mandarin Air Traffic Control Communications
    Ge, Shuting
    Ren, Jin
    Shi, Yihua
    Zhang, Yujun
    Yang, Shunzhi
    Yang, Jinfeng
    CMC-COMPUTERS MATERIALS & CONTINUA, 2024, 78 (03) : 3215 - 3245
  • [6] Automatic Speech Recognition Benchmark for Air-Traffic Communications
    Zuluaga-Gomez, Juan
    Motlicek, Petr
    Zhan, Qingran
    Vesely, Karel
    Braun, Rudolf
    INTERSPEECH 2020, 2020 : 2297 - 2301
  • [7] MKD: Mixup-Based Knowledge Distillation for Mandarin End-to-End Speech Recognition
    Wu, Xing
    Jin, Yifan
    Wang, Jianjia
    Qian, Quan
    Guo, Yike
    ALGORITHMS, 2022, 15 (05)
  • [8] Boosting Lightweight CNNs Through Network Pruning and Knowledge Distillation for SAR Target Recognition
    Wang, Zhen
    Du, Lan
    Li, Yi
    IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2021, 14 : 8386 - 8397
  • [9] Iterative Learning of Speech Recognition Models for Air Traffic Control
    Srinivasamurthy, Ajay
    Motlicek, Petr
    Singh, Mittul
    Oualil, Youssef
    Kleinert, Matthias
    Ehr, Heiko
    Helmke, Hartmut
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018 : 3519 - 3523
  • [10] AUTOMATIC SPEECH SEMANTIC RECOGNITION AND VERIFICATION IN AIR TRAFFIC CONTROL
    Johnson, Daniel R.
    Nenov, Val I.
    Espinoza, Gustavo
    2013 IEEE/AIAA 32ND DIGITAL AVIONICS SYSTEMS CONFERENCE (DASC), 2013