A Unified Framework for Multilingual Speech Recognition in Air Traffic Control Systems

Cited by: 81
Authors
Lin, Yi [1]
Guo, Dongyue [1]
Zhang, Jianwei [1]
Chen, Zhengmao [1]
Yang, Bo [1]
Affiliations
[1] Sichuan Univ, Coll Comp Sci, Chengdu 610000, Peoples R China
Funding
U.S. National Science Foundation;
Keywords
Hidden Markov models; Task analysis; Atmospheric modeling; Speech recognition; Vocabulary; Decoding; Real-time systems; Acoustic model (AM); air traffic control (ATC); machine translation pronunciation model (PM); multiscale CNN (MCNN); multilingual; robust speech recognition; DEEP NEURAL-NETWORKS;
DOI
10.1109/TNNLS.2020.3015830
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This work focuses on robust speech recognition in air traffic control (ATC) by designing a novel processing paradigm to integrate multilingual speech recognition into a single framework using three cascaded modules: an acoustic model (AM), a pronunciation model (PM), and a language model (LM). The AM converts ATC speech into phoneme-based text sequences that the PM then translates into a word-based sequence, which is the ultimate goal of this research. The LM corrects both phoneme- and word-based errors in the decoding results. The AM, including the convolutional neural network (CNN) and recurrent neural network (RNN), considers the spatial and temporal dependences of the speech features and is trained by the connectionist temporal classification loss. To cope with radio transmission noise and diversity among speakers, a multiscale CNN architecture is proposed to fit the diverse data distributions and improve the performance. Phoneme-to-word translation is addressed via a proposed machine translation PM with an encoder-decoder architecture. RNN-based LMs are trained to consider the code-switching specificity of the ATC speech by building dependences with common words. We validate the proposed approach using large amounts of real Chinese and English ATC recordings and achieve a 3.95% label error rate on Chinese characters and English words, outperforming other popular approaches. The decoding efficiency is also comparable to that of the end-to-end model, and its generalizability is validated on several open corpora, making it suitable for real-time approaches to further support ATC applications, such as ATC prediction and safety checking.
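The abstract reports a single label error rate over Chinese characters and English words. A minimal sketch of how such a mixed-label metric could be computed is given below, assuming it is the total edit distance divided by the total number of reference labels. The tokenization rule (one label per Chinese character, one label per run of Latin letters or digits) and the function names `tokenize`, `edit_distance`, and `label_error_rate` are illustrative assumptions, not taken from the paper.

```python
import re


def tokenize(text):
    # Assumption: each Chinese character is one label, and each run of
    # Latin letters, digits, or apostrophes is one word label.
    return re.findall(r"[\u4e00-\u9fff]|[A-Za-z0-9']+", text)


def edit_distance(ref, hyp):
    # Levenshtein distance between two label sequences, computed row by row.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def label_error_rate(references, hypotheses):
    # Total edit errors divided by the total number of reference labels.
    errors = 0
    total = 0
    for ref_text, hyp_text in zip(references, hypotheses):
        ref = tokenize(ref_text)
        errors += edit_distance(ref, tokenize(hyp_text))
        total += len(ref)
    if total == 0:
        raise ValueError("references contain no labels")
    return errors / total
```

For example, scoring the hypothesis "国航 climb two 八千" against the reference "国航 climb to 八千" counts one substituted word among six reference labels.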
Pages: 3608-3620
Number of pages: 13
Related papers
50 in total
  • [41] PSEUDO-LABELING FOR MASSIVELY MULTILINGUAL SPEECH RECOGNITION
    Lugosch, Loren
    Likhomanenko, Tatiana
    Synnaeve, Gabriel
    Collobert, Ronan
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 7687 - 7691
  • [42] IXHEALTH: A Multilingual Platform for Advanced Speech Recognition in Healthcare
    Jose Vivancos-Vicente, Pedro
    Salvador Castejon-Garrido, Juan
    Andres Paredes-Valverde, Mario
    del Pilar Salas-Zarate, Maria
    Valencia-Garcia, Rafael
    TECHNOLOGIES AND INNOVATION, 2016, 658 : 26 - 38
  • [43] A hybrid CTC plus Attention model based on end-to-end framework for multilingual speech recognition
    Liang, Sendong
    Yan, Wei Qi
    MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (28) : 41295 - 41308
  • [44] Self-Adaptive Distillation for Multilingual Speech Recognition: Leveraging Student Independence
    Leal, Isabel
    Gaur, Neeraj
    Haghani, Parisa
    Farris, Brian
    Moreno, Pedro J.
    Prasad, Manasa
    Ramabhadran, Bhuvana
    Zhu, Yun
    INTERSPEECH 2021, 2021, : 2556 - 2560
  • [45] DARTS-ASR: Differentiable Architecture Search for Multilingual Speech Recognition and Adaptation
    Chen, Yi-Chen
    Hsu, Jui-Yang
    Lee, Cheng-Kuang
    Lee, Hung-yi
    INTERSPEECH 2020, 2020, : 1803 - 1807
  • [46] A framework for secure speech recognition
    Smaragdis, Paris
    Shashanka, Madhusudana V. S.
    2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3, 2007, : 969 - +
  • [47] A Framework for Speech Recognition Benchmarking
    Dernoncourt, Franck
    Trung Bui
    Chang, Walter
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 169 - 170
  • [48] A framework for secure speech recognition
    Smaragdis, Paris
    Shashanka, Madhusudana
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2007, 15 (04): : 1404 - 1413
  • [49] SpecMark: A Spectral Watermarking Framework for IP Protection of Speech Recognition Systems
    Chen, Huili
    Darvish, Bita
    Koushanfar, Farinaz
    INTERSPEECH 2020, 2020, : 2312 - 2316
  • [50] MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation
    Anwar, Mohamed
    Shi, Bowen
    Goswami, Vedanuj
    Hsu, Wei-Ning
    Pino, Juan
    Wang, Changhan
    INTERSPEECH 2023, 2023, : 4064 - 4068