A Unified Framework for Multilingual Speech Recognition in Air Traffic Control Systems

Cited by: 81
Authors
Lin, Yi [1]
Guo, Dongyue [1]
Zhang, Jianwei [1]
Chen, Zhengmao [1]
Yang, Bo [1]
Affiliations
[1] Sichuan Univ, Coll Comp Sci, Chengdu 610000, Peoples R China
Funding
National Science Foundation (USA);
Keywords
Hidden Markov models; Task analysis; Atmospheric modeling; Speech recognition; Vocabulary; Decoding; Real-time systems; Acoustic model (AM); air traffic control (ATC); machine translation pronunciation model (PM); multiscale CNN (MCNN); multilingual; robust speech recognition; DEEP NEURAL-NETWORKS;
DOI
10.1109/TNNLS.2020.3015830
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This work focuses on robust speech recognition in air traffic control (ATC) by designing a novel processing paradigm to integrate multilingual speech recognition into a single framework using three cascaded modules: an acoustic model (AM), a pronunciation model (PM), and a language model (LM). The AM converts ATC speech into phoneme-based text sequences that the PM then translates into a word-based sequence, which is the ultimate goal of this research. The LM corrects both phoneme- and word-based errors in the decoding results. The AM, including the convolutional neural network (CNN) and recurrent neural network (RNN), considers the spatial and temporal dependences of the speech features and is trained by the connectionist temporal classification loss. To cope with radio transmission noise and diversity among speakers, a multiscale CNN architecture is proposed to fit the diverse data distributions and improve the performance. Phoneme-to-word translation is addressed via a proposed machine translation PM with an encoder-decoder architecture. RNN-based LMs are trained to consider the code-switching specificity of the ATC speech by building dependences with common words. We validate the proposed approach using large amounts of real Chinese and English ATC recordings and achieve a 3.95% label error rate on Chinese characters and English words, outperforming other popular approaches. The decoding efficiency is also comparable to that of the end-to-end model, and its generalizability is validated on several open corpora, making it suitable for real-time approaches to further support ATC applications, such as ATC prediction and safety checking.
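The abstract reports a label error rate over a mixed label set (Chinese characters and English words) and an AM trained with the connectionist temporal classification (CTC) loss. As a minimal sketch only, not the authors' implementation, the following shows how a greedy CTC collapse and a label error rate could be computed. It assumes the label error rate is the Levenshtein distance between hypothesis and reference label sequences divided by the reference length, which the abstract does not state; the blank symbol `_` and all function names are hypothetical.

```python
from typing import List, Sequence

BLANK = "_"  # hypothetical CTC blank symbol, chosen for illustration


def ctc_greedy_collapse(frame_labels: Sequence[str], blank: str = BLANK) -> List[str]:
    """Collapse per-frame labels: merge consecutive repeats, then drop blanks."""
    out: List[str] = []
    prev = None
    for label in frame_labels:
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return out


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two label sequences (row-by-row DP)."""
    prev_row = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur_row = [i]
        for j, h in enumerate(hyp, start=1):
            cur_row.append(min(
                prev_row[j] + 1,             # deletion
                cur_row[j - 1] + 1,          # insertion
                prev_row[j - 1] + (r != h),  # substitution or match
            ))
        prev_row = cur_row
    return prev_row[-1]


def label_error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Assumed definition: edit distance divided by the reference length."""
    if not ref:
        raise ValueError("reference sequence must not be empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Labels mix English words and Chinese characters, as in the abstract.
    reference = ["climb", "to", "八", "千"]
    hypothesis = ["climb", "two", "八", "千"]
    print(label_error_rate(reference, hypothesis))
    print(ctc_greedy_collapse(list("aa_ab__b")))
```

Treating each English word and each Chinese character as one label lets a single metric cover code-switched utterances, which is the reason the sketch operates on label sequences and not on raw strings.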
Pages: 3608-3620
Number of pages: 13