A Unified Framework for Multilingual Speech Recognition in Air Traffic Control Systems

Cited by: 81
Authors
Lin, Yi [1]
Guo, Dongyue [1]
Zhang, Jianwei [1]
Chen, Zhengmao [1]
Yang, Bo [1]
Affiliations
[1] Sichuan Univ, Coll Comp Sci, Chengdu 610000, Peoples R China
Funding
National Science Foundation (United States);
Keywords
Hidden Markov models; Task analysis; Atmospheric modeling; Speech recognition; Vocabulary; Decoding; Real-time systems; Acoustic model (AM); air traffic control (ATC); machine translation pronunciation model (PM); multiscale CNN (MCNN); multilingual; robust speech recognition; DEEP NEURAL-NETWORKS;
DOI
10.1109/TNNLS.2020.3015830
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
This work focuses on robust speech recognition in air traffic control (ATC) by designing a novel processing paradigm to integrate multilingual speech recognition into a single framework using three cascaded modules: an acoustic model (AM), a pronunciation model (PM), and a language model (LM). The AM converts ATC speech into phoneme-based text sequences that the PM then translates into a word-based sequence, which is the ultimate goal of this research. The LM corrects both phoneme- and word-based errors in the decoding results. The AM, including the convolutional neural network (CNN) and recurrent neural network (RNN), considers the spatial and temporal dependences of the speech features and is trained by the connectionist temporal classification loss. To cope with radio transmission noise and diversity among speakers, a multiscale CNN architecture is proposed to fit the diverse data distributions and improve the performance. Phoneme-to-word translation is addressed via a proposed machine translation PM with an encoder-decoder architecture. RNN-based LMs are trained to consider the code-switching specificity of the ATC speech by building dependences with common words. We validate the proposed approach using large amounts of real Chinese and English ATC recordings and achieve a 3.95% label error rate on Chinese characters and English words, outperforming other popular approaches. The decoding efficiency is also comparable to that of the end-to-end model, and its generalizability is validated on several open corpora, making it suitable for real-time approaches to further support ATC applications, such as ATC prediction and safety checking.
Pages: 3608-3620
Page count: 13
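
The abstract states that the acoustic model is trained with the connectionist temporal classification (CTC) loss and that results are reported as a label error rate. The sketch below illustrates, under stated assumptions, two standard pieces behind those terms: collapsing a per-frame CTC best path into a label sequence, and scoring a hypothesis with an edit-distance-based error rate. The blank symbol, the phoneme labels, the sample sentences, and all function names are hypothetical; the record does not give the paper's exact decoding or scoring procedure, so this assumes the common definition (Levenshtein distance divided by reference length).

```python
from itertools import groupby

BLANK = "<b>"  # hypothetical CTC blank symbol, not taken from the paper


def ctc_greedy_collapse(frame_labels, blank=BLANK):
    """Collapse a per-frame best path: merge consecutive repeats, then drop blanks."""
    return [label for label, _ in groupby(frame_labels) if label != blank]


def edit_distance(ref, hyp):
    """Levenshtein distance between two label sequences (row-by-row dynamic programming)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def label_error_rate(ref, hyp):
    """Edit distance normalized by reference length (assumed definition)."""
    if not ref:
        raise ValueError("reference sequence must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# Made-up per-frame labels for illustration only.
frames = ["<b>", "k", "k", "<b>", "ae", "ae", "<b>", "<b>", "t"]
phonemes = ctc_greedy_collapse(frames)

# Made-up word-level reference and hypothesis with one substitution.
reference = ["climb", "to", "eight", "thousand"]
hypothesis = ["climb", "two", "eight", "thousand"]
rate = label_error_rate(reference, hypothesis)
```

In a mixed Chinese and English setting such as the one the abstract describes, the label sequences would hold Chinese characters and English words as individual tokens; how the authors tokenize is not stated in this record.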