A Unified Framework for Multilingual Speech Recognition in Air Traffic Control Systems

Cited by: 81

Authors:

Lin, Yi [1]

Guo, Dongyue [1]

Zhang, Jianwei [1]

Chen, Zhengmao [1]

Yang, Bo [1]

Affiliations:

[1] Sichuan Univ, Coll Comp Sci, Chengdu 610000, Peoples R China

Source:

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS | 2021 / Vol. 32 / Issue 08

Funding:

National Science Foundation (United States);

Keywords:

Hidden Markov models; Task analysis; Atmospheric modeling; Speech recognition; Vocabulary; Decoding; Real-time systems; Acoustic model (AM); air traffic control (ATC); machine translation pronunciation model (PM); multiscale CNN (MCNN); multilingual; robust speech recognition; DEEP NEURAL-NETWORKS;

DOI:

10.1109/TNNLS.2020.3015830

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This work focuses on robust speech recognition in air traffic control (ATC) by designing a novel processing paradigm to integrate multilingual speech recognition into a single framework using three cascaded modules: an acoustic model (AM), a pronunciation model (PM), and a language model (LM). The AM converts ATC speech into phoneme-based text sequences that the PM then translates into a word-based sequence, which is the ultimate goal of this research. The LM corrects both phoneme- and word-based errors in the decoding results. The AM, including the convolutional neural network (CNN) and recurrent neural network (RNN), considers the spatial and temporal dependences of the speech features and is trained by the connectionist temporal classification loss. To cope with radio transmission noise and diversity among speakers, a multiscale CNN architecture is proposed to fit the diverse data distributions and improve the performance. Phoneme-to-word translation is addressed via a proposed machine translation PM with an encoder-decoder architecture. RNN-based LMs are trained to consider the code-switching specificity of the ATC speech by building dependences with common words. We validate the proposed approach using large amounts of real Chinese and English ATC recordings and achieve a 3.95% label error rate on Chinese characters and English words, outperforming other popular approaches. The decoding efficiency is also comparable to that of the end-to-end model, and its generalizability is validated on several open corpora, making it suitable for real-time approaches to further support ATC applications, such as ATC prediction and safety checking.
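The abstract reports a label error rate measured on Chinese characters and English words but does not spell out how it is scored. The following is a minimal sketch, assuming the conventional definition (total edit distance divided by total reference labels) and a simple regex tokenizer that treats each Chinese character and each English word or digit string as one label. The function names `tokenize`, `edit_distance`, and `label_error_rate` are hypothetical and are not taken from the paper.

```python
import re

# Assumption: one label per CJK character, or per run of Latin letters/digits.
_LABEL_PATTERN = re.compile(r"[\u4e00-\u9fff]|[A-Za-z0-9']+")


def tokenize(text):
    """Split mixed Chinese/English text into labels (hypothetical helper)."""
    return [token.lower() for token in _LABEL_PATTERN.findall(text)]


def edit_distance(ref, hyp):
    """Levenshtein distance between two label sequences."""
    prev = list(range(len(hyp) + 1))
    for i, ref_label in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_label in enumerate(hyp, start=1):
            cost = 0 if ref_label == hyp_label else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def label_error_rate(references, hypotheses):
    """Total edit distance divided by total number of reference labels."""
    total_labels = sum(len(ref) for ref in references)
    if total_labels == 0:
        raise ValueError("references contain no labels")
    total_errors = sum(edit_distance(ref, hyp)
                       for ref, hyp in zip(references, hypotheses))
    return total_errors / total_labels


if __name__ == "__main__":
    # Made-up example utterance, not taken from the paper's corpus.
    ref = tokenize("上升到 flight level 320")
    hyp = tokenize("上升到 flight level 330")
    print(f"{label_error_rate([ref], [hyp]):.4f}")
```

Scoring at the character level for Chinese and the word level for English lets a code-switched utterance be evaluated in a single pass; the paper's actual tokenization and scoring rules may differ.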

Pages: 3608-3620

Number of pages: 13
