Semi-Supervised Training of DNN-Based Acoustic Model for ATC Speech Recognition

Cited by: 12
Authors
Smidl, Lubos [1]
Svec, Jan [2]
Prazak, Ales [1]
Trmal, Jan [3]
Affiliations
[1] Univ West Bohemia, Dept Cybernet, Plzen, Czech Republic
[2] SpeechTech Sro, Plzen, Czech Republic
[3] Johns Hopkins Univ, Ctr Language & Speech Proc, Baltimore, MD USA
Source
SPEECH AND COMPUTER (SPECOM 2018) | 2018 / Vol. 11096
Keywords
Semi-supervised training; Data selection; Acoustic modelling; ATC speech recognition
DOI
10.1007/978-3-319-99579-3_66
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we describe a semi-supervised training method used to generalize an Air Traffic Control (ATC) speech recognizer. The paper introduces the problems and challenges of ATC English recognition and describes the available datasets and ongoing research projects. A baseline recognition model is then used to recognize unlabelled data from a publicly available source: the LiveATC community portal, which records and archives ATC communication near airports. The recognized unlabelled data are filtered by a data selection procedure based on confidence scores, and the acoustic model is retrained to obtain a more general model. Results on accented Czech and French data are reported.
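The confidence-based data selection step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual selection criterion, so everything here is an assumption made for illustration: the `Hypothesis` record, the use of the mean word confidence as the utterance score, and the threshold of 0.9 are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    """One automatically recognized utterance (hypothetical structure)."""
    utt_id: str
    words: List[str]
    word_confidences: List[float]  # one score in [0, 1] per recognized word


def utterance_confidence(hyp: Hypothesis) -> float:
    """Mean word confidence; an assumed aggregate, not the paper's measure."""
    if not hyp.word_confidences:
        return 0.0
    return sum(hyp.word_confidences) / len(hyp.word_confidences)


def select_for_retraining(hyps: List[Hypothesis],
                          threshold: float = 0.9) -> List[Hypothesis]:
    """Keep only utterances whose score reaches the (assumed) threshold."""
    return [h for h in hyps if utterance_confidence(h) >= threshold]


# Toy data standing in for baseline-recognizer output on unlabelled audio.
hypotheses = [
    Hypothesis("utt1", ["descend", "flight", "level"], [0.95, 0.97, 0.96]),
    Hypothesis("utt2", ["contact", "tower"], [0.40, 0.90]),
    Hypothesis("utt3", [], []),
]

selected = select_for_retraining(hypotheses)
selected_ids = [h.utt_id for h in selected]
print(selected_ids)
```

In a semi-supervised setup of this kind, the selected utterances and their automatic transcripts would be added to the labelled training set before the acoustic model is retrained; the threshold trades the amount of added data against transcript accuracy.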
Pages: 646-655
Page count: 10