Semi-Supervised Training of DNN-Based Acoustic Model for ATC Speech Recognition

Cited by: 12
Authors
Smidl, Lubos [1 ]
Svec, Jan [2 ]
Prazak, Ales [1 ]
Trmal, Jan [3 ]
Affiliations
[1] Univ West Bohemia, Dept Cybernet, Plzen, Czech Republic
[2] SpeechTech Sro, Plzen, Czech Republic
[3] Johns Hopkins Univ, Ctr Language & Speech Proc, Baltimore, MD USA
Source
SPEECH AND COMPUTER (SPECOM 2018) | 2018 / Vol. 11096
Keywords
Semi-supervised training; Data selection; Acoustic modelling; ATC speech recognition;
DOI
10.1007/978-3-319-99579-3_66
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
In this paper, we describe a semi-supervised training method used to generalize an Air Traffic Control (ATC) speech recognizer. The paper introduces the problems and challenges of recognizing ATC English, and describes the available datasets and ongoing research projects. The baseline recognition model is then used to recognize unlabelled data from a publicly available source: the LiveATC community portal, which records and archives ATC communications near airports. The recognized unlabelled data are filtered with a data selection procedure based on confidence scores, and the acoustic model is retrained on the selected data to obtain a more general model. Results on Czech- and French-accented data are reported.
Pages: 646-655
Number of pages: 10
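To make the data selection step in the abstract concrete, below is a minimal sketch of confidence-based filtering for semi-supervised self-training: hypotheses produced by a baseline recognizer on unlabelled audio are kept only if their confidence clears a threshold, and the surviving (audio, transcript) pairs are added to the retraining set. The dataclass fields, the 0.85 threshold, and the word-level averaging are illustrative assumptions, not values taken from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    audio_path: str                 # path to the unlabelled recording
    words: List[str]                # 1-best transcript from the baseline recognizer
    word_confidences: List[float]   # per-word posterior confidence scores


def utterance_confidence(hyp: Hypothesis) -> float:
    """Average word confidence; empty hypotheses score zero."""
    if not hyp.word_confidences:
        return 0.0
    return sum(hyp.word_confidences) / len(hyp.word_confidences)


def select_for_retraining(hyps: List[Hypothesis],
                          threshold: float = 0.85) -> List[Hypothesis]:
    """Keep only utterances whose average confidence clears the threshold."""
    return [h for h in hyps if utterance_confidence(h) >= threshold]


if __name__ == "__main__":
    hyps = [
        Hypothesis("liveatc_0001.wav", ["cleared", "to", "land"], [0.97, 0.92, 0.95]),
        Hypothesis("liveatc_0002.wav", ["say", "again"], [0.40, 0.55]),
    ]
    # Selected utterances would be merged with the labelled training data
    # and the acoustic model retrained on the combined set.
    for h in select_for_retraining(hyps):
        print(h.audio_path, " ".join(h.words))
```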