Semi-Supervised Training of DNN-Based Acoustic Model for ATC Speech Recognition

Cited by: 12
Authors
Smidl, Lubos [1]
Svec, Jan [2]
Prazak, Ales [1]
Trmal, Jan [3]
Affiliations
[1] Univ West Bohemia, Dept Cybernet, Plzen, Czech Republic
[2] SpeechTech Sro, Plzen, Czech Republic
[3] Johns Hopkins Univ, Ctr Language & Speech Proc, Baltimore, MD USA
Source
SPEECH AND COMPUTER (SPECOM 2018) | 2018 / Vol. 11096
Keywords
Semi-supervised training; Data selection; Acoustic modelling; ATC speech recognition
DOI
10.1007/978-3-319-99579-3_66
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we describe a semi-supervised training method used to generalize an Air Traffic Control (ATC) speech recognizer. The paper introduces the problems and challenges in ATC English recognition and describes the available datasets and ongoing research projects. The baseline recognition model is then used to recognize unlabelled data from a publicly available source: the LiveATC community portal, which records and archives ATC communication near airports. The recognized unlabelled data are filtered using a data selection procedure based on confidence scores, and the acoustic model is retrained to obtain a more general model. Results on accented Czech and French data are reported.
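The confidence-based data selection step mentioned in the abstract could be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' procedure: the `Hypothesis` structure, the use of the mean per-word confidence as the utterance score, and the 0.9 threshold are all hypothetical choices that the abstract does not specify.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    """One automatically recognized utterance (hypothetical structure)."""
    utterance_id: str
    text: str
    word_confidences: List[float]  # per-word confidence scores in [0, 1]


def utterance_confidence(hyp: Hypothesis) -> float:
    """Average the per-word confidences; an empty hypothesis scores 0.0."""
    if not hyp.word_confidences:
        return 0.0
    return sum(hyp.word_confidences) / len(hyp.word_confidences)


def select_for_retraining(hyps: List[Hypothesis],
                          threshold: float = 0.9) -> List[Hypothesis]:
    """Keep hypotheses whose average confidence reaches the threshold.

    The threshold value is an assumption chosen for illustration only.
    """
    return [h for h in hyps if utterance_confidence(h) >= threshold]
```

In a pipeline of this kind, the selected utterances and their automatic transcripts would be added to the labelled training set before the acoustic model is retrained; how the selection criterion is actually defined is described in the paper itself.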
Pages: 646-655
Page count: 10