Semi-Supervised Training of DNN-Based Acoustic Model for ATC Speech Recognition

Cited by: 12
Authors
Smidl, Lubos [1]
Svec, Jan [2]
Prazak, Ales [1]
Trmal, Jan [3]
Affiliations
[1] Univ West Bohemia, Dept Cybernet, Plzen, Czech Republic
[2] SpeechTech Sro, Plzen, Czech Republic
[3] Johns Hopkins Univ, Ctr Language & Speech Proc, Baltimore, MD USA
Source
SPEECH AND COMPUTER (SPECOM 2018) | 2018 / Vol. 11096
Keywords
Semi-supervised training; Data selection; Acoustic modelling; ATC speech recognition
DOI
10.1007/978-3-319-99579-3_66
CLC number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we describe a semi-supervised training method used to generalize an Air Traffic Control (ATC) speech recognizer. The paper introduces the problems and challenges of ATC English recognition and describes the available datasets and ongoing research projects. A baseline recognition model is then used to transcribe unlabelled data from a publicly available source: the LiveATC community portal, which records and archives ATC communications near airports. The automatically transcribed data are filtered by a data selection procedure based on confidence scores, and the acoustic model is retrained to obtain a more general model. Results are reported on accented Czech and French data.
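The confidence-based data selection step mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Hypothesis` structure, the use of the mean per-word confidence as the utterance score, and the default threshold of 0.9 are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    """One automatically transcribed utterance (hypothetical structure)."""
    utt_id: str
    words: List[str]
    confidences: List[float]  # assumed per-word confidences in [0, 1]

    def __post_init__(self) -> None:
        # Each recognized word must carry exactly one confidence value.
        if len(self.words) != len(self.confidences):
            raise ValueError(f"{self.utt_id}: words/confidences length mismatch")


def utterance_confidence(hyp: Hypothesis) -> float:
    """Assumed utterance score: mean word confidence (0.0 for empty output)."""
    if not hyp.confidences:
        return 0.0
    return sum(hyp.confidences) / len(hyp.confidences)


def select_for_training(
    hyps: List[Hypothesis], threshold: float = 0.9, min_words: int = 1
) -> List[Hypothesis]:
    """Keep utterances that are long enough and whose score reaches the threshold.

    The threshold value is a hypothetical default, not one taken from the paper.
    """
    return [
        h
        for h in hyps
        if len(h.words) >= min_words and utterance_confidence(h) >= threshold
    ]


if __name__ == "__main__":
    pool = [
        Hypothesis("utt1", ["cleared", "to", "land"], [0.98, 0.95, 0.97]),
        Hypothesis("utt2", ["contact", "tower"], [0.60, 0.85]),
        Hypothesis("utt3", [], []),
    ]
    selected = select_for_training(pool, threshold=0.9)
    print([h.utt_id for h in selected])  # ['utt1']
```

In such a pipeline, the selected utterances and their automatic transcripts would be added to the labelled training data before the acoustic model is retrained; raising the threshold keeps fewer but more reliable transcripts, while lowering it admits more data at the cost of more recognition errors.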
Pages: 646-655
Number of pages: 10