SEMI-SUPERVISED TRAINING IN LOW-RESOURCE ASR AND KWS

Citations: 0
Authors
Metze, Florian [1 ,2 ]
Gandhe, Ankur [1 ,2 ]
Miao, Yajie [1 ,2 ]
Sheikh, Zaid [1 ,2 ]
Wang, Yun [1 ,2 ]
Xu, Di [1 ,2 ]
Zhang, Hao [1 ,2 ]
Kim, Jungsuk [3 ,4 ]
Lane, Ian [3 ,4 ]
Lee, Won Kyum [3 ,4 ]
Stueker, Sebastian [5 ]
Mueller, Markus [5 ]
Affiliations
[1] Carnegie Mellon Univ, Language Technol Inst, Pittsburgh, PA 15213 USA
[2] Carnegie Mellon Univ, Language Technol Inst, Moffett Field, CA USA
[3] Carnegie Mellon Univ, Dept Elect & Comp Engn, Pittsburgh, PA 15213 USA
[4] Carnegie Mellon Univ, Dept Elect & Comp Engn, Moffett Field, CA USA
[5] Karlsruhe Inst Technol, Karlsruhe, Germany
Source
2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP) | 2015
Funding
U.S. National Science Foundation;
Keywords
spoken term detection; automatic speech recognition; low-resource LTs; semi-supervised training
DOI
Not available
Chinese Library Classification
O42 [Acoustics];
Discipline Codes
070206; 082403;
Abstract
In particular for "low-resource" Keyword Search (KWS) and Speech-to-Text (STT) tasks, more untranscribed test data may be available than transcribed training data. Several approaches have been proposed to make this data useful during system development, even when the initial systems have Word Error Rates (WER) above 70%. In this paper, we present a set of experiments on low-resource telephony speech in Assamese, Bengali, Lao, Haitian, Zulu, and Tamil, demonstrating the impact such techniques can have, in particular learning robust bottleneck features on the test data. In the case of Tamil, where significantly more test data than training data was available, we integrated semi-supervised training and speaker adaptation on the test data and achieved significant additional improvements in STT and KWS.
Pages: 4699-4703
Page count: 5