SEMI-SUPERVISED TRAINING IN LOW-RESOURCE ASR AND KWS

Cited by: 0
Authors
Metze, Florian [1, 2]
Gandhe, Ankur [1, 2]
Miao, Yajie [1, 2]
Sheikh, Zaid [1, 2]
Wang, Yun [1, 2]
Xu, Di [1, 2]
Zhang, Hao [1, 2]
Kim, Jungsuk [3, 4]
Lane, Ian [3, 4]
Lee, Won Kyum [3, 4]
Stueker, Sebastian [5]
Mueller, Markus [5]
Affiliations
[1] Carnegie Mellon Univ, Language Technol Inst, Pittsburgh, PA 15213 USA
[2] Carnegie Mellon Univ, Language Technol Inst, Moffett Field, CA USA
[3] Carnegie Mellon Univ, Dept Elect & Comp Engn, Pittsburgh, PA 15213 USA
[4] Carnegie Mellon Univ, Dept Elect & Comp Engn, Moffett Field, CA USA
[5] Karlsruhe Inst Technol, Karlsruhe, Germany
Source
2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP) | 2015
Funding
U.S. National Science Foundation;
Keywords
spoken term detection; automatic speech recognition; low-resource LTs; semi-supervised training; RECOGNITION;
DOI
Not available
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
In particular for "low resource" Keyword Search (KWS) and Speech-to-Text (STT) tasks, more untranscribed test data may be available than training data. Several approaches have been proposed to make this data useful during system development, even when initial systems have Word Error Rates (WER) above 70%. In this paper, we present a set of experiments on low-resource languages in telephony speech quality in Assamese, Bengali, Lao, Haitian, Zulu, and Tamil, demonstrating the impact that such techniques can have, in particular learning robust bottle-neck features on the test data. In the case of Tamil, when significantly more test data than training data is available, we integrated semi-supervised training and speaker adaptation on the test data, and achieved significant additional improvements in STT and KWS.
Pages: 4699-4703
Page count: 5
Related papers
50 in total
  • [21] Improving Semi-supervised Deep Neural Network for Keyword Search in Low Resource Languages
    Hsiao, Roger
    Ng, Tim
    Zhang, Le
    Ranjan, Shivesh
    15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4, 2014, : 1088 - 1091
  • [22] Exploiting Eigenposteriors for Semi-supervised Training of DNN Acoustic Models with Sequence Discrimination
    Dighe, Pranay
    Asaei, Afsaneh
    Bourlard, Herve
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 3552 - 3556
  • [23] SEMI-SUPERVISED TRAINING OF ACOUSTIC MODELS USING LATTICE-FREE MMI
    Manohar, Vimal
    Hadian, Hossein
    Povey, Daniel
    Khudanpur, Sanjeev
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 4844 - 4848
  • [24] Two Semi-Supervised Training Approaches for Automated Text Recognition
    Leifert, Gundram
    Labahn, Roger
    Sanchez, Joan Andreu
    2020 17TH INTERNATIONAL CONFERENCE ON FRONTIERS IN HANDWRITING RECOGNITION (ICFHR 2020), 2020, : 145 - 150
  • [25] SEMI-SUPERVISED AND POPULATION BASED TRAINING FOR VOICE COMMANDS RECOGNITION
    Elibol, Oguz H.
    Keskin, Gokce
    Thomas, Anil
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6371 - 6375
  • [26] LANGUAGE DIARIZATION FOR SEMI-SUPERVISED BILINGUAL ACOUSTIC MODEL TRAINING
    Yilmaz, Emre
    McLaren, Mitchell
    van den Heuvel, Henk
    van Leeuwen, David A.
    2017 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2017, : 91 - 96
  • [27] Combining Simple but Novel Data Augmentation Methods for Improving Low-Resource ASR
    Damania, Ronit
    Homan, Christopher
    Prud'hommeaux, Emily
    INTERSPEECH 2022, 2022, : 4890 - 4894
  • [28] Momentum Pseudo-Labeling: Semi-Supervised ASR With Continuously Improving Pseudo-Labels
    Higuchi, Yosuke
    Moritz, Niko
    Le Roux, Jonathan
    Hori, Takaaki
    IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2022, 16 (06) : 1424 - 1438
  • [29] Multi-softmax Deep Neural Network for Semi-supervised Training
    Su, Hang
    Xu, Haihua
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 3239 - 3243
  • [30] Combination of Multilingual and Semi-Supervised Training for Under-Resourced Languages
    Grezl, Frantisek
    Karafiat, Martin
    15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4, 2014, : 820 - 824