Lithuanian Broadcast Speech Transcription using Semi-supervised Acoustic Model Training

Cited by: 13
Authors
Lileikyte, Rasa [1]
Gorin, Arseniy [1]
Lamel, Lori [1]
Gauvain, Jean-Luc [1]
Fraga-Silva, Thiago [2]
Affiliations
[1] Univ Paris Saclay, CNRS, LIMSI, 508 Campus Univ, F-91405 Orsay, France
[2] Vocapia Res, 28 Rue Jean Rostand, F-91400 Orsay, France
Source
SLTU-2016 5TH WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGIES FOR UNDER-RESOURCED LANGUAGES | 2016 / Vol. 81
Keywords
Automatic speech recognition; Low-resourced languages; Semi-supervised training; Neural networks; Lithuanian language; RECOGNITION;
DOI
10.1016/j.procs.2016.04.037
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
This paper reports on an experimental work to build a speech transcription system for Lithuanian broadcast data, relying on unsupervised and semi-supervised training methods as well as on other low-knowledge methods to compensate for missing resources. Unsupervised acoustic model training is investigated using 360 hours of untranscribed speech data. A graphemic pronunciation approach is used to simplify the pronunciation model generation and therefore ease the language model adaptation for the system users. Discriminative training on top of semi-supervised training is also investigated, as well as various types of acoustic features and their combinations. Experimental results are provided for each of our development steps as well as contrastive results comparing various options. Using the best system configuration a word error rate of 18.3% is obtained on a set of development data from the Quaero program. (C) 2016 The Authors. Published by Elsevier B.V.
Pages: 107-113
Page count: 7