COMBINING UNSUPERVISED AND TEXT AUGMENTED SEMI-SUPERVISED LEARNING FOR LOW RESOURCED AUTOREGRESSIVE SPEECH RECOGNITION

Cited by: 1
Authors
Li, Chak-Fai [1]
Keith, Francis [1]
Hartmann, William [1]
Snover, Matthew [1]
Affiliations
[1] Raytheon BBN Technol, Cambridge, MA 02138 USA
Source
2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2022
Keywords
seq2seq; unsupervised learning; semi-supervised training; domain adaptation; REPRESENTATION;
DOI
10.1109/ICASSP43922.2022.9747005
Chinese Library Classification
O42 [Acoustics];
Discipline classification codes
070206; 082403
Abstract
Recent advances in unsupervised representation learning have demonstrated the impact of pretraining on large amounts of read speech. We adapt these techniques for domain adaptation in conversational and broadcast domains that are low-resource in terms of both data and compute. Moving beyond CTC, we pretrain state-of-the-art Conformer models in an unsupervised manner. While the unsupervised approach outperforms traditional semi-supervised training, the techniques are complementary. Combining the techniques yields a 5% absolute improvement in WER, averaged over all conditions, compared to semi-supervised training alone. Additional text data is incorporated through external language models. By using CTC-based decoding, we are better able to take advantage of the additional text data. When CTC-based decoding is used in the transcription model for semi-supervised training, the Conformer model incorporates knowledge from the language model better than it does through shallow fusion. Final performance is a further 2% absolute better when using CTC-based decoding for semi-supervised training compared to shallow fusion.
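The abstract reports its gains as absolute differences in word error rate (WER). As a reading aid, here is a minimal sketch of how WER and an absolute improvement are commonly computed. It is not taken from the paper: the function names `word_error_rate` and `absolute_improvement` are hypothetical, the whitespace tokenization is an assumption, and the example sentences are illustrative only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words.

    Assumption: words are separated by whitespace, with no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words to match an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def absolute_improvement(baseline_wer: float, new_wer: float) -> float:
    """Absolute WER improvement: a plain difference, not a ratio."""
    return baseline_wer - new_wer


if __name__ == "__main__":
    # Illustrative sentences only; these are not data from the paper.
    wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
    print(f"WER: {wer:.3f}")
```

Under this convention, moving from a WER of 30% to 25% is a 5% absolute improvement, which is a relative improvement of about 16.7%.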
Pages: 6892-6896
Number of pages: 5