COMBINING UNSUPERVISED AND TEXT AUGMENTED SEMI-SUPERVISED LEARNING FOR LOW RESOURCED AUTOREGRESSIVE SPEECH RECOGNITION

Cited by: 1
Authors
Li, Chak-Fai [1]
Keith, Francis [1]
Hartmann, William [1]
Snover, Matthew [1]
Affiliation
[1] Raytheon BBN Technol, Cambridge, MA 02138 USA
Source
2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2022
Keywords
seq2seq; unsupervised learning; semi-supervised training; domain adaptation; REPRESENTATION;
DOI
10.1109/ICASSP43922.2022.9747005
Chinese Library Classification
O42 [Acoustics];
Subject classification codes
070206 ; 082403 ;
Abstract
Recent advances in unsupervised representation learning have demonstrated the impact of pretraining on large amounts of read speech. We adapt these techniques for domain adaptation in conversational and broadcast domains that are low-resource in terms of both data and compute. Moving beyond CTC, we pretrain state-of-the-art Conformer models in an unsupervised manner. While the unsupervised approach outperforms traditional semi-supervised training, the techniques are complementary. Combining the techniques yields a 5% absolute improvement in WER, averaged over all conditions, compared to semi-supervised training alone. Additional text data is incorporated through external language models. By using CTC-based decoding, we are better able to take advantage of the additional text data. When used as a transcription model, the CTC-based decoder allows the Conformer model to incorporate knowledge from the language model through semi-supervised training more effectively than shallow fusion does. Final performance improves by an additional 2% absolute when using CTC-based decoding for semi-supervised training compared to shallow fusion.
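The abstract reports its gains as absolute WER improvements. As a minimal sketch of what that metric means, the Python below computes word error rate as word-level edit distance divided by reference length, and contrasts absolute with relative improvement. The function names and the example WER values are illustrative assumptions, not taken from the paper or its evaluation tooling.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def absolute_improvement(baseline_wer: float, new_wer: float) -> float:
    """Difference in WER, e.g. 0.40 -> 0.35 is 0.05 (5% absolute)."""
    return baseline_wer - new_wer


def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    """Difference as a fraction of the baseline WER."""
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Hypothetical numbers chosen only to illustrate the two measures.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
    print(absolute_improvement(0.40, 0.35))
    print(relative_improvement(0.40, 0.35))
```

With these hypothetical values, a drop from 40% to 35% WER is a 5% absolute improvement but a 12.5% relative one, which is why the abstract states "absolute" explicitly.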
Pages: 6892-6896
Page count: 5