LATTICE-FREE MMI ADAPTATION OF SELF-SUPERVISED PRETRAINED ACOUSTIC MODELS

Cited by: 10
Authors
Vyas, Apoorv [1,2]
Madikeri, Srikanth [1]
Bourlard, Herve [1,2]
Affiliations
[1] Idiap Res Inst, Martigny, Switzerland
[2] Ecole Polytech Fed Lausanne, Lausanne, Switzerland
Source
2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021) | 2021
Funding
Swiss National Science Foundation
Keywords
self-supervised pretraining; lfmmi; cross-lingual adaptation; automatic speech recognition;
DOI
10.1109/ICASSP39728.2021.9414741
Chinese Library Classification: O42 [Acoustics]
Discipline codes: 070206; 082403
Abstract
In this work, we propose lattice-free MMI (LFMMI) for supervised adaptation of self-supervised pretrained acoustic models. We pretrain a Transformer model on a thousand hours of untranscribed Librispeech data, followed by supervised adaptation with LFMMI on three different datasets. Our results show that by fine-tuning with LFMMI, we consistently obtain relative WER improvements of 10% and 35.3% on the clean and other test sets of Librispeech (100h), 10.8% on Switchboard (300h), and 4.3% on Swahili (38h) and 4.4% on Tagalog (84h), compared to the baseline trained only with supervised data.
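At a high level, the LFMMI criterion trains the network to maximize the log-ratio of the likelihood of the reference transcript (the numerator graph) to the likelihood of all competing label sequences (the denominator graph). A minimal pure-Python sketch of that objective on toy path scores follows; the path sets and scores are illustrative assumptions only, not the paper's actual numerator/denominator graphs, which in practice are composed HMM/FST structures scored with the forward algorithm:

```python
import math

def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def lfmmi_loss(num_path_logprobs, den_path_logprobs):
    """Negative MMI objective: -(log p(numerator) - log p(denominator)).

    num_path_logprobs: log-scores of paths consistent with the reference.
    den_path_logprobs: log-scores of all paths in the denominator graph
    (a superset of the numerator paths), so the loss is non-negative.
    """
    return -(logsumexp(num_path_logprobs) - logsumexp(den_path_logprobs))

# Toy example: one correct path with probability 0.5, one competing
# path with probability 0.5; the loss is log 2.
loss = lfmmi_loss([math.log(0.5)], [math.log(0.5), math.log(0.5)])
```

Minimizing this loss pushes probability mass from competing paths onto the reference paths; in real LFMMI systems the denominator sum runs over an n-gram phone-LM graph via forward-backward rather than an explicit path enumeration.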
Pages: 6219-6223 (5 pages)