LATTICE-FREE MMI ADAPTATION OF SELF-SUPERVISED PRETRAINED ACOUSTIC MODELS

Cited by: 10
Authors
Vyas, Apoorv [1,2]
Madikeri, Srikanth [1]
Bourlard, Herve [1,2]
Affiliations
[1] Idiap Res Inst, Martigny, Switzerland
[2] Ecole Polytech Fed Lausanne, Lausanne, Switzerland
Source
2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021) | 2021
Funding
Swiss National Science Foundation
Keywords
self-supervised pretraining; lfmmi; cross-lingual adaptation; automatic speech recognition;
DOI
10.1109/ICASSP39728.2021.9414741
Chinese Library Classification: O42 [Acoustics]
Discipline codes: 070206; 082403
Abstract
In this work, we propose lattice-free MMI (LFMMI) for supervised adaptation of self-supervised pretrained acoustic models. We pretrain a Transformer model on a thousand hours of untranscribed Librispeech data, followed by supervised adaptation with LFMMI on three different datasets. Our results show that by fine-tuning with LFMMI, we consistently obtain relative WER improvements of 10% and 35.3% on the clean and other test sets of Librispeech (100h), 10.8% on Switchboard (300h), and 4.3% on Swahili (38h) and 4.4% on Tagalog (84h), compared to the baseline trained only with supervised data.
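At a high level, the LFMMI criterion trains the network to maximize the log-ratio of the likelihood of the reference transcript (the numerator graph) to the likelihood of all competing label sequences (the denominator graph). A minimal pure-Python sketch of that objective on toy path scores follows; the path sets and scores are illustrative assumptions only, not the paper's actual numerator/denominator graphs, which in practice are composed HMM/FST structures scored with the forward algorithm:

```python
import math

def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def lfmmi_loss(num_path_logprobs, den_path_logprobs):
    """Negative MMI objective: -(log p(numerator) - log p(denominator)).

    num_path_logprobs: log-scores of paths consistent with the reference.
    den_path_logprobs: log-scores of all paths in the denominator graph
    (a superset of the numerator paths), so the loss is non-negative.
    """
    return -(logsumexp(num_path_logprobs) - logsumexp(den_path_logprobs))

# Toy example: one correct path with probability 0.5, one competing
# path with probability 0.5; the loss is log 2.
loss = lfmmi_loss([math.log(0.5)], [math.log(0.5), math.log(0.5)])
```

Minimizing this loss pushes probability mass from competing paths onto the reference paths; in real LFMMI systems the denominator sum runs over an n-gram phone-LM graph via forward-backward rather than an explicit path enumeration.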
Pages: 6219-6223 (5 pages)