Improving Cross-Lingual Transfer Learning for End-to-End Speech Recognition with Speech Translation

Cited by: 7
Authors
Wang, Changhan [1]
Pino, Juan [1]
Gu, Jiatao [1]
Affiliations
[1] Facebook AI, Menlo Park, CA 94025 USA
Source
INTERSPEECH 2020, 2020
Keywords
end-to-end speech recognition; cross-lingual transfer learning; speech translation; machine translation;
DOI
10.21437/Interspeech.2020-2955
Abstract
Transfer learning from high-resource languages is known to be an efficient way to improve end-to-end automatic speech recognition (ASR) for low-resource languages. Pre-trained or jointly trained encoder-decoder models, however, do not share a language model (decoder) with the target language, which is likely to be inefficient for distant target languages. We introduce speech-to-text translation (ST) as an auxiliary task to incorporate additional knowledge of the target language and enable transfer from that target language. Specifically, we first translate high-resource ASR transcripts into a target low-resource language, with which an ST model is trained. Both ST and target ASR share the same attention-based encoder-decoder architecture and vocabulary. The former task then provides a fully pre-trained model for the latter, bringing up to 24.6% word error rate (WER) reduction over the baseline (direct transfer from high-resource ASR). We show that training ST with human translations is not necessary: ST trained with machine translation (MT) pseudo-labels brings consistent gains, and can even outperform models using human labels when transferred to target ASR, leveraging only 500K MT examples. Even with pseudo-labels from low-resource MT (200K examples), ST-enhanced transfer brings up to 8.9% WER reduction over direct transfer.
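The transfer recipe the abstract describes can be sketched as a three-step workflow: pseudo-label high-resource transcripts with MT, pre-train a shared encoder-decoder on the resulting ST pairs, then fine-tune the same model on target-language ASR. The sketch below is a minimal, hedged illustration of that workflow only; all class and function names (`Seq2SeqModel`, `translate_transcripts`, the toy MT lambda) are hypothetical stand-ins, not the paper's actual implementation, and the "training" merely counts update steps instead of computing gradients.

```python
# Sketch of ST-enhanced transfer for low-resource ASR.
# Names and data are illustrative; no real model is trained here.

class Seq2SeqModel:
    """Toy stand-in for the shared attention-based encoder-decoder."""
    def __init__(self, vocab):
        self.vocab = vocab          # shared target-language vocabulary
        self.encoder = {"step": 0}  # placeholder parameter blobs
        self.decoder = {"step": 0}

    def train(self, pairs, epochs=1):
        # Stand-in for gradient updates; just counts training steps.
        for _ in range(epochs):
            for _src, _tgt in pairs:
                self.encoder["step"] += 1
                self.decoder["step"] += 1
        return self

def translate_transcripts(transcripts, mt_model):
    # Step 1: pseudo-label high-resource transcripts via MT.
    return [mt_model(t) for t in transcripts]

# High-resource ASR data: (audio, transcript) pairs.
hi_res_asr = [("audio_en_1", "hello world"), ("audio_en_2", "good morning")]

# MT pseudo-labels: translate transcripts into the target language.
toy_mt = lambda text: f"<tgt:{text}>"  # hypothetical MT system
targets = translate_transcripts([t for _, t in hi_res_asr], toy_mt)
st_pairs = [(a, t) for (a, _), t in zip(hi_res_asr, targets)]

# Step 2: pre-train the shared model on ST.
shared_vocab = {"<tgt:hello world>", "<tgt:good morning>", "salut"}
model = Seq2SeqModel(shared_vocab).train(st_pairs, epochs=2)

# Step 3: fine-tune the fully pre-trained model on target-language ASR.
target_asr = [("audio_tgt_1", "salut")]
model.train(target_asr, epochs=1)
```

Because ST and target ASR share both architecture and vocabulary, every parameter (encoder and decoder) carries over from pre-training, which is the key difference from direct high-resource ASR transfer, where the decoder's language knowledge does not match the target language.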
Pages: 4731-4735 (5 pages)