Source and Target Bidirectional Knowledge Distillation for End-to-end Speech Translation

Times cited: 0
Authors
Inaguma, Hirofumi [1 ]
Kawahara, Tatsuya [1 ]
Watanabe, Shinji [2 ]
Affiliations
[1] Kyoto Univ, Kyoto, Japan
[2] Johns Hopkins Univ, Baltimore, MD 21218 USA
Source
2021 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL-HLT 2021), 2021
Keywords
MODELS;
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
A conventional approach to improving the performance of end-to-end speech translation (E2E-ST) models is to leverage the source transcription via pre-training and joint training with automatic speech recognition (ASR) and neural machine translation (NMT) tasks. However, since the input modalities are different, it is difficult to leverage source language text successfully. In this work, we focus on sequence-level knowledge distillation (SeqKD) from external text-based NMT models. To leverage the full potential of the source language information, we propose backward SeqKD, SeqKD from a target-to-source backward NMT model. To this end, we train a bilingual E2E-ST model to predict paraphrased transcriptions as an auxiliary task with a single decoder. The paraphrases are generated from the translations in bitext via back-translation. We further propose bidirectional SeqKD in which SeqKD from both forward and backward NMT models is combined. Experimental evaluations on both autoregressive and non-autoregressive models show that SeqKD in each direction consistently improves the translation performance, and the effectiveness is complementary regardless of the model capacity.
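The data-construction step described in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: `forward_nmt` and `backward_nmt` are hypothetical stand-ins (toy dictionary lookups) for the external source-to-target and target-to-source NMT teacher models, and the field names are assumptions. It shows how, for each utterance, the single bilingual decoder gets two targets: a distilled translation from the forward NMT model (forward SeqKD) and a paraphrased transcription obtained by back-translating the bitext translation (backward SeqKD).

```python
# Illustrative sketch of bidirectional SeqKD data construction.
# forward_nmt / backward_nmt are hypothetical stand-ins for the external
# NMT teacher models; real teachers would decode with beam search.

def forward_nmt(src_text):
    # Hypothetical source->target NMT teacher (toy lookup).
    table = {"guten morgen": "good morning"}
    return table.get(src_text, src_text)

def backward_nmt(tgt_text):
    # Hypothetical target->source NMT teacher used for back-translation;
    # its output acts as a paraphrase of the original transcription.
    table = {"good morning": "schoenen morgen"}
    return table.get(tgt_text, tgt_text)

def build_bidirectional_seqkd_data(speech_transcript_pairs, bitext_translations):
    """For each utterance, emit two targets for the single bilingual decoder:
    (1) a distilled translation from the forward NMT model (forward SeqKD), and
    (2) a paraphrased transcription back-translated from the bitext
        translation (backward SeqKD)."""
    data = []
    for (audio_id, transcript), translation in zip(speech_transcript_pairs,
                                                   bitext_translations):
        data.append({
            "audio": audio_id,
            "tgt_translation": forward_nmt(transcript),    # forward SeqKD target
            "tgt_transcript": backward_nmt(translation),   # backward SeqKD target
        })
    return data

pairs = [("utt1", "guten morgen")]
bitext = ["good morning"]
print(build_bidirectional_seqkd_data(pairs, bitext))
```

Training then mixes both target types for the same audio, which is what lets the auxiliary paraphrase-prediction task inject source-language knowledge into the E2E-ST model.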
Pages: 1872 - 1881
Page count: 10
Related papers
50 in total
  • [1] End-to-End Speech Translation with Knowledge Distillation
    Liu, Yuchen
    Xiong, Hao
    Zhang, Jiajun
    He, Zhongjun
    Wu, Hua
    Wang, Haifeng
    Zong, Chengqing
    INTERSPEECH 2019, 2019, : 1128 - 1132
  • [2] Knowledge Distillation on Joint Task End-to-End Speech Translation
    Nayem, Khandokar Md
    Xue, Ran
    Chang, Ching-Yun
    Shanbhogue, Akshaya Vishnu Kudlu
    INTERSPEECH 2023, 2023, : 1493 - 1497
  • [3] End-to-End Speech-Translation with Knowledge Distillation: FBK@IWSLT2020
    Gaido, Marco
    Di Gangi, Mattia Antonino
    Negri, Matteo
    Turchi, Marco
    17TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE TRANSLATION (IWSLT 2020), 2020, : 80 - 88
  • [4] TutorNet: Towards Flexible Knowledge Distillation for End-to-End Speech Recognition
    Yoon, Ji Won
    Lee, Hyeonseung
    Kim, Hyung Yong
    Cho, Won Ik
    Kim, Nam Soo
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2021, 29 : 1626 - 1638
  • [5] Staged Knowledge Distillation for End-to-End Dysarthric Speech Recognition and Speech Attribute Transcription
    Lin, Yuqin
    Wang, Longbiao
    Li, Sheng
    Dang, Jianwu
    Ding, Chenchen
    INTERSPEECH 2020, 2020, : 4791 - 4795
  • [6] MULTILINGUAL END-TO-END SPEECH TRANSLATION
    Inaguma, Hirofumi
    Duh, Kevin
    Kawahara, Tatsuya
    Watanabe, Shinji
    2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 570 - 577
  • [7] End-to-end spoofing speech detection and knowledge distillation under noisy conditions
    Liu, Pengfei
    Zhang, Zhenchuan
    Yang, Yingchun
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,
  • [8] End-to-End Speech Translation for Code Switched Speech
    Weller, Orion
    Sperber, Matthias
    Pires, Telmo
    Setiawan, Hendra
    Gollan, Christian
    Telaar, Dominic
    Paulik, Matthias
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), 2022, : 1435 - 1448
  • [9] End-to-End Speech Translation with Adversarial Training
    Li, Xuancai
    Chen, Kehai
    Zhao, Tiejun
    Yang, Muyun
    WORKSHOP ON AUTOMATIC SIMULTANEOUS TRANSLATION CHALLENGES, RECENT ADVANCES, AND FUTURE DIRECTIONS, 2020, : 10 - 14
  • [10] END-TO-END AUTOMATIC SPEECH TRANSLATION OF AUDIOBOOKS
    Berard, Alexandre
    Besacier, Laurent
    Kocabiyikoglu, Ali Can
    Pietquin, Olivier
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 6224 - 6228