SHAS: Approaching optimal Segmentation for End-to-End Speech Translation

Cited by: 10
Authors
Tsiamas, Ioannis [1 ]
Gallego, Gerard I. [1 ]
Fonollosa, Jose A. R. [1 ]
Costa-jussa, Marta R. [1 ]
Affiliations
[1] Univ Politecn Cataluna, TALP Res Ctr, Barcelona, Spain
Source
INTERSPEECH 2022 | 2022
Keywords
speech translation; audio segmentation;
DOI
10.21437/Interspeech.2022-59
Chinese Library Classification
O42 [Acoustics]
Discipline Classification Codes
070206; 082403
Abstract
Speech translation models are unable to directly process long audio recordings, such as TED talks, which must be split into shorter segments. Speech translation datasets provide manual segmentations of their audio, which are not available in real-world scenarios, and existing segmentation methods usually significantly reduce translation quality at inference time. To bridge the gap between the manual segmentation used in training and the automatic one at inference, we propose Supervised Hybrid Audio Segmentation (SHAS), a method that can effectively learn the optimal segmentation from any manually segmented speech corpus. First, we train a classifier to identify the frames included in a segmentation, using speech representations from a pre-trained wav2vec 2.0. The optimal splitting points are then found by a probabilistic Divide-and-Conquer algorithm that progressively splits at the frame of lowest probability until all segments are below a pre-specified length. Experiments on MuST-C and mTEDx show that the translation of the segments produced by our method approaches the quality of the manual segmentation on five language pairs. Namely, SHAS retains 95-98% of the manual segmentation's BLEU score, compared to the 87-93% of the best existing methods. Our method is additionally generalizable to different domains and achieves high zero-shot performance on unseen languages.
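The Divide-and-Conquer step described in the abstract can be sketched as follows: given per-frame inclusion probabilities (e.g. produced by a classifier over wav2vec 2.0 representations), each over-long span is recursively split at its lowest-probability frame until every segment fits under a maximum length. This is a minimal illustrative sketch, not the authors' implementation; the function names and the boundary-clamping detail are assumptions.

```python
def split_segment(probs, start, end, max_len, segments):
    """Recursively split probs[start:end] at the minimum-probability frame
    until every resulting segment has at most max_len frames."""
    if end - start <= max_len:
        segments.append((start, end))
        return
    # Split at the frame least likely to belong inside a segment.
    split = min(range(start, end), key=lambda i: probs[i])
    # Clamp so neither half is empty (an assumption for this sketch).
    split = max(start + 1, min(split, end - 1))
    split_segment(probs, start, split, max_len, segments)
    split_segment(probs, split, end, max_len, segments)


def divide_and_conquer(probs, max_len):
    """Return a list of (start, end) frame ranges covering all of probs."""
    segments = []
    split_segment(probs, 0, len(probs), max_len, segments)
    return segments
```

The resulting segments are contiguous, cover the whole input, and each respects the pre-specified length cap, with cuts preferentially placed at low-probability (likely non-speech) frames.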
Pages: 106-110
Page count: 5
Related Papers (50 in total)
  • [41] IMPROVING UNSUPERVISED STYLE TRANSFER IN END-TO-END SPEECH SYNTHESIS WITH END-TO-END SPEECH RECOGNITION
    Liu, Da-Rong
    Yang, Chi-Yu
    Wu, Szu-Lin
    Lee, Hung-Yi
    2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), 2018, : 640 - 647
  • [42] CKDST: Comprehensively and Effectively Distill Knowledge from Machine Translation to End-to-End Speech Translation
    Lei, Yikun
    Xue, Zhengshan
    Sun, Haoran
    Zhao, Xiaohu
    Zhu, Shaolin
    Lin, Xiaodong
    Xiong, Deyi
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, 2023, : 3123 - 3137
  • [43] Improving Cross-Lingual Transfer Learning for End-to-End Speech Recognition with Speech Translation
    Wang, Changhan
    Pino, Juan
    Gu, Jiatao
    INTERSPEECH 2020, 2020, : 4731 - 4735
  • [44] CCSRD: Content-Centric Speech Representation Disentanglement Learning for End-to-End Speech Translation
    Zhao, Xiaohu
    Sun, Haoran
    Lei, Yikun
    Zhu, Shaolin
    Xiong, Deyi
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS - EMNLP 2023, 2023, : 5920 - 5932
  • [45] ComSL: A Composite Speech-Language Model for End-to-End Speech-to-Text Translation
    Le, Chenyang
    Qian, Yao
    Zhou, Long
    Liu, Shujie
    Qian, Yanmin
    Zeng, Michael
    Huang, Xuedong
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [46] The NiuTrans End-to-End Speech Translation System for IWSLT 2021 Offline Task
    Xu, Chen
    Liu, Xiaoqian
    Liu, Xiaowen
    Wang, Laohu
    Huang, Canan
    Xiao, Tong
    Zhu, Jingbo
    IWSLT 2021: THE 18TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE TRANSLATION, 2021, : 92 - 99
  • [47] Transformer-Based End-to-End Speech Translation With Rotary Position Embedding
    Li, Xueqing
    Li, Shengqiang
    Zhang, Xiao-Lei
    Rahardja, Susanto
    IEEE SIGNAL PROCESSING LETTERS, 2024, 31 : 371 - 375
  • [48] Large-Scale Streaming End-to-End Speech Translation with Neural Transducers
    Xue, Jian
    Wang, Peidong
    Li, Jinyu
    Post, Matt
    Gaur, Yashesh
    INTERSPEECH 2022, 2022, : 3263 - 3267
  • [49] Beyond Sentence-Level End-to-End Speech Translation: Context Helps
    Zhang, Biao
    Titov, Ivan
    Haddow, Barry
    Sennrich, Rico
    59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING, VOL 1 (ACL-IJCNLP 2021), 2021, : 2566 - 2578
  • [50] Optimally Encoding Inductive Biases into the Transformer Improves End-to-End Speech Translation
    Vyas, Piyush
    Kuznetsova, Anastasia
    Williamson, Donald S.
    INTERSPEECH 2021, 2021, : 2287 - 2291