ASSESSING EVALUATION METRICS FOR SPEECH-TO-SPEECH TRANSLATION

Cited by: 4

Authors:

Salesky, Elizabeth [1]

Maeder, Julian [2]

Klinger, Severin [2]

Affiliations:

[1] Johns Hopkins Univ, Baltimore, MD 21218 USA

[2] Swiss Fed Inst Technol, Zurich, Switzerland

Source:

2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU) | 2021

Keywords:

evaluation; speech synthesis; speech translation; speech-to-speech; dialects

DOI:

10.1109/ASRU51503.2021.9688073

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Speech-to-speech translation combines machine translation with speech synthesis, introducing evaluation challenges not present in either task alone. How to automatically evaluate speech-to-speech translation is an open question which has not previously been explored. Translating to speech rather than to text is often motivated by unwritten languages or languages without standardized orthographies. However, we show that the previously used automatic metric for this task is best equipped for standardized high-resource languages only. In this work, we first evaluate current metrics for speech-to-speech translation, and second assess how translation to dialectal variants rather than to standardized languages impacts various evaluation methods.


Pages: 733-740

Page count: 8
