ASSESSING EVALUATION METRICS FOR SPEECH-TO-SPEECH TRANSLATION

Cited by: 4
Authors
Salesky, Elizabeth [1]
Maeder, Julian [2]
Klinger, Severin [2]
Affiliations
[1] Johns Hopkins Univ, Baltimore, MD 21218 USA
[2] Swiss Fed Inst Technol, Zurich, Switzerland
Source
2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU) | 2021
Keywords
evaluation; speech synthesis; speech translation; speech-to-speech; dialects
DOI
10.1109/ASRU51503.2021.9688073
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification code
081104; 0812; 0835; 1405
Abstract
Speech-to-speech translation combines machine translation with speech synthesis, introducing evaluation challenges not present in either task alone. How to automatically evaluate speech-to-speech translation is an open question which has not previously been explored. Translating to speech rather than to text is often motivated by unwritten languages or languages without standardized orthographies. However, we show that the previously used automatic metric for this task is best equipped for standardized high-resource languages only. In this work, we first evaluate current metrics for speech-to-speech translation, and second assess how translation to dialectal variants rather than to standardized languages impacts various evaluation methods.
Pages: 733-740
Number of pages: 8
Related papers
50 in total
  • [21] Deriving phonetic transcriptions and discovering word segmentations for speech-to-speech translation in low-resource settings
    Wilkinson, Andrew
    Zhao, Tiancheng
    Black, Alan W.
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 3086 - 3090
  • [22] Speaker-Adaptive Speech Synthesis Based on Eigenvoice Conversion and Language-Dependent Prosodic Conversion in Speech-to-Speech Translation
    Hattori, Nobuhiko
    Toda, Tomoki
    Kawai, Hisashi
    Saruwatari, Hiroshi
    Shikano, Kiyohiro
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 2780 - +
  • [23] Streaming Parrotron for on-device speech-to-speech conversion
    Rybakov, Oleg
    Biadsy, Fadi
    Zhang, Xia
    Jiang, Liyang
    Meadowlark, Phoenix
    Agrawal, Shivani
    INTERSPEECH 2023, 2023, : 2033 - 2037
  • [24] RECENT ADVANCES IN SRI'S IRAQCOMM™ IRAQI ARABIC-ENGLISH SPEECH-TO-SPEECH TRANSLATION SYSTEM
    Akbacak, Murat
    Franco, Horacio
    Frandsen, Michael
    Hasan, Sasa
    Jameel, Huda
    Kathol, Andreas
    Khadivi, Shahram
    Lei, Xin
    Mandal, Arindam
    Mansour, Saab
    Precoda, Kristin
    Richey, Colleen
    Vergyri, Dimitra
    Wang, Wen
    Yang, Mei
    Zheng, Jing
    2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1- 8, PROCEEDINGS, 2009, : 4809 - +
  • [25] Automatic Speech Segmentation for Automatic Speech Translation
    Klosowski, Piotr
    Dustor, Adam
    COMPUTER NETWORKS, CN 2013, 2013, 370 : 466 - 475
  • [26] A Robust Context-Dependent Speech-to-Speech Phraselator Toolkit for Alexa
    Rayner, Manny
    Tsourakis, Nikos
    Stanek, Jan
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 175 - 176
  • [27] Evaluation of 2-way Iraqi Arabic-English speech translation systems using automated metrics
    Condon, Sherri
    Arehart, Mark
    Parvaz, Dan
    Sanders, Gregory
    Doran, Christy
    Aberdeen, John
    MACHINE TRANSLATION, 2012, 26 (1-2) : 159 - 176
  • [28] Enabling effective design of multimodal interfaces for speech-to-speech translation system: An empirical study of longitudinal user behaviors over time and user strategies for coping with errors
    Shin, JongHo
    Georgiou, Panayiotis G.
    Narayanan, Shrikanth
    COMPUTER SPEECH AND LANGUAGE, 2013, 27 (02) : 554 - 571
  • [29] Automatic Speech-to-Speech Translation of Educational Videos Using SeamlessM4T and Its Use for Future VR Applications
    Stefanel Gris, Lucas Rafael
    Fernandes, Diogo
    de Oliveira, Frederico Santos
    Soares, Anderson
    de Lima Soares, Telma Woerle
    Galvao, Arlindo
    2024 IEEE CONFERENCE ON VIRTUAL REALITY AND 3D USER INTERFACES ABSTRACTS AND WORKSHOPS, VRW 2024, 2024, : 163 - 166
  • [30] Parrotron: An End-to-End Speech-to-Speech Conversion Model and its Applications to Hearing-Impaired Speech and Speech Separation
    Biadsy, Fadi
    Weiss, Ron J.
    Moreno, Pedro J.
    Kanevsky, Dimitri
    Jia, Ye
    INTERSPEECH 2019, 2019, : 4115 - 4119