Stress Transfer in Speech-to-Speech Machine Translation

Times cited: 0

Authors:

Akarsh, Sai [1]

Narasinga, Vamshiraghusimha [1]

Vuppala, Anil Kumar [1]

Affiliation:

[1] International Institute of Information Technology Hyderabad, Hyderabad, India

Source:

INTERSPEECH 2024 | 2024

Keywords:

speech-to-speech machine translation; stress detection; text-to-speech; speech synthesis

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

India's linguistic diversity poses a significant challenge to inclusivity in its education sector. Because English dominates the internet, translating educational content into Indian languages is critical for making it accessible. Although Speech-to-Speech Machine Translation (SSMT) technologies exist, they do not reproduce intonation, so their translations sound monotonous, which reduces audience engagement and interest in the content. To address this issue, this paper demonstrates an SSMT application whose Text-to-Speech (TTS) architecture can incorporate stress into synthesized speech, giving a more engaging listening experience. The pipeline also includes a stress classifier that detects stress in the source speech so that it can be reproduced during speech generation. Given an input speech file, the application outputs a translated speech file with the stress transferred from the source.
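Translation usually reorders words, so stress detected on source words has to be mapped onto the corresponding target words before synthesis. The abstract does not say how this mapping is done; the sketch below shows one plausible approach under stated assumptions, not the authors' implementation. It assumes the stress classifier emits a binary label per source word, and that a source-target word alignment is available as `(src_idx, tgt_idx)` pairs (for example, from an automatic word aligner). The function name `transfer_stress` is hypothetical.

```python
from typing import List, Sequence, Tuple


def transfer_stress(
    src_stress: Sequence[int],
    alignment: Sequence[Tuple[int, int]],
    num_tgt_words: int,
) -> List[int]:
    """Project binary word-level stress labels from source words onto target words.

    Hypothetical helper, assuming 1 = stressed, 0 = unstressed, and an alignment
    given as (source word index, target word index) pairs.
    A target word is marked stressed if any source word aligned to it is stressed;
    target words with no stressed aligned source word stay unstressed.
    """
    tgt_stress = [0] * num_tgt_words
    for src_idx, tgt_idx in alignment:
        if src_stress[src_idx]:
            tgt_stress[tgt_idx] = 1
    return tgt_stress


if __name__ == "__main__":
    # Source: "this is very important", with stress on "very important".
    src_labels = [0, 0, 1, 1]
    # Target (Hindi, romanized): "yah bahut mahatvapurn hai".
    # this->yah, is->hai, very->bahut, important->mahatvapurn
    links = [(0, 0), (1, 3), (2, 1), (3, 2)]
    print(transfer_stress(src_labels, links, 4))  # prints [0, 1, 1, 0]
```

The "any aligned source word is stressed" rule is a simple choice for one-to-many and many-to-one links; a real system might instead weight by alignment confidence or by the classifier's stress probability. The resulting target labels would then condition the stress-aware TTS model.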


Pages: 995-996

Number of pages: 2

