Synthesizing Near Native-accented Speech for a Non-native Speaker by Imitating the Pronunciation and Prosody of a Native Speaker

Cited by: 1

Authors:

Chung, Raymond [1, 2]

Mak, Brian [1]

Affiliations:

[1] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Clear Water Bay, Hong Kong, Peoples R China

[2] Logist & Supply Chain MultiTech R&D Ctr, Pok Fu Lam, Hong Kong, Peoples R China

Source:

INTERSPEECH 2022 | 2022

Keywords:

text-to-speech; neural speech synthesis; accent conversion; FOREIGN ACCENT;

DOI:

10.21437/Interspeech.2022-11124

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Subject classification codes:

070206 ; 082403 ;

Abstract:

This paper investigates how to reduce foreign accent in the synthesis of native (L1) speech for a non-native (L2) speaker. We focus on two major aspects of foreign accent: mispronunciation and improper prosody (rhythm, phoneme duration, and pauses). First, to reduce mispronunciations, the mel-spectrograms generated by an L2 text-to-speech (TTS) model are fed to a pre-trained speech recognizer, and the mispronunciation information is fed back to the TTS model during back-propagation to help the model learn correct native mel-spectrograms. Second, to imitate L1 speech prosody, a recent data augmentation (DA) technique originally proposed for speaking style transfer is applied to transfer the L1 speaking style to L2 speakers. The DA technique creates additional L2 utterances that simulate L2 speakers trying to imitate L1 utterances. Automatic speech recognition on native-accented speech synthesized for non-native speakers by the proposed method gives a lower word error rate. The speaker embeddings produced by a pre-trained speaker verifier from the original L2 speakers' speech and from their synthesized speech are highly similar. Finally, subjective MOS evaluations of the synthesized speech show good quality and reduced accentedness.


Pages: 4302-4306

Page count: 5
