Synthesizing Near Native-accented Speech for a Non-native Speaker by Imitating the Pronunciation and Prosody of a Native Speaker

Cited by: 1
Authors
Chung, Raymond [1, 2]
Mak, Brian [1]
Affiliations
[1] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Clear Water Bay, Hong Kong, Peoples R China
[2] Logist & Supply Chain MultiTech R&D Ctr, Pok Fu Lam, Hong Kong, Peoples R China
Source
Interspeech 2022
Keywords
text-to-speech; neural speech synthesis; accent conversion; foreign accent
DOI
10.21437/Interspeech.2022-11124
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
This paper investigates how to reduce foreign accent when synthesizing native (L1) speech for a non-native (L2) speaker. We focus on two major aspects of foreign accent: mispronunciations and improper prosody (rhythm, phoneme duration, and pauses). First, to reduce mispronunciations, the mel-spectrograms generated by an L2 text-to-speech (TTS) model are fed to a pre-trained speech recognizer, and the mispronunciation information is fed back to the TTS model during back-propagation to help the model learn correct native mel-spectrograms. Second, to imitate L1 speech prosody, a recent data augmentation (DA) technique originally proposed for speaking style transfer is applied to transfer the L1 speaking style to L2 speakers. The DA technique creates additional L2 speech as if L2 speakers were trying to imitate L1 speech. Automatic speech recognition of native-accented speech synthesized for non-native speakers by the proposed method gives a lower word error rate. The speaker embeddings produced by a pre-trained speaker verifier from the original L2 speakers' speech and from their synthesized speech are highly similar. Finally, subjective mean opinion scores (MOS) on the synthesized speech show that it has good quality and reduced accentedness.
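The abstract reports two objective measures: word error rate (WER) from a speech recognizer and similarity between speaker embeddings. As a rough illustration only, the sketch below computes WER as word-level edit distance divided by reference length, and compares two embeddings by cosine similarity. The function names `word_error_rate` and `cosine_similarity` are hypothetical, and cosine similarity is an assumption here: the abstract does not say which similarity measure or which evaluation tooling the authors used.

```python
import math


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,                           # deletion
                curr[j - 1] + 1,                       # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


def cosine_similarity(a, b) -> float:
    """Cosine similarity between two equal-length embedding vectors.

    Assumption: cosine similarity stands in for whatever measure the
    speaker verifier in the paper actually uses.
    """
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / norm
```

For example, `word_error_rate("the cat sat on the mat", "the cat sat on mat")` counts one deleted word out of six reference words. A lower WER on synthesized speech suggests fewer mispronunciations as judged by the recognizer, while a high embedding similarity suggests the L2 speaker's voice identity is preserved.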
Pages: 4302-4306
Page count: 5