Zero-shot multi-speaker accent TTS with limited accent data

Cited by: 0
Authors
Zhang, Mingyang [1 ]
Zhou, Yi [2 ]
Wu, Zhizheng [1 ]
Li, Haizhou [1 ,2 ]
Affiliations
[1] Chinese Univ Hong Kong, Shenzhen Res Inst Big Data, Sch Data Sci, Shenzhen, Peoples R China
[2] Natl Univ Singapore, Dept Elect & Comp Engn, Singapore, Singapore
Source
2023 ASIA PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE, APSIPA ASC | 2023
Funding
National Natural Science Foundation of China;
Keywords
SPEAKER ADAPTATION;
DOI
10.1109/APSIPAASC58517.2023.10317526
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
In this paper, we present a multi-speaker accented speech synthesis framework that can generate accented speech for unseen speakers using only a limited amount of accent training data. Without relying on an accent lexicon, the proposed network learns accent phoneme embeddings through simple model adaptation. Specifically, a standard multi-speaker speech synthesis model is first trained on native speech. An additional neural network module is then appended and adapted to map native speech to accented speech. In the experiments, we synthesize English speech with Singaporean and Hindi accents. Both objective and subjective evaluation results confirm that the proposed phoneme-mapping technique is effective in generating high-quality accented speech for unseen speakers.
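As a rough illustration of the adaptation step described in the abstract, the sketch below (in PyTorch) appends a small phoneme-embedding mapping module to a frozen, pretrained multi-speaker TTS embedding table; only this appended module would be trained on the limited accent data. All module names, dimensions, and the residual design are illustrative assumptions, not the authors' implementation.

# Minimal sketch, assuming a PyTorch-style TTS front end with a trainable
# phoneme embedding table. The pretrained native embeddings stay frozen;
# a small appended adapter learns the accent-specific mapping.
import torch
import torch.nn as nn

class AccentPhonemeAdapter(nn.Module):
    """Maps native phoneme embeddings to accented phoneme embeddings."""

    def __init__(self, embed_dim: int = 256, hidden_dim: int = 512):
        super().__init__()
        self.mapping = nn.Sequential(
            nn.Linear(embed_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, embed_dim),
        )

    def forward(self, native_embedding: torch.Tensor) -> torch.Tensor:
        # Residual mapping: keep the native embedding and learn only the
        # accent-specific shift, which suits limited adaptation data.
        return native_embedding + self.mapping(native_embedding)

# Usage sketch: freeze the pretrained native embedding table and train
# only the adapter on the limited accented speech data.
num_phonemes, embed_dim = 100, 256
native_table = nn.Embedding(num_phonemes, embed_dim)   # taken from the pretrained TTS
native_table.requires_grad_(False)                      # base model stays fixed

adapter = AccentPhonemeAdapter(embed_dim)
phoneme_ids = torch.randint(0, num_phonemes, (1, 20))   # dummy phoneme sequence
accent_embeddings = adapter(native_table(phoneme_ids))  # fed to the frozen TTS decoder

Freezing the base model and training only a lightweight mapping is one plausible way to realize the "additional neural network module appended for adaptation" mentioned above when accent data is scarce; the actual architecture and training recipe are detailed in the paper itself.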
Pages: 1931-1936
Number of pages: 6