Zero-shot multi-speaker accent TTS with limited accent data

Cited by: 0
Authors
Zhang, Mingyang [1 ]
Zhou, Yi [2 ]
Wu, Zhizheng [1 ]
Li, Haizhou [1 ,2 ]
Affiliations
[1] Chinese Univ Hong Kong, Shenzhen Res Inst Big Data, Sch Data Sci, Shenzhen, Peoples R China
[2] Natl Univ Singapore, Dept Elect & Comp Engn, Singapore, Singapore
Source
2023 ASIA PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE, APSIPA ASC | 2023
Funding
National Natural Science Foundation of China;
Keywords
SPEAKER ADAPTATION;
DOI
10.1109/APSIPAASC58517.2023.10317526
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
In this paper, we present a multi-speaker accented speech synthesis framework that can generate accented speech for unseen speakers using only a limited amount of accent training data. Without relying on an accent lexicon, the proposed network learns accent phoneme embeddings through simple model adaptation. Specifically, a standard multi-speaker speech synthesis model is first trained on native speech. An additional neural network module is then appended and adapted to map native speech to accented speech. In the experiments, we synthesize English speech with Singaporean and Hindi accents. Both objective and subjective evaluation results confirm that the proposed phoneme-mapping technique is effective in generating high-quality accented speech for unseen speakers.
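As a rough illustration of the adaptation step described in the abstract, the sketch below (in PyTorch) appends a small phoneme-embedding mapping module to a frozen, pretrained multi-speaker TTS embedding table; only this appended module would be trained on the limited accent data. All module names, dimensions, and the residual design are illustrative assumptions, not the authors' implementation.

# Minimal sketch, assuming a PyTorch-style TTS front end with a trainable
# phoneme embedding table. The pretrained native embeddings stay frozen;
# a small appended adapter learns the accent-specific mapping.
import torch
import torch.nn as nn

class AccentPhonemeAdapter(nn.Module):
    """Maps native phoneme embeddings to accented phoneme embeddings."""

    def __init__(self, embed_dim: int = 256, hidden_dim: int = 512):
        super().__init__()
        self.mapping = nn.Sequential(
            nn.Linear(embed_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, embed_dim),
        )

    def forward(self, native_embedding: torch.Tensor) -> torch.Tensor:
        # Residual mapping: keep the native embedding and learn only the
        # accent-specific shift, which suits limited adaptation data.
        return native_embedding + self.mapping(native_embedding)

# Usage sketch: freeze the pretrained native embedding table and train
# only the adapter on the limited accented speech data.
num_phonemes, embed_dim = 100, 256
native_table = nn.Embedding(num_phonemes, embed_dim)   # taken from the pretrained TTS
native_table.requires_grad_(False)                      # base model stays fixed

adapter = AccentPhonemeAdapter(embed_dim)
phoneme_ids = torch.randint(0, num_phonemes, (1, 20))   # dummy phoneme sequence
accent_embeddings = adapter(native_table(phoneme_ids))  # fed to the frozen TTS decoder

Freezing the base model and training only a lightweight mapping is one plausible way to realize the "additional neural network module appended for adaptation" mentioned above when accent data is scarce; the actual architecture and training recipe are detailed in the paper itself.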
Pages: 1931-1936
Number of pages: 6