Improving Multi-Speaker Tacotron with Speaker Gating Mechanisms

Times cited: 0
Authors
Zhao, Wei [1, 2]
Xu, Li [1, 2]
He, Ting [1, 2]
Affiliations
[1] Zhejiang Univ, Coll Elect Engn, Hangzhou 310027, Zhejiang, Peoples R China
[2] Zhejiang Univ, Robot Inst, Yuyao 315400, Zhejiang, Peoples R China
Source
PROCEEDINGS OF THE 39TH CHINESE CONTROL CONFERENCE | 2020
Keywords
speech synthesis; multi-speaker; Tacotron; speaker gating mechanisms;
DOI
10.23919/ccc50068.2020.9188779
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
In this paper, we present two speaker gating mechanisms for multi-speaker Tacotron, a popular end-to-end text-to-speech (TTS) neural system, to improve its performance in generating multiple voices. With the presented mechanisms, the model works better in both generalization and accuracy. As a starting point, we introduce the original multi-speaker Tacotron as the baseline model because of its excellent performance and straightforward structure. Employing gated linear units (GLUs), we then propose two different speaker gating mechanisms for this model. Extensive experiments on the VCTK dataset are conducted to demonstrate the validity of our methods. We conclude that incorporating speaker identity information through the proposed speaker gating mechanisms is promising.
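The abstract names gated linear units but this record does not describe the two proposed mechanisms. As a rough illustration only, the following sketch shows one generic way a GLU-style gate could be conditioned on a speaker embedding: a sigmoid gate computed from the embedding scales each hidden unit. The function names, dimensions, and the choice to gate on the speaker embedding alone are all assumptions made for this example, not the authors' architecture.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def linear(x, weight, bias):
    """Affine map; weight holds one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def speaker_gate(hidden, speaker_embedding, gate_weight, gate_bias):
    """Hypothetical GLU-style gate: hidden * sigmoid(W_s e_s + b).

    Assumption for illustration; the paper's actual mechanisms
    are not specified in this record.
    """
    gate = [sigmoid(g)
            for g in linear(speaker_embedding, gate_weight, gate_bias)]
    return [h * g for h, g in zip(hidden, gate)]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)]
            for _ in range(rows)]


# Toy dimensions and values, chosen arbitrarily.
rng = random.Random(0)
hidden_dim, speaker_dim = 4, 3
hidden = [1.0, -2.0, 0.5, 3.0]
speaker_a = [0.2, -0.1, 0.7]
speaker_b = [-0.6, 0.4, 0.1]
gate_weight = random_matrix(hidden_dim, speaker_dim, rng)
gate_bias = [0.0] * hidden_dim

# The same hidden state is modulated differently per speaker.
gated_a = speaker_gate(hidden, speaker_a, gate_weight, gate_bias)
gated_b = speaker_gate(hidden, speaker_b, gate_weight, gate_bias)
```

Because the gate lies strictly between 0 and 1, it can only attenuate a hidden unit, never flip its sign; different speaker embeddings yield different attenuation patterns over the same hidden state.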
Pages: 7498-7503
Number of pages: 6