A Controllable Multi-Lingual Multi-Speaker Multi-Style Text-to-Speech Synthesis With Multivariate Information Minimization

Cited by: 1
Authors
Cheon, Sung Jun [1, 2]
Choi, Byoung Jin [1, 2]
Kim, Minchan [1, 2]
Lee, Hyeonseung [1, 2]
Kim, Nam Soo [1, 2]
Affiliations
[1] Seoul National University, Department of Electrical and Computer Engineering, Seoul 08826, South Korea
[2] Seoul National University, Institute of New Media and Communications, Seoul 08826, South Korea
Keywords
Training; Upper bound; Speech synthesis; Correlation; Mutual information; Synthesizers; Estimation; Disentanglement; Style modeling; Total correlation
DOI
10.1109/LSP.2021.3125259
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Discipline codes
0808; 0809
Abstract
In this letter, we propose a multivariate information minimization method that disentangles three or more latent representations. We show that control factors can be disentangled by minimizing their interactive dependency, which can be expressed as a sum of mutual information upper bound terms. Since the upper bound estimate converges early in training, the auxiliary loss causes little performance degradation. The proposed technique is applied to train a text-to-speech synthesizer on multi-lingual, multi-speaker, and multi-style corpora. Subjective listening tests validate that the proposed method can improve the synthesizer in terms of both quality and controllability.
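The core idea of the abstract, writing a multivariate dependency as a sum of mutual information terms and then bounding each term from above, can be illustrated with a small sketch. It assumes that the dependency being minimized is the total correlation (listed among the keywords) and uses the standard chain-rule identity TC(z_1, ..., z_n) = sum over i = 2..n of I(z_1, ..., z_{i-1}; z_i). For the upper bound it assumes a CLUB-style estimator (contrastive log-ratio upper bound, Cheng et al., 2020); the letter's exact decomposition and estimator are not specified in this record. To keep every quantity checkable, the sketch uses jointly Gaussian variables with closed-form entropies, and all function names are hypothetical.

```python
import math


def log_det(matrix):
    """Log-determinant of a symmetric positive-definite matrix via Cholesky."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = matrix[i][j] - sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][i] = math.sqrt(s)
            else:
                lower[i][j] = s / lower[j][j]
    return 2.0 * sum(math.log(lower[i][i]) for i in range(n))


def sub_cov(cov, idx):
    """Covariance sub-matrix for the variables listed in idx."""
    return [[cov[i][j] for j in idx] for i in idx]


def gaussian_mi(cov, a_idx, b_idx):
    """I(A; B) in nats for jointly Gaussian blocks A and B (closed form)."""
    return 0.5 * (log_det(sub_cov(cov, a_idx))
                  + log_det(sub_cov(cov, b_idx))
                  - log_det(sub_cov(cov, a_idx + b_idx)))


def gaussian_tc(cov):
    """Total correlation sum_i H(z_i) - H(z_1, ..., z_n) for a Gaussian."""
    n = len(cov)
    return 0.5 * (sum(math.log(cov[i][i]) for i in range(n)) - log_det(cov))


def tc_via_chain_rule(cov):
    """Same quantity as a sum of MI terms I(z_1..z_{i-1}; z_i), i = 2..n."""
    return sum(gaussian_mi(cov, list(range(i)), [i]) for i in range(1, len(cov)))


def scalar_mi(a, var_x, noise_var):
    """True I(x; y) for y = a*x + e with x ~ N(0, var_x), e ~ N(0, noise_var)."""
    return 0.5 * math.log1p(a * a * var_x / noise_var)


def club_bound_scalar(a, var_x, noise_var):
    """CLUB value E_p(x,y)[log q(y|x)] - E_p(x)p(y)[log q(y|x)] in closed form,
    assuming q(y|x) equals the true conditional N(a*x, noise_var)."""
    return a * a * var_x / noise_var


if __name__ == "__main__":
    # Three correlated scalar "latents" (e.g. language, speaker, style codes).
    cov = [[1.0, 0.5, 0.2], [0.5, 1.0, 0.3], [0.2, 0.3, 1.0]]
    print("TC (direct):     ", gaussian_tc(cov))
    print("TC (sum of MIs): ", tc_via_chain_rule(cov))
    # Each MI term is dominated by its CLUB value, so the sum of bounds >= TC.
    print("MI vs CLUB:      ", scalar_mi(0.8, 1.0, 0.5), club_bound_scalar(0.8, 1.0, 0.5))
```

The direct and chain-rule totals agree because the conditional entropies telescope. In a trained synthesizer the latent embeddings are not Gaussian, so each MI term would instead be replaced by a learned upper-bound estimate and the sum used as an auxiliary loss; the Gaussian closed forms here serve only to make the identity and the bounding relation easy to verify.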
Pages: 55-59
Page count: 5