ET-GAN: Cross-Language Emotion Transfer Based on Cycle-Consistent Generative Adversarial Networks

Cited by: 0
Authors
Jia, Xiaoqi [1 ,2 ,3 ,4 ]
Tai, Jianwei [1 ,2 ,3 ,4 ]
Zhou, Hang [1 ,2 ,3 ,4 ]
Li, Yakai [1 ,2 ,3 ,4 ]
Zhang, Weijuan [1 ,2 ,3 ,4 ]
Du, Haichao [1 ,2 ,3 ,4 ]
Huang, Qingjia [1 ,2 ,3 ,4 ]
Affiliations
[1] Chinese Acad Sci, Inst Informat Engn, Key Lab Network Assessment Technol, Beijing, Peoples R China
[2] Chinese Acad Sci, Inst Informat Engn, Beijing Key Lab Network Secur & Protect Technol, Beijing, Peoples R China
[3] Univ Chinese Acad Sci, Sch Cyber Secur, Beijing, Peoples R China
[4] Chinese Univ Hong Kong, Hong Kong, Peoples R China
Source
ECAI 2020: 24TH EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE | 2020 / Vol. 325
Funding
National Natural Science Foundation of China;
Keywords
ACOUSTIC MODEL; SPEECH; CONVERSION;
DOI
10.3233/FAIA200325
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Despite the remarkable progress made in synthesizing emotional speech from text, it remains challenging to add emotion information to existing speech segments. Previous methods rely mainly on parallel data, and few works have studied a single model's ability to generalize emotion transfer across different languages. To address these problems, we propose an emotion transfer system named ET-GAN, which learns language-independent emotion transfer from one emotion to another without parallel training samples. Based on a cycle-consistent generative adversarial network, our method ensures that only emotion information is transferred across speeches, using simple loss designs. In addition, we introduce an approach for migrating emotion information across different languages via transfer learning. Experimental results show that our method efficiently generates high-quality emotional speech for any given emotion category, without aligned speech pairs.
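To make the cycle-consistency idea in the abstract concrete, below is a minimal PyTorch sketch of the loss that CycleGAN-style, non-parallel emotion transfer builds on. This is an illustrative reconstruction, not the authors' implementation: the toy generator, the 80-bin mel-spectrogram features, and the weight lambda_cyc are all assumptions.

```python
# Minimal sketch of a CycleGAN-style cycle-consistency loss for unpaired
# emotion transfer. Hypothetical stand-in, NOT the ET-GAN code: generator
# design, feature dimensionality, and loss weight are assumptions.
import torch
import torch.nn as nn


class Generator(nn.Module):
    """Toy 1-D convolutional generator mapping source-emotion features
    to target-emotion features while preserving the time axis."""
    def __init__(self, n_mels: int = 80):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_mels, 128, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(128, n_mels, kernel_size=5, padding=2),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


def cycle_consistency_loss(g_ab, g_ba, x_a, x_b, lambda_cyc: float = 10.0):
    """L1 cycle loss: A -> B -> A and B -> A -> B must reconstruct the
    input, pushing the generators to change only the emotion-dependent
    part of the signal."""
    l1 = nn.L1Loss()
    loss_a = l1(g_ba(g_ab(x_a)), x_a)  # source emotion -> target -> source
    loss_b = l1(g_ab(g_ba(x_b)), x_b)  # target emotion -> source -> target
    return lambda_cyc * (loss_a + loss_b)


# Usage: two UNPAIRED batches of mel-spectrogram segments with shape
# (batch, mel bins, frames); no aligned speech pairs are required.
g_ab, g_ba = Generator(), Generator()
x_a = torch.randn(4, 80, 128)  # speech carrying the source emotion
x_b = torch.randn(4, 80, 128)  # speech carrying the target emotion
print(cycle_consistency_loss(g_ab, g_ba, x_a, x_b).item())
```

Because both reconstruction terms are computed on unpaired batches, this objective needs no aligned speech pairs, which matches the non-parallel training setting the abstract describes; the full system would add adversarial losses from discriminators on each emotion domain.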
Pages: 2038 - 2045
Number of pages: 8