A DNN-based emotional speech synthesis by speaker adaptation

Cited by: 0
Authors
Yang, Hongwu [1, 2, 3]
Zhang, Weizhao [1, 2]
Zhi, Pengpeng [1]
Affiliations
[1] Northwest Normal Univ, Coll Phys & Elect Engn, Lanzhou 730070, Peoples R China
[2] Engn Res Ctr Gansu Prov Intelligent Informat Tech, Lanzhou 730070, Peoples R China
[3] Natl & Prov Joint Engn Lab Learning Anal Technol, Lanzhou 730070, Peoples R China
Source
2018 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC) | 2018
Funding
National Natural Science Foundation of China;
Keywords
EXPRESSIONS; STYLES;
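DOI
Not available
Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification codes
0808; 0809;
Abstract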
This paper proposes a deep neural network (DNN)-based emotional speech synthesis method that improves the quality of synthesized emotional speech through speaker adaptation with a multi-speaker, multi-emotion speech corpus. First, a text analyzer obtains contextual labels from the sentences, while the WORLD vocoder extracts acoustic features from the corresponding speech. Then, a set of speaker-independent DNN average voice models is trained on the contextual labels and acoustic features of the multi-emotion speech corpus. Finally, speaker adaptation is applied to train a set of speaker-dependent DNN voice models of the target emotion from the target emotional training speech. The target emotional speech is then synthesized by the speaker-dependent DNN voice models. Subjective evaluations show that, compared with the traditional hidden Markov model (HMM)-based method, the proposed method achieves higher opinion scores. Objective tests show that the spectrum of the emotional speech synthesized by the proposed method is also closer to that of the original speech than the spectrum of the emotional speech synthesized by the HMM-based method. The proposed method therefore improves the emotional expressiveness and naturalness of synthesized emotional speech.
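The two-stage training idea in the abstract (first an average voice model on pooled multi-speaker data, then adaptation to a target speaker) can be sketched roughly as follows. This is a toy illustration only, not the authors' system: a small linear regressor stands in for the DNN, the synthetic "speakers", feature dimensions, learning rate, and epoch counts are all assumptions, and no text analyzer or WORLD vocoder is involved.

```python
import random

def predict(w, b, x):
    """Linear stand-in for the acoustic model: label features -> one acoustic value."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def mse(w, b, data):
    """Mean squared error of the model on a list of (features, target) pairs."""
    return sum((predict(w, b, x) - y) ** 2 for x, y in data) / len(data)

def train(data, w, b, lr=0.05, epochs=200):
    """Full-batch gradient descent on MSE, starting from the given (w, b)."""
    w = list(w)
    n = len(data)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in data:
            err = predict(w, b, x) - y
            for i, xi in enumerate(x):
                grad_w[i] += 2 * err * xi / n
            grad_b += 2 * err / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def make_speaker(rng, true_w, true_b, n):
    """Synthetic data for one hypothetical speaker (assumption, not real speech)."""
    data = []
    for _ in range(n):
        x = [rng.uniform(-1, 1) for _ in true_w]
        y = sum(wi * xi for wi, xi in zip(true_w, x)) + true_b + rng.gauss(0, 0.01)
        data.append((x, y))
    return data

rng = random.Random(0)

# Stage 1: pool several source speakers and train a speaker-independent model.
pooled = []
for offset in (-0.5, 0.0, 0.5):
    pooled += make_speaker(rng, [1.0 + 0.2 * offset, -2.0], offset, 50)
avg_w, avg_b = train(pooled, [0.0, 0.0], 0.0)

# Stage 2: adapt by continuing training on a small amount of target-speaker data.
target_train = make_speaker(rng, [1.3, -1.8], 1.0, 10)
target_test = make_speaker(rng, [1.3, -1.8], 1.0, 50)
adapted_w, adapted_b = train(target_train, avg_w, avg_b, epochs=50)

err_avg = mse(avg_w, avg_b, target_test)
err_adapted = mse(adapted_w, adapted_b, target_test)
print("average model error on target:", err_avg)
print("adapted model error on target:", err_adapted)
```

In this sketch, adaptation is plain fine-tuning from the average model's parameters; the abstract does not say which adaptation technique the paper uses, so that choice is an assumption made for illustration.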
Pages: 633-637
Page count: 5
Related papers
50 in total
  • [1] A study of speaker adaptation for DNN-based speech synthesis
    Wu, Zhizheng
    Swietojanski, Pawel
    Veaux, Christophe
    Renals, Steve
    King, Simon
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 879 - 883
  • [2] Speaker adaptation in DNN-based speech synthesis using d-vectors
    Doddipatla, Rama
    Braunschweiler, Norbert
    Maia, Ranniery
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 3404 - 3408
  • [3] Unsupervised Speaker Adaptation for DNN-based Speech Synthesis using Input Codes
    Takaki, Shinji
    Nishimura, Yoshikazu
    Yamagishi, Junichi
    2018 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2018, : 649 - 658
  • [4] UNSUPERVISED SPEAKER ADAPTATION FOR DNN-BASED TTS SYNTHESIS
    Fan, Yuchen
    Qian, Yao
    Soong, Frank K.
    He, Lei
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5135 - 5139
  • [5] DNN-Based Speech Synthesis Using Speaker Codes
    Hojo, Nobukatsu
    Ijima, Yusuke
    Mizuno, Hideyuki
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2018, E101D (02): : 462 - 472
  • [6] An Investigation of DNN-Based Speech Synthesis Using Speaker Codes
    Hojo, Nobukatsu
    Ijima, Yusuke
    Mizuno, Hideyuki
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 2278 - 2282
  • [7] MULTI-SPEAKER MODELING AND SPEAKER ADAPTATION FOR DNN-BASED TTS SYNTHESIS
    Fan, Yuchen
    Qian, Yao
    Soong, Frank K.
    He, Lei
    2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 4475 - 4479
  • [8] DNN-BASED SPEAKER-ADAPTIVE POSTFILTERING WITH LIMITED ADAPTATION DATA FOR STATISTICAL SPEECH SYNTHESIS SYSTEMS
    Ozturk, Mirac Goksu
    Ulusoy, Okan
    Demiroglu, Cenk
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 7030 - 7034
  • [9] DNN-Based Arabic Speech Synthesis
    Amrouche, Aissa
    Bentrcia, Youssouf
    Boubakeur, Khadidja Nesrine
    Abed, Ahcene
    2022 9TH INTERNATIONAL CONFERENCE ON ELECTRICAL AND ELECTRONICS ENGINEERING (ICEEE 2022), 2022, : 378 - 382
  • [10] SPEAKER AND LANGUAGE FACTORIZATION IN DNN-BASED TTS SYNTHESIS
    Fan, Yuchen
    Qian, Yao
    Soong, Frank K.
    He, Lei
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5540 - 5544