Sequence-to-Sequence Learning for Deep Gaussian Process Based Speech Synthesis Using Self-Attention GP Layer

Cited by: 0
Authors
Nakamura, Taiki [1 ]
Koriyama, Tomoki [1 ]
Saruwatari, Hiroshi [1 ]
Affiliation
[1] Univ Tokyo, Grad Sch Informat Sci & Technol, Tokyo, Japan
Source
INTERSPEECH 2021 | 2021
Keywords
speech synthesis; deep Gaussian process; sequence-to-sequence; Bayesian deep model; sequential modeling
DOI
10.21437/Interspeech.2021-896
Chinese Library Classification
R36 [Pathology]; R76 [Otorhinolaryngology]
Subject Classification Codes
100104; 100213
Abstract
This paper presents a speech synthesis method based on deep Gaussian processes (DGPs) and sequence-to-sequence (Seq2Seq) learning, aiming at high-quality end-to-end speech synthesis. Feed-forward and recurrent DGP models are known to produce more natural synthetic speech than deep neural networks (DNNs) owing to Bayesian learning and kernel regression. However, such DGP models form a pipeline of independent acoustic and duration models and require a high level of expertise in text processing. The proposed model is based on Seq2Seq learning, which enables unified training of the acoustic and duration models. The encoder and decoder layers are represented by Gaussian process regressions (GPRs), and the parameters are trained as a Bayesian model. We also propose a self-attention mechanism with Gaussian processes to effectively model character-level input in the encoder. Subjective evaluation results show that the proposed Seq2Seq-SA-DGP synthesizes more natural speech than DNNs with self-attention and recurrent structures. Moreover, Seq2Seq-SA-DGP reduces the smoothing problems of recurrent structures and is effective when a simple input for an end-to-end system is given.
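As a rough illustrative sketch only (not the authors' implementation), the abstract's core idea — a GP-regression layer whose output sequence is then mixed by self-attention — can be outlined as follows. The RBF kernel, the inducing inputs/outputs `z`/`u`, and the plain scaled dot-product attention are all simplifying assumptions for illustration; the paper's actual model is a Bayesian DGP with learned attention under GP priors.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    # Squared-exponential kernel between row vectors of a and b.
    sq = np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :] - 2.0 * a @ b.T
    return variance * np.exp(-0.5 * sq / lengthscale**2)

def gp_layer(x, z, u, noise=1e-2):
    # GP-regression layer: predictive mean at inputs x, given
    # (hypothetical) inducing inputs z with pseudo-outputs u.
    kzz = rbf_kernel(z, z) + noise * np.eye(len(z))
    kxz = rbf_kernel(x, z)
    return kxz @ np.linalg.solve(kzz, u)

def self_attention(h):
    # Scaled dot-product self-attention over a sequence h (T x D);
    # queries, keys, and values are all h in this simplified sketch.
    scores = h @ h.T / np.sqrt(h.shape[1])
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return w @ h

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))   # toy character-level inputs (T=5, D=3)
z = rng.normal(size=(8, 3))   # inducing inputs (assumed)
u = rng.normal(size=(8, 3))   # inducing outputs (assumed)
h = gp_layer(x, z, u)         # GP layer transform of the sequence
out = self_attention(h)       # attention over the GP features
print(out.shape)
```

The point of the sketch is the composition: kernel regression supplies the Bayesian, nonparametric layer transform, and the attention step lets every output frame depend on the whole input sequence, as a Seq2Seq encoder requires.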
Pages: 121-125
Page count: 5