Sequence-to-Sequence Learning for Deep Gaussian Process Based Speech Synthesis Using Self-Attention GP Layer

Cited by: 0
Authors
Nakamura, Taiki [1 ]
Koriyama, Tomoki [1 ]
Saruwatari, Hiroshi [1 ]
Affiliation
[1] Univ Tokyo, Grad Sch Informat Sci & Technol, Tokyo, Japan
Source
INTERSPEECH 2021 | 2021
Keywords
speech synthesis; deep Gaussian process; sequence-to-sequence; Bayesian deep model; sequential modeling
DOI
10.21437/Interspeech.2021-896
Chinese Library Classification
R36 [Pathology]; R76 [Otorhinolaryngology]
Subject Classification Codes
100104; 100213
Abstract
This paper presents a speech synthesis method based on deep Gaussian processes (DGPs) and sequence-to-sequence (Seq2Seq) learning, aiming at high-quality end-to-end speech synthesis. Feed-forward and recurrent DGP models are known to produce more natural synthetic speech than deep neural networks (DNNs) owing to Bayesian learning and kernel regression. However, such DGP models form a pipeline of independent acoustic and duration models and require a high level of expertise in text processing. The proposed model is based on Seq2Seq learning, which enables unified training of the acoustic and duration models. The encoder and decoder layers are represented by Gaussian process regressions (GPRs), and the parameters are trained as a Bayesian model. We also propose a self-attention mechanism with Gaussian processes to effectively model character-level input in the encoder. Subjective evaluation results show that the proposed Seq2Seq-SA-DGP synthesizes more natural speech than DNNs with self-attention and recurrent structures. Moreover, Seq2Seq-SA-DGP reduces the smoothing problems of recurrent structures and is effective when a simple input for an end-to-end system is given.
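As a rough illustrative sketch only (not the authors' implementation), the abstract's core idea — a GP-regression layer whose output sequence is then mixed by self-attention — can be outlined as follows. The RBF kernel, the inducing inputs/outputs `z`/`u`, and the plain scaled dot-product attention are all simplifying assumptions for illustration; the paper's actual model is a Bayesian DGP with learned attention under GP priors.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    # Squared-exponential kernel between row vectors of a and b.
    sq = np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :] - 2.0 * a @ b.T
    return variance * np.exp(-0.5 * sq / lengthscale**2)

def gp_layer(x, z, u, noise=1e-2):
    # GP-regression layer: predictive mean at inputs x, given
    # (hypothetical) inducing inputs z with pseudo-outputs u.
    kzz = rbf_kernel(z, z) + noise * np.eye(len(z))
    kxz = rbf_kernel(x, z)
    return kxz @ np.linalg.solve(kzz, u)

def self_attention(h):
    # Scaled dot-product self-attention over a sequence h (T x D);
    # queries, keys, and values are all h in this simplified sketch.
    scores = h @ h.T / np.sqrt(h.shape[1])
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return w @ h

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))   # toy character-level inputs (T=5, D=3)
z = rng.normal(size=(8, 3))   # inducing inputs (assumed)
u = rng.normal(size=(8, 3))   # inducing outputs (assumed)
h = gp_layer(x, z, u)         # GP layer transform of the sequence
out = self_attention(h)       # attention over the GP features
print(out.shape)
```

The point of the sketch is the composition: kernel regression supplies the Bayesian, nonparametric layer transform, and the attention step lets every output frame depend on the whole input sequence, as a Seq2Seq encoder requires.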
Pages: 121-125
Page count: 5