FORWARD ATTENTION IN SEQUENCE-TO-SEQUENCE ACOUSTIC MODELING FOR SPEECH SYNTHESIS

Cited: 0
Authors
Zhang, Jing-Xuan [1]
Ling, Zhen-Hua [1]
Dai, Li-Rong [1]
Affiliations
[1] Univ Sci & Technol China, Natl Engn Lab Speech & Language Informat Proc, Hefei, Anhui, Peoples R China
Source
2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2018
Funding
National Key R&D Program of China
Keywords
sequence-to-sequence model; encoder-decoder; attention; speech synthesis
DOI
(not available)
Chinese Library Classification
O42 [Acoustics]
Subject Classification Code
070206; 082403
Abstract
This paper proposes a forward attention method for the sequence-to-sequence acoustic modeling of speech synthesis. The method is motivated by the inherently monotonic alignment between phone sequences and acoustic sequences. At each decoder timestep, only alignment paths that satisfy the monotonic condition are taken into consideration, and the modified attention probabilities are computed recursively using a forward algorithm. A transition agent for forward attention is further proposed, which helps the attention mechanism decide whether to move forward or stay at each decoder timestep. Experimental results show that the proposed forward attention method achieves faster convergence and higher stability than the baseline attention method. In addition, forward attention with a transition agent also improves the naturalness of synthetic speech and provides effective control over its speaking rate.
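The recursion described in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: it assumes the baseline attention network has already produced per-timestep probabilities `y[t]` over the `N` phones, and that the optional transition agent emits a scalar move probability `u[t]` per decoder step. At each step the forward weight of phone `n` combines the "stay on `n`" and "move from `n-1`" paths, is reweighted by `y[t]`, and is renormalized.

```python
import numpy as np

def forward_attention(y, u=None):
    """Recursive forward attention over a monotonic alignment.

    y : (T, N) array of baseline attention probabilities
        (one distribution over N phones per decoder timestep).
    u : optional (T,) array of transition-agent probabilities in [0, 1];
        u[t] is the probability of advancing one phone at step t.
        (The names y and u are illustrative, not from the paper's code.)
    Returns a (T, N) array of normalized forward attention weights.
    """
    T, N = y.shape
    alpha = np.zeros((T, N))
    prev = np.zeros(N)
    prev[0] = 1.0  # initialization: attention starts on the first phone
    for t in range(T):
        stay = prev
        move = np.concatenate(([0.0], prev[:-1]))  # mass arriving from phone n-1
        if u is None:
            a = (stay + move) * y[t]               # plain forward attention
        else:
            a = ((1.0 - u[t]) * stay + u[t] * move) * y[t]  # with transition agent
        a = a / a.sum()                            # renormalize to a distribution
        alpha[t] = a
        prev = a
    return alpha
```

Because mass can advance by at most one phone per step, weights such as `alpha[0, 2]` are exactly zero, which is the monotonicity constraint the method enforces; driving `u` toward 1 forces faster advancement, which is how the transition agent enables speaking-rate control.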
Pages: 4789 - 4793
Page count: 5
Related Papers
50 records total
  • [21] SPEECH-TRANSFORMER: A NO-RECURRENCE SEQUENCE-TO-SEQUENCE MODEL FOR SPEECH RECOGNITION
    Dong, Linhao
    Xu, Shuang
    Xu, Bo
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5884 - 5888
  • [22] A Sequence-to-Sequence Model Based on Attention Mechanism for Wave Spectrum Prediction
    Zeng, Xiao
    Qi, Lin
    Yi, Tong
    Liu, Tong
    2020 11TH INTERNATIONAL CONFERENCE ON AWARENESS SCIENCE AND TECHNOLOGY (ICAST), 2020,
  • [23] ON USING 2D SEQUENCE-TO-SEQUENCE MODELS FOR SPEECH RECOGNITION
    Bahar, Parnia
    Zeyer, Albert
    Schlueter, Ralf
    Ney, Hermann
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 5671 - 5675
  • [24] DIALOG STATE TRACKING WITH ATTENTION-BASED SEQUENCE-TO-SEQUENCE LEARNING
    Hori, Takaaki
    Wang, Hai
    Hori, Chiori
    Watanabe, Shinji
    Harsham, Bret
    Le Roux, Jonathan
    Hershey, John R.
    Koji, Yusuke
    Jing, Yi
    Zhu, Zhaocheng
    Aikawa, Takeyuki
    2016 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2016), 2016, : 552 - 558
  • [25] Single headed attention based sequence-to-sequence model for state-of-the-art results on Switchboard
    Tuske, Zoltan
    Saon, George
    Audhkhasi, Kartik
    Kingsbury, Brian
    INTERSPEECH 2020, 2020, : 551 - 555
  • [26] Plasma confinement mode classification using a sequence-to-sequence neural network with attention
    Matos, F.
    Menkovski, V.
    Pau, A.
    Marceca, G.
    Jenko, F.
    NUCLEAR FUSION, 2021, 61 (04)
  • [27] Exploiting Attention-based Sequence-to-Sequence Architectures for Sound Event Localization
    Schymura, Christopher
    Ochiai, Tsubasa
    Delcroix, Marc
    Kinoshita, Keisuke
    Nakatani, Tomohiro
    Araki, Shoko
    Kolossa, Dorothea
    28TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2020), 2021, : 231 - 235
  • [28] Real-time neural text-to-speech with sequence-to-sequence acoustic model and WaveGlow or single Gaussian WaveRNN vocoders
    Okamoto, Takuma
    Toda, Tomoki
    Shiga, Yoshinori
    Kawai, Hisashi
    INTERSPEECH 2019, 2019, : 1308 - 1312
  • [29] Applying Syntax-Prosody Mapping Hypothesis and Boundary-Driven Theory to Neural Sequence-to-Sequence Speech Synthesis
    Furukawa, Kei
    Kishiyama, Takeshi
    Nakamura, Satoshi
    Sakti, Sakriani
    IEEE ACCESS, 2024, 12 : 160896 - 160917
  • [30] MULTI-SCALE ALIGNMENT AND CONTEXTUAL HISTORY FOR ATTENTION MECHANISM IN SEQUENCE-TO-SEQUENCE MODEL
    Tjandra, Andros
    Sakti, Sakriani
    Nakamura, Satoshi
    2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), 2018, : 648 - 655