FORWARD ATTENTION IN SEQUENCE-TO-SEQUENCE ACOUSTIC MODELING FOR SPEECH SYNTHESIS

Cited: 0
Authors
Zhang, Jing-Xuan [1]
Ling, Zhen-Hua [1]
Dai, Li-Rong [1]
Affiliations
[1] Univ Sci & Technol China, Natl Engn Lab Speech & Language Informat Proc, Hefei, Anhui, Peoples R China
Source
2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2018
Funding
National Key R&D Program of China
Keywords
sequence-to-sequence model; encoder-decoder; attention; speech synthesis
DOI
(not available)
Chinese Library Classification
O42 [Acoustics]
Subject Classification Code
070206; 082403
Abstract
This paper proposes a forward attention method for the sequence-to-sequence acoustic modeling of speech synthesis. The method is motivated by the inherently monotonic alignment between phone sequences and acoustic sequences. At each decoder timestep, only alignment paths that satisfy the monotonic condition are taken into consideration, and the modified attention probabilities are computed recursively using a forward algorithm. A transition agent for forward attention is further proposed, which helps the attention mechanism decide whether to move forward or stay at each decoder timestep. Experimental results show that the proposed forward attention method achieves faster convergence and higher stability than the baseline attention method. In addition, forward attention with a transition agent also improves the naturalness of synthetic speech and provides effective control over its speaking rate.
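The recursion described in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: it assumes the baseline attention network has already produced per-timestep probabilities `y[t]` over the `N` phones, and that the optional transition agent emits a scalar move probability `u[t]` per decoder step. At each step the forward weight of phone `n` combines the "stay on `n`" and "move from `n-1`" paths, is reweighted by `y[t]`, and is renormalized.

```python
import numpy as np

def forward_attention(y, u=None):
    """Recursive forward attention over a monotonic alignment.

    y : (T, N) array of baseline attention probabilities
        (one distribution over N phones per decoder timestep).
    u : optional (T,) array of transition-agent probabilities in [0, 1];
        u[t] is the probability of advancing one phone at step t.
        (The names y and u are illustrative, not from the paper's code.)
    Returns a (T, N) array of normalized forward attention weights.
    """
    T, N = y.shape
    alpha = np.zeros((T, N))
    prev = np.zeros(N)
    prev[0] = 1.0  # initialization: attention starts on the first phone
    for t in range(T):
        stay = prev
        move = np.concatenate(([0.0], prev[:-1]))  # mass arriving from phone n-1
        if u is None:
            a = (stay + move) * y[t]               # plain forward attention
        else:
            a = ((1.0 - u[t]) * stay + u[t] * move) * y[t]  # with transition agent
        a = a / a.sum()                            # renormalize to a distribution
        alpha[t] = a
        prev = a
    return alpha
```

Because mass can advance by at most one phone per step, weights such as `alpha[0, 2]` are exactly zero, which is the monotonicity constraint the method enforces; driving `u` toward 1 forces faster advancement, which is how the transition agent enables speaking-rate control.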
Pages: 4789 - 4793
Page count: 5
Related Papers
50 records total
  • [21] SPEECH-TRANSFORMER: A NO-RECURRENCE SEQUENCE-TO-SEQUENCE MODEL FOR SPEECH RECOGNITION
    Dong, Linhao
    Xu, Shuang
    Xu, Bo
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5884 - 5888
  • [22] A Sequence-to-Sequence Model Based on Attention Mechanism for Wave Spectrum Prediction
    Zeng, Xiao
    Qi, Lin
    Yi, Tong
    Liu, Tong
    2020 11TH INTERNATIONAL CONFERENCE ON AWARENESS SCIENCE AND TECHNOLOGY (ICAST), 2020,
  • [23] ON USING 2D SEQUENCE-TO-SEQUENCE MODELS FOR SPEECH RECOGNITION
    Bahar, Parnia
    Zeyer, Albert
    Schlueter, Ralf
    Ney, Hermann
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 5671 - 5675
  • [24] DIALOG STATE TRACKING WITH ATTENTION-BASED SEQUENCE-TO-SEQUENCE LEARNING
    Hori, Takaaki
    Wang, Hai
    Hori, Chiori
    Watanabe, Shinji
    Harsham, Bret
    Le Roux, Jonathan
    Hershey, John R.
    Koji, Yusuke
    Jing, Yi
    Zhu, Zhaocheng
    Aikawa, Takeyuki
    2016 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2016), 2016, : 552 - 558
  • [25] Single headed attention based sequence-to-sequence model for state-of-the-art results on Switchboard
    Tuske, Zoltan
    Saon, George
    Audhkhasi, Kartik
    Kingsbury, Brian
    INTERSPEECH 2020, 2020, : 551 - 555
  • [26] Plasma confinement mode classification using a sequence-to-sequence neural network with attention
    Matos, F.
    Menkovski, V.
    Pau, A.
    Marceca, G.
    Jenko, F.
    NUCLEAR FUSION, 2021, 61 (04)
  • [27] Exploiting Attention-based Sequence-to-Sequence Architectures for Sound Event Localization
    Schymura, Christopher
    Ochiai, Tsubasa
    Delcroix, Marc
    Kinoshita, Keisuke
    Nakatani, Tomohiro
    Araki, Shoko
    Kolossa, Dorothea
    28TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2020), 2021, : 231 - 235
  • [28] Real-time neural text-to-speech with sequence-to-sequence acoustic model and WaveGlow or single Gaussian WaveRNN vocoders
    Okamoto, Takuma
    Toda, Tomoki
    Shiga, Yoshinori
    Kawai, Hisashi
    INTERSPEECH 2019, 2019, : 1308 - 1312
  • [29] Applying Syntax-Prosody Mapping Hypothesis and Boundary-Driven Theory to Neural Sequence-to-Sequence Speech Synthesis
    Furukawa, Kei
    Kishiyama, Takeshi
    Nakamura, Satoshi
    Sakti, Sakriani
    IEEE ACCESS, 2024, 12 : 160896 - 160917
  • [30] MULTI-SCALE ALIGNMENT AND CONTEXTUAL HISTORY FOR ATTENTION MECHANISM IN SEQUENCE-TO-SEQUENCE MODEL
    Tjandra, Andros
    Sakti, Sakriani
    Nakamura, Satoshi
    2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), 2018, : 648 - 655