FORWARD ATTENTION IN SEQUENCE-TO-SEQUENCE ACOUSTIC MODELING FOR SPEECH SYNTHESIS

Cited: 0
Authors
Zhang, Jing-Xuan [1 ]
Ling, Zhen-Hua [1 ]
Dai, Li-Rong [1 ]
Affiliations
[1] Univ Sci & Technol China, Natl Engn Lab Speech & Language Informat Proc, Hefei, Anhui, Peoples R China
Source
2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018
Funding
National Key Research and Development Program of China
Keywords
sequence-to-sequence model; encoder-decoder; attention; speech synthesis;
DOI
Not available
Chinese Library Classification (CLC)
O42 [Acoustics]
Discipline codes
070206; 082403
Abstract
This paper proposes a forward attention method for the sequence-to-sequence acoustic modeling of speech synthesis. The method is motivated by the monotonic nature of the alignment from phone sequences to acoustic sequences. Only the alignment paths that satisfy the monotonic condition are taken into consideration at each decoder timestep, and the modified attention probabilities are computed recursively using a forward algorithm. A transition agent for forward attention is further proposed, which helps the attention mechanism decide whether to move forward or to stay at each decoder timestep. Experimental results show that the proposed forward attention method achieves faster convergence and higher stability than the baseline attention method. In addition, forward attention with a transition agent also improves the naturalness of the synthetic speech and allows its speed to be controlled effectively.
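The abstract summarizes the core recursion: at each decoder timestep the ordinary content-based attention weights are reweighted so that only monotonic alignment paths contribute, and the result is renormalized; the transition-agent variant additionally learns a per-step probability of moving forward versus staying. Below is a minimal NumPy sketch of such a recursion, offered for illustration only and assuming a "stay or advance by one" update before renormalization; the energy matrix, the transition probabilities u, and the function names are placeholders, not the authors' implementation.

import numpy as np

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def forward_attention(energies, u=None):
    """Reweight attention so that only monotonic alignment paths contribute.
    energies: array of shape (T_dec, N_enc) with one score per encoder state
    at each decoder step. u (optional): one transition probability per decoder
    step, standing in for the transition-agent output."""
    T, N = energies.shape
    # alpha_0: all probability mass on the first encoder state.
    alpha = np.zeros(N)
    alpha[0] = 1.0
    outputs = []
    for t in range(T):
        y = softmax(energies[t])              # ordinary content-based weights
        prev_shifted = np.roll(alpha, 1)      # alpha_{t-1}(n-1), "move forward" term
        prev_shifted[0] = 0.0
        if u is None:                         # plain forward attention
            alpha_hat = (alpha + prev_shifted) * y
        else:                                 # variant with a transition probability
            alpha_hat = ((1.0 - u[t]) * alpha + u[t] * prev_shifted) * y
        alpha = alpha_hat / alpha_hat.sum()   # renormalize to a distribution
        outputs.append(alpha)
    return np.stack(outputs)

# Toy usage: 4 decoder steps attending over 3 encoder states.
rng = np.random.default_rng(0)
weights = forward_attention(rng.normal(size=(4, 3)))
print(weights.round(3))

In a full synthesizer the normalized weights at each step would be used to form the context vector fed to the decoder; the toy call above only shows that each row remains a valid probability distribution over the encoder states.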
Pages: 4789-4793
Number of pages: 5
Related Papers
50 records in total (items [31]-[40] shown)
  • [31] Sequence-to-Sequence Learning for Deep Gaussian Process Based Speech Synthesis Using Self-Attention GP Layer
    Nakamura, Taiki
    Koriyama, Tomoki
    Saruwatari, Hiroshi
    INTERSPEECH 2021, 2021, : 121 - 125
  • [32] SEQUENCE-LEVEL KNOWLEDGE DISTILLATION FOR MODEL COMPRESSION OF ATTENTION-BASED SEQUENCE-TO-SEQUENCE SPEECH RECOGNITION
    Mun'im, Raden Mu'az
    Inoue, Nakamasa
    Shinoda, Koichi
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6151 - 6155
  • [33] Double-attention mechanism of sequence-to-sequence deep neural networks for automatic speech recognition
    Yook, Dongsuk
    Lim, Dan
    Yoo, In-Chul
    JOURNAL OF THE ACOUSTICAL SOCIETY OF KOREA, 2020, 39 (05): 476 - 482
  • [34] INTEGRATING SOURCE-CHANNEL AND ATTENTION-BASED SEQUENCE-TO-SEQUENCE MODELS FOR SPEECH RECOGNITION
    Li, Qiujia
    Zhang, Chao
    Woodland, Philip C.
    2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 39 - 46
  • [35] From Speech to Facial Activity: Towards Cross-modal Sequence-to-Sequence Attention Networks
    Stappen, Lukas
    Karas, Vincent
    Cummins, Nicholas
    Ringeval, Fabien
    Scherer, Klaus
    Schuller, Bjorn
    2019 IEEE 21ST INTERNATIONAL WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING (MMSP 2019), 2019,
  • [36] Enhancing Sequence-to-Sequence Text-to-Speech with Morphology
    Taylor, Jason
    Richmond, Korin
    INTERSPEECH 2020, 2020, : 1738 - 1742
  • [37] Sequence-to-Sequence Model with Attention for Time Series Classification
    Tang, Yujin
    Xu, Jianfeng
    Matsumoto, Kazunori
    Ono, Chihiro
    2016 IEEE 16TH INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS (ICDMW), 2016, : 503 - 510
  • [38] Sequence-to-sequence modeling for graph representation learning
    Taheri, Aynaz
    Gimpel, Kevin
    Berger-Wolf, Tanya
    APPLIED NETWORK SCIENCE, 2019, 4 (01)
  • [39] Prosodic Features Control by Symbols as Input of Sequence-to-Sequence Acoustic Modeling for Neural TTS
    Kurihara, Kiyoshi
    Seiyama, Nobumasa
    Kumano, Tadashi
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2021, E104D (02) : 302 - 311
  • [40] Analysis of Multilingual Sequence-to-Sequence Speech Recognition Systems
    Karafiat, Martin
    Baskar, Murali Karthick
    Watanabe, Shinji
    Hori, Takaaki
    Wiesner, Matthew
    Cernocky, Jan Honza
    INTERSPEECH 2019, 2019, : 2220 - 2224