FORWARD ATTENTION IN SEQUENCE-TO-SEQUENCE ACOUSTIC MODELING FOR SPEECH SYNTHESIS

Cited: 0
Authors
Zhang, Jing-Xuan [1 ]
Ling, Zhen-Hua [1 ]
Dai, Li-Rong [1 ]
Affiliations
[1] Univ Sci & Technol China, Natl Engn Lab Speech & Language Informat Proc, Hefei, Anhui, Peoples R China
Source
2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018
Funding
National Key Research and Development Program of China
Keywords
sequence-to-sequence model; encoder-decoder; attention; speech synthesis;
DOI
Not available
Chinese Library Classification (CLC)
O42 [Acoustics]
Discipline codes
070206; 082403
Abstract
This paper proposes a forward attention method for the sequence-to-sequence acoustic modeling of speech synthesis. The method is motivated by the monotonic nature of the alignment from phone sequences to acoustic sequences. Only the alignment paths that satisfy the monotonic condition are taken into consideration at each decoder timestep, and the modified attention probabilities are computed recursively using a forward algorithm. A transition agent for forward attention is further proposed, which helps the attention mechanism decide whether to move forward or to stay at each decoder timestep. Experimental results show that the proposed forward attention method achieves faster convergence and higher stability than the baseline attention method. In addition, forward attention with a transition agent also improves the naturalness of the synthetic speech and allows its speed to be controlled effectively.
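The abstract summarizes the core recursion: at each decoder timestep the ordinary content-based attention weights are reweighted so that only monotonic alignment paths contribute, and the result is renormalized; the transition-agent variant additionally learns a per-step probability of moving forward versus staying. Below is a minimal NumPy sketch of such a recursion, offered for illustration only and assuming a "stay or advance by one" update before renormalization; the energy matrix, the transition probabilities u, and the function names are placeholders, not the authors' implementation.

import numpy as np

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def forward_attention(energies, u=None):
    """Reweight attention so that only monotonic alignment paths contribute.
    energies: array of shape (T_dec, N_enc) with one score per encoder state
    at each decoder step. u (optional): one transition probability per decoder
    step, standing in for the transition-agent output."""
    T, N = energies.shape
    # alpha_0: all probability mass on the first encoder state.
    alpha = np.zeros(N)
    alpha[0] = 1.0
    outputs = []
    for t in range(T):
        y = softmax(energies[t])              # ordinary content-based weights
        prev_shifted = np.roll(alpha, 1)      # alpha_{t-1}(n-1), "move forward" term
        prev_shifted[0] = 0.0
        if u is None:                         # plain forward attention
            alpha_hat = (alpha + prev_shifted) * y
        else:                                 # variant with a transition probability
            alpha_hat = ((1.0 - u[t]) * alpha + u[t] * prev_shifted) * y
        alpha = alpha_hat / alpha_hat.sum()   # renormalize to a distribution
        outputs.append(alpha)
    return np.stack(outputs)

# Toy usage: 4 decoder steps attending over 3 encoder states.
rng = np.random.default_rng(0)
weights = forward_attention(rng.normal(size=(4, 3)))
print(weights.round(3))

In a full synthesizer the normalized weights at each step would be used to form the context vector fed to the decoder; the toy call above only shows that each row remains a valid probability distribution over the encoder states.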
Pages: 4789-4793
Number of pages: 5
Related Papers
50 records in total (items [31]-[40] shown)
  • [31] Sequence-to-Sequence Learning for Deep Gaussian Process Based Speech Synthesis Using Self-Attention GP Layer
    Nakamura, Taiki
    Koriyama, Tomoki
    Saruwatari, Hiroshi
    INTERSPEECH 2021, 2021, : 121 - 125
  • [32] SEQUENCE-LEVEL KNOWLEDGE DISTILLATION FOR MODEL COMPRESSION OF ATTENTION-BASED SEQUENCE-TO-SEQUENCE SPEECH RECOGNITION
    Mun'im, Raden Mu'az
    Inoue, Nakamasa
    Shinoda, Koichi
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6151 - 6155
  • [33] Double-attention mechanism of sequence-to-sequence deep neural networks for automatic speech recognition
    Yook, Dongsuk
    Lim, Dan
    Yoo, In-Chul
    JOURNAL OF THE ACOUSTICAL SOCIETY OF KOREA, 2020, 39 (05): 476 - 482
  • [34] INTEGRATING SOURCE-CHANNEL AND ATTENTION-BASED SEQUENCE-TO-SEQUENCE MODELS FOR SPEECH RECOGNITION
    Li, Qiujia
    Zhang, Chao
    Woodland, Philip C.
    2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 39 - 46
  • [35] From Speech to Facial Activity: Towards Cross-modal Sequence-to-Sequence Attention Networks
    Stappen, Lukas
    Karas, Vincent
    Cummins, Nicholas
    Ringeval, Fabien
    Scherer, Klaus
    Schuller, Bjorn
    2019 IEEE 21ST INTERNATIONAL WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING (MMSP 2019), 2019,
  • [36] Enhancing Sequence-to-Sequence Text-to-Speech with Morphology
    Taylor, Jason
    Richmond, Korin
    INTERSPEECH 2020, 2020, : 1738 - 1742
  • [37] Sequence-to-Sequence Model with Attention for Time Series Classification
    Tang, Yujin
    Xu, Jianfeng
    Matsumoto, Kazunori
    Ono, Chihiro
    2016 IEEE 16TH INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS (ICDMW), 2016, : 503 - 510
  • [38] Sequence-to-sequence modeling for graph representation learning
    Taheri, Aynaz
    Gimpel, Kevin
    Berger-Wolf, Tanya
    APPLIED NETWORK SCIENCE, 2019, 4 (01)
  • [39] Prosodic Features Control by Symbols as Input of Sequence-to-Sequence Acoustic Modeling for Neural TTS
    Kurihara, Kiyoshi
    Seiyama, Nobumasa
    Kumano, Tadashi
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2021, E104D (02) : 302 - 311
  • [40] Analysis of Multilingual Sequence-to-Sequence Speech Recognition Systems
    Karafiat, Martin
    Baskar, Murali Karthick
    Watanabe, Shinji
    Hori, Takaaki
    Wiesner, Matthew
    Cernocky, Jan Honza
    INTERSPEECH 2019, 2019, : 2220 - 2224