FORWARD ATTENTION IN SEQUENCE-TO-SEQUENCE ACOUSTIC MODELING FOR SPEECH SYNTHESIS

被引：0

作者：

Zhang, Jing-Xuan ^{[1
]}

Ling, Zhen-Hua ^{[1
]}

Dai, Li-Rong ^{[1
]}

机构：

[1] Univ Sci & Technol China, Natl Engn Lab Speech & Language Informat Proc, Hefei, Anhui, Peoples R China

来源：

2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2018年

基金：

国家重点研发计划;

关键词：

sequence-to-sequence model; encoder-decoder; attention; speech synthesis;

D O I：

暂无

中图分类号：

O42 [声学];

学科分类号：

070206 ; 082403 ;

摘要：

This paper proposes a forward attention method for the sequence-to-sequence acoustic modeling of speech synthesis. This method is motivated by the nature of the monotonic alignment from phone sequences to acoustic sequences. Only the alignment paths that satisfy the monotonic condition are taken into consideration at each decoder timestep. The modified attention probabilities at each timestep are computed recursively using a forward algorithm. A transition agent for forward attention is further proposed, which helps the attention mechanism to make decisions whether to move forward or stay at each decoder timestep. Experimental results show that the proposed forward attention method achieves faster convergence speed and higher stability than the baseline attention method. Besides, the method of forward attention with transition agent can also help improve the naturalness of synthetic speech and control the speed of synthetic speech effectively.

引用

页码：4789 / 4793

页数：5

共 50 条

[41] Dysarthric Speech Transformer: A Sequence-to-Sequence Dysarthric Speech Recognition System
Shahamiri, Seyed Reza
Lal, Vanshika
Shah, Dhvani
IEEE TRANSACTIONS ON NEURAL SYSTEMS AND REHABILITATION ENGINEERING, 2023, 31 : 3407 - 3416
[42] Foundations of Sequence-to-Sequence Modeling for Time Series
Kuznetsov, Vitaly
Mariet, Zelda
22ND INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 89, 2019, 89 : 408 - 417
[43] ACOUSTIC-TO-WORD RECOGNITION WITH SEQUENCE-TO-SEQUENCE MODELS
Palaskar, Shruti
Metze, Florian
2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), 2018, : 397 - 404
[44] Investigation of learning abilities on linguistic features in sequence-to-sequence text-to-speech synthesis
Yasuda, Yusuke
Wang, Xin
Yamagishi, Junichi
COMPUTER SPEECH AND LANGUAGE, 2021, 67
[45] IMPROVING NATURALNESS AND CONTROLLABILITY OF SEQUENCE-TO-SEQUENCE SPEECH SYNTHESIS BY LEARNING LOCAL PROSODY REPRESENTATIONS
Gong, Cheng
Wang, Longbiao
Ling, Zhenhua
Guo, Shaotong
Zhang, Ju
Dang, Jianwu
2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 5724 - 5728
[46] Sequence-to-sequence modeling for graph representation learning
Aynaz Taheri
Kevin Gimpel
Tanya Berger-Wolf
Applied Network Science, 4
[47] SUPERVISED AND UNSUPERVISED APPROACHES FOR CONTROLLING NARROW LEXICAL FOCUS IN SEQUENCE-TO-SEQUENCE SPEECH SYNTHESIS
Shechtman, Slava
Fernandez, Raul
Haws, David
2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021, : 431 - 437
[48] USING LOCAL PHRASE DEPENDENCY STRUCTURE INFORMATION IN NEURAL SEQUENCE-TO-SEQUENCE SPEECH SYNTHESIS
Kaiki, Nobuyoshi
Sakti, Sakriani
Nakamura, Satoshi
2021 24TH CONFERENCE OF THE ORIENTAL COCOSDA INTERNATIONAL COMMITTEE FOR THE CO-ORDINATION AND STANDARDISATION OF SPEECH DATABASES AND ASSESSMENT TECHNIQUES (O-COCOSDA), 2021, : 206 - 211
[49] Improving Attention Based Sequence-to-Sequence Models for End-to-End English Conversational Speech Recognition
Weng, Chao
Cui, Jia
Wang, Guangsen
Wang, Jun
Yu, Changzhu
Su, Dan
Yu, Dong
19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 761 - 765
[50] Sequence-to-Sequence Models Can Directly Translate Foreign Speech
Weiss, Ron J.
Chorowski, Jan
Jaitly, Navdeep
Wu, Yonghui
Chen, Zhifeng
18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 2625 - 2629

← 1 2 3 4 5 →