Robust Sequence-to-Sequence Acoustic Modeling with Stepwise Monotonic Attention for Neural TTS

Cited by: 24
Authors
He, Mutian [1]
Deng, Yan [2]
He, Lei [2]
Affiliations
[1] Beihang Univ, Sch Comp Sci & Engn, Beijing, Peoples R China
[2] Microsoft, Beijing, Peoples R China
Source
INTERSPEECH 2019 | 2019
Keywords
sequence-to-sequence model; attention; speech synthesis;
DOI
10.21437/Interspeech.2019-1972
Chinese Library Classification (CLC)
R36 [Pathology]; R76 [Otorhinolaryngology];
Subject Classification Codes
100104; 100213;
Abstract
Neural TTS has demonstrated a strong capability to generate human-like speech with high quality and naturalness, but its generalization to out-of-domain texts remains challenging for attention-based sequence-to-sequence acoustic models. Inputs with unseen context trigger various errors, including attention collapse, skipping, and repeating, which limit broader applications. In this paper, we propose a novel stepwise monotonic attention method for sequence-to-sequence acoustic modeling to improve robustness on out-of-domain inputs. The method exploits the strictly monotonic nature of alignment in TTS by constraining monotonic hard attention so that the alignment between the input and output sequences must not only be monotonic but also skip no inputs: at each decoding step, attention either stays at the current input element or advances to the next one. Soft attention can be used to avoid the mismatch between training and inference. Experimental results show that the proposed method achieves significant improvements in robustness on out-of-domain scenarios for phoneme-based models, without any regression on the in-domain naturalness test.
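The no-skipping constraint described above implies a simple recursion for the expected (soft) alignment: attention mass on input position j at decoder step i can only come from staying at j or from advancing out of j - 1. Below is a minimal NumPy sketch of that recursion; the function name, the sigmoid parameterization of the stay probability, and the boundary and initialization choices are illustrative assumptions rather than the paper's exact formulation.

```python
import numpy as np

def stepwise_monotonic_alignment(p_stay):
    """Expected alignment under a stay-or-advance-by-one policy.

    p_stay[i, j]: probability that attention at input position j on
    decoder step i stays at j for step i + 1; otherwise it advances to
    j + 1. Staying or moving exactly one step forward makes the
    alignment monotonic with no skipping by construction.
    """
    n_dec, n_enc = p_stay.shape
    p = p_stay.copy()
    p[:, -1] = 1.0  # boundary assumption: mass at the last input must stay
    alpha = np.zeros((n_dec, n_enc))
    alpha[0, 0] = 1.0  # assumption: alignment starts at the first input token
    for i in range(1, n_dec):
        stay = alpha[i - 1] * p[i - 1]
        advance = np.zeros(n_enc)
        advance[1:] = alpha[i - 1, :-1] * (1.0 - p[i - 1, :-1])
        alpha[i] = stay + advance
    return alpha

# Usage: derive stay probabilities from attention energies via a sigmoid
# (a common parameterization, assumed here for illustration).
rng = np.random.default_rng(0)
energies = rng.normal(size=(6, 4))          # (decoder steps, input tokens)
alpha = stepwise_monotonic_alignment(1.0 / (1.0 + np.exp(-energies)))
assert np.allclose(alpha.sum(axis=1), 1.0)  # each row stays a distribution
```

Each row of alpha is a proper probability distribution over input positions, so it can be used directly as soft attention weights, which is one way to keep training and inference consistent, as the abstract's remark on soft attention suggests.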
Pages: 1293 - 1297
Number of pages: 5
Related Papers
50 items in total
  • [32] Sequential classification of customer behavior based on sequence-to-sequence learning with gated-attention neural networks
    Zhao, Licheng
    Zuo, Yi
    Yada, Katsutoshi
    ADVANCES IN DATA ANALYSIS AND CLASSIFICATION, 2023, 17 (03) : 549 - 581
  • [33] Sequence-to-Sequence Neural Network Model with 2D Attention for Learning Japanese Pitch Accents
    Bruguier, Antoine
    Zen, Heiga
    Arkhangorodsky, Arkady
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018: 1284 - 1287
  • [34] De-duping URLs with Sequence-to-Sequence Neural Networks
    Xu, Keyang
    Liu, Zhengzhong
    Callan, Jamie
    SIGIR'17: PROCEEDINGS OF THE 40TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2017: 1157 - 1160
  • [35] A Convolutional Sequence-to-Sequence Attention Fusion Framework for Commonsense Causal Reasoning
    Luo, Zhiyi
    Liu, Yizhu
    Luo, Shuyun
    MATHEMATICS, 2023, 11 (23)
  • [36] Graph augmented sequence-to-sequence model for neural question generation
    Ma, Hui
    Wang, Jian
    Lin, Hongfei
    Xu, Bo
    APPLIED INTELLIGENCE, 2023, 53 (11) : 14628 - 14644
  • [37] Retrosynthetic Reaction Prediction Using Neural Sequence-to-Sequence Models
    Liu, Bowen
    Ramsundar, Bharath
    Kawthekar, Prasad
    Shi, Jade
    Gomes, Joseph
    Quang Luu Nguyen
    Ho, Stephen
    Sloane, Jack
    Wender, Paul
    Pande, Vijay
    ACS CENTRAL SCIENCE, 2017, 3 (10) : 1103 - 1113
  • [38] A Sequence-to-Sequence Model Based on Attention Mechanism for Wave Spectrum Prediction
    Zeng, Xiao
    Qi, Lin
    Yi, Tong
    Liu, Tong
    2020 11TH INTERNATIONAL CONFERENCE ON AWARENESS SCIENCE AND TECHNOLOGY (ICAST), 2020
  • [39] Proactive Mobility Management of UEs using Sequence-to-Sequence Modeling
    Yajnanarayana, Vijaya
    2022 NATIONAL CONFERENCE ON COMMUNICATIONS (NCC), 2022: 320 - 325
  • [40] Attention-Based Sequence-to-Sequence Model for Time Series Imputation
    Li, Yurui
    Du, Mingjing
    He, Sheng
    ENTROPY, 2022, 24 (12)