Robust Sequence-to-Sequence Acoustic Modeling with Stepwise Monotonic Attention for Neural TTS

Cited by: 24
Authors
He, Mutian [1]
Deng, Yan [2]
He, Lei [2]
Affiliations
[1] Beihang Univ, Sch Comp Sci & Engn, Beijing, Peoples R China
[2] Microsoft, Beijing, Peoples R China
Source
INTERSPEECH 2019 | 2019
Keywords
sequence-to-sequence model; attention; speech synthesis;
DOI
10.21437/Interspeech.2019-1972
Chinese Library Classification (CLC)
R36 [Pathology]; R76 [Otorhinolaryngology];
Subject Classification Codes
100104; 100213;
Abstract
Neural TTS has demonstrated a strong capability to generate human-like speech with high quality and naturalness, but its generalization to out-of-domain texts remains challenging for attention-based sequence-to-sequence acoustic models. Inputs with unseen context trigger various errors, including attention collapse, skipping, and repeating, which limit broader applications. In this paper, we propose a novel stepwise monotonic attention method for sequence-to-sequence acoustic modeling to improve robustness on out-of-domain inputs. The method exploits the strictly monotonic nature of alignment in TTS by constraining monotonic hard attention so that the alignment between the input and output sequences must not only be monotonic but also skip no inputs: at each decoding step, attention either stays at the current input element or advances to the next one. Soft attention can be used to avoid the mismatch between training and inference. Experimental results show that the proposed method achieves significant improvements in robustness on out-of-domain scenarios for phoneme-based models, without any regression on the in-domain naturalness test.
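The no-skipping constraint described above implies a simple recursion for the expected (soft) alignment: attention mass on input position j at decoder step i can only come from staying at j or from advancing out of j - 1. Below is a minimal NumPy sketch of that recursion; the function name, the sigmoid parameterization of the stay probability, and the boundary and initialization choices are illustrative assumptions rather than the paper's exact formulation.

```python
import numpy as np

def stepwise_monotonic_alignment(p_stay):
    """Expected alignment under a stay-or-advance-by-one policy.

    p_stay[i, j]: probability that attention at input position j on
    decoder step i stays at j for step i + 1; otherwise it advances to
    j + 1. Staying or moving exactly one step forward makes the
    alignment monotonic with no skipping by construction.
    """
    n_dec, n_enc = p_stay.shape
    p = p_stay.copy()
    p[:, -1] = 1.0  # boundary assumption: mass at the last input must stay
    alpha = np.zeros((n_dec, n_enc))
    alpha[0, 0] = 1.0  # assumption: alignment starts at the first input token
    for i in range(1, n_dec):
        stay = alpha[i - 1] * p[i - 1]
        advance = np.zeros(n_enc)
        advance[1:] = alpha[i - 1, :-1] * (1.0 - p[i - 1, :-1])
        alpha[i] = stay + advance
    return alpha

# Usage: derive stay probabilities from attention energies via a sigmoid
# (a common parameterization, assumed here for illustration).
rng = np.random.default_rng(0)
energies = rng.normal(size=(6, 4))          # (decoder steps, input tokens)
alpha = stepwise_monotonic_alignment(1.0 / (1.0 + np.exp(-energies)))
assert np.allclose(alpha.sum(axis=1), 1.0)  # each row stays a distribution
```

Each row of alpha is a proper probability distribution over input positions, so it can be used directly as soft attention weights, which is one way to keep training and inference consistent, as the abstract's remark on soft attention suggests.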
Pages: 1293 - 1297
Number of pages: 5
Related Papers
50 items in total
  • [32] Sequential classification of customer behavior based on sequence-to-sequence learning with gated-attention neural networks
    Zhao, Licheng
    Zuo, Yi
    Yada, Katsutoshi
    ADVANCES IN DATA ANALYSIS AND CLASSIFICATION, 2023, 17 (03) : 549 - 581
  • [33] Sequence-to-Sequence Neural Network Model with 2D Attention for Learning Japanese Pitch Accents
    Bruguier, Antoine
    Zen, Heiga
    Arkhangorodsky, Arkady
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018: 1284 - 1287
  • [34] De-duping URLs with Sequence-to-Sequence Neural Networks
    Xu, Keyang
    Liu, Zhengzhong
    Callan, Jamie
    SIGIR'17: PROCEEDINGS OF THE 40TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2017: 1157 - 1160
  • [35] A Convolutional Sequence-to-Sequence Attention Fusion Framework for Commonsense Causal Reasoning
    Luo, Zhiyi
    Liu, Yizhu
    Luo, Shuyun
    MATHEMATICS, 2023, 11 (23)
  • [36] Graph augmented sequence-to-sequence model for neural question generation
    Ma, Hui
    Wang, Jian
    Lin, Hongfei
    Xu, Bo
    APPLIED INTELLIGENCE, 2023, 53 (11) : 14628 - 14644
  • [37] Retrosynthetic Reaction Prediction Using Neural Sequence-to-Sequence Models
    Liu, Bowen
    Ramsundar, Bharath
    Kawthekar, Prasad
    Shi, Jade
    Gomes, Joseph
    Quang Luu Nguyen
    Ho, Stephen
    Sloane, Jack
    Wender, Paul
    Pande, Vijay
    ACS CENTRAL SCIENCE, 2017, 3 (10) : 1103 - 1113
  • [38] A Sequence-to-Sequence Model Based on Attention Mechanism for Wave Spectrum Prediction
    Zeng, Xiao
    Qi, Lin
    Yi, Tong
    Liu, Tong
    2020 11TH INTERNATIONAL CONFERENCE ON AWARENESS SCIENCE AND TECHNOLOGY (ICAST), 2020
  • [39] Proactive Mobility Management of UEs using Sequence-to-Sequence Modeling
    Yajnanarayana, Vijaya
    2022 NATIONAL CONFERENCE ON COMMUNICATIONS (NCC), 2022: 320 - 325
  • [40] Attention-Based Sequence-to-Sequence Model for Time Series Imputation
    Li, Yurui
    Du, Mingjing
    He, Sheng
    ENTROPY, 2022, 24 (12)