Robust Sequence-to-Sequence Acoustic Modeling with Stepwise Monotonic Attention for Neural TTS

Cited by: 24
Authors
He, Mutian [1 ]
Deng, Yan [2 ]
He, Lei [2 ]
Affiliations
[1] Beihang Univ, Sch Comp Sci & Engn, Beijing, Peoples R China
[2] Microsoft, Beijing, Peoples R China
Source
INTERSPEECH 2019, 2019
Keywords
sequence-to-sequence model; attention; speech synthesis;
DOI
10.21437/Interspeech.2019-1972
Chinese Library Classification
R36 [Pathology]; R76 [Otorhinolaryngology];
Subject Classification Codes
100104 ; 100213 ;
Abstract
Neural TTS has demonstrated strong capabilities to generate human-like speech with high quality and naturalness, while its generalization to out-of-domain texts remains a challenging task with regard to the design of attention-based sequence-to-sequence acoustic models. Various errors occur on inputs with unseen context, including attention collapse, skipping, repeating, etc., which limit broader applications. In this paper, we propose a novel stepwise monotonic attention method for sequence-to-sequence acoustic modeling to improve robustness on out-of-domain inputs. The method exploits the strictly monotonic nature of alignment in TTS by constraining monotonic hard attention so that the alignment between input and output sequences must not only be monotonic but also allow no skipping of inputs. Soft attention can be used to avoid the mismatch between training and inference. Experimental results show that the proposed method achieves significant improvements in robustness on out-of-domain scenarios for phoneme-based models, without any regression on the in-domain naturalness test.
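The constraint described in the abstract — at each decoder step the alignment either stays on the current input token or advances by exactly one, so no input can be skipped — can be sketched in its soft (expected-alignment) form as follows. This is a minimal illustration, assuming a sigmoid "stay" probability per position; the function name and shapes are illustrative, not the paper's exact implementation:

```python
import math

def stepwise_monotonic_attention(energies, alpha_prev):
    """One decoder step of soft stepwise monotonic attention.

    energies:   attention energies over the N input positions.
    alpha_prev: alignment weights from the previous decoder step (sums to 1).
    Returns the new alignment weights: each position j keeps the mass that
    "stays" at j plus the mass that "moves" forward from j - 1, so alignment
    is monotonic and can never skip an input position.
    """
    p_stay = [1.0 / (1.0 + math.exp(-e)) for e in energies]  # sigmoid
    n = len(alpha_prev)
    alpha = [0.0] * n
    for j in range(n):
        alpha[j] = alpha_prev[j] * p_stay[j]                 # mass staying at j
        if j > 0:
            alpha[j] += alpha_prev[j - 1] * (1.0 - p_stay[j - 1])  # mass moving j-1 -> j
    return alpha
```

Starting from a one-hot alignment on the first phoneme, each update can only redistribute mass between the current position and the next one, which rules out the skipping and repeating failure modes the abstract mentions.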
Pages: 1293 - 1297
Page count: 5
Related Papers
50 records in total
  • [41] Understanding and Improving Sequence-to-Sequence Pretraining for Neural Machine Translation
    Wang, Wenxuan
    Jiao, Wenxiang
    Hao, Yongchang
    Wang, Xing
    Shi, Shuming
    Tu, Zhaopeng
    Lyu, Michael R.
    PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS), 2022, : 2591 - 2600
  • [42] Explainable sequence-to-sequence GRU neural network for pollution forecasting
    Borujeni, Sara Mirzavand
    Arras, Leila
    Srinivasan, Vignesh
    Samek, Wojciech
    SCIENTIFIC REPORTS, 2023, 13
  • [43] Towards Sequence-to-Sequence Neural Model for Croatian Abstractive Summarization
    Davidovic, Vlatka
    Ipsic, Sanda Martincic
    CENTRAL EUROPEAN CONFERENCE ON INFORMATION AND INTELLIGENT SYSTEMS, CECIIS, 2023, : 309 - 315
  • [44] Graph augmented sequence-to-sequence model for neural question generation
    Ma, Hui
    Wang, Jian
    Lin, Hongfei
    Xu, Bo
    APPLIED INTELLIGENCE, 2023, 53 : 14628 - 14644
  • [45] Nonintrusive Load Monitoring based on Sequence-to-sequence Model With Attention Mechanism
    Wang K.
    Zhong H.
    Yu N.
    Xia Q.
    Zhongguo Dianji Gongcheng Xuebao/Proceedings of the Chinese Society of Electrical Engineering, 2019, 39 (01): : 75 - 83
  • [46] Attention based sequence-to-sequence framework for auto image caption generation
    Khan, Rashid
    Islam, M. Shujah
    Kanwal, Khadija
    Iqbal, Mansoor
    Hossain, Md Imran
    Ye, Zhongfu
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2022, 43 (01) : 159 - 170
  • [48] FPGA implementation of sequence-to-sequence predicting spiking neural networks
    Ye, ChangMin
    Kornijcuk, Vladimir
    Kim, Jeeson
    Jeong, Doo Seok
    2020 17TH INTERNATIONAL SOC DESIGN CONFERENCE (ISOCC 2020), 2020, : 322 - 323
  • [49] DIALOG STATE TRACKING WITH ATTENTION-BASED SEQUENCE-TO-SEQUENCE LEARNING
    Hori, Takaaki
    Wang, Hai
    Hori, Chiori
    Watanabe, Shinji
    Harsham, Bret
    Le Roux, Jonathan
    Hershey, John R.
    Koji, Yusuke
    Jing, Yi
    Zhu, Zhaocheng
    Aikawa, Takeyuki
    2016 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2016), 2016, : 552 - 558
  • [50] Sequence-to-Sequence Learning via Attention Transfer for Incremental Speech Recognition
    Novitasari, Sashi
    Tjandra, Andros
    Sakti, Sakriani
    Nakamura, Satoshi
    INTERSPEECH 2019, 2019, : 3835 - 3839