Robust Sequence-to-Sequence Acoustic Modeling with Stepwise Monotonic Attention for Neural TTS

Cited by: 24
Authors
He, Mutian [1]
Deng, Yan [2]
He, Lei [2]
Affiliations
[1] Beihang Univ, Sch Comp Sci & Engn, Beijing, Peoples R China
[2] Microsoft, Beijing, Peoples R China
Source
INTERSPEECH 2019 | 2019
Keywords
sequence-to-sequence model; attention; speech synthesis
DOI
10.21437/Interspeech.2019-1972
Chinese Library Classification (CLC) codes
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification codes
100104; 100213
Abstract
Neural TTS has demonstrated a strong capability to generate human-like speech with high quality and naturalness, but its generalization to out-of-domain texts remains challenging because of the design of attention-based sequence-to-sequence acoustic models. Various errors occur on inputs with unseen context, including attention collapse, skipping, and repeating, which limits broader application. In this paper, we propose a novel stepwise monotonic attention method for sequence-to-sequence acoustic modeling to improve robustness on out-of-domain inputs. The method exploits the strictly monotonic property of TTS by constraining monotonic hard attention so that the alignment between the input and output sequences must not only be monotonic but also allow no skipping of inputs. Soft attention can be used to avoid a mismatch between training and inference. The experimental results show that the proposed method achieves significant improvements in robustness in out-of-domain scenarios for phoneme-based models, without any regression on the in-domain naturalness test.
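The constraint described in the abstract (monotonic, with no skipping of inputs) can be illustrated with a small sketch. This is not taken from the record itself: it assumes that, at each decoder step, attention on input position j either stays on j with some probability or moves forward to j + 1, and it tracks the resulting expected (soft) alignment. The function name `stepwise_monotonic_step` and the `stay_prob` values are hypothetical and chosen only for illustration.

```python
def stepwise_monotonic_step(prev_alpha, stay_prob):
    """Advance a soft alignment by one decoder step (illustrative sketch).

    prev_alpha[j]: probability that attention was on input j at the previous step.
    stay_prob[j]:  assumed probability of staying on input j at this step;
                   with probability 1 - stay_prob[j] attention moves to j + 1.
    Attention can never jump by more than one position, so no input is skipped.
    Note: any mass moving past the last input is dropped, so the last
    stay probability is set to 1.0 in the example below.
    """
    new_alpha = []
    for j in range(len(prev_alpha)):
        stayed = prev_alpha[j] * stay_prob[j]
        moved_in = prev_alpha[j - 1] * (1.0 - stay_prob[j - 1]) if j > 0 else 0.0
        new_alpha.append(stayed + moved_in)
    return new_alpha


# Hypothetical example: four input tokens, attention starts on the first one.
alpha = [1.0, 0.0, 0.0, 0.0]
stay_prob = [0.5, 0.5, 0.5, 1.0]
history = [alpha]
for _ in range(3):
    alpha = stepwise_monotonic_step(alpha, stay_prob)
    history.append(alpha)
```

After one step the mass can have reached only the first two positions, after two steps only the first three, and so on, which is the "no skipping" behaviour the abstract refers to.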
Pages: 1293-1297
Page count: 5