Robust Sequence-to-Sequence Acoustic Modeling with Stepwise Monotonic Attention for Neural TTS

Times cited: 24

Authors:

He, Mutian [1]

Deng, Yan [2]

He, Lei [2]

Affiliations:

[1] Beihang Univ, Sch Comp Sci & Engn, Beijing, Peoples R China

[2] Microsoft, Beijing, Peoples R China

Source:

INTERSPEECH 2019 | 2019

Keywords:

sequence-to-sequence model; attention; speech synthesis

DOI:

10.21437/Interspeech.2019-1972

Chinese Library Classification (CLC) number:

R36 [Pathology]; R76 [Otorhinolaryngology]

Discipline classification code:

100104; 100213

Abstract:

Neural TTS has demonstrated a strong capability to generate human-like speech with high quality and naturalness, but its generalization to out-of-domain text remains challenging for attention-based sequence-to-sequence acoustic models. Various errors occur on inputs with unseen context, including attention collapse, skipping, and repeating, which limit broader application. In this paper, we propose a novel stepwise monotonic attention method for sequence-to-sequence acoustic modeling to improve robustness on out-of-domain inputs. The method exploits the strictly monotonic nature of TTS by constraining monotonic hard attention so that the alignment between the input and output sequences is not only monotonic but also never skips any input. Soft attention can be used to avoid a mismatch between training and inference. Experimental results show that the proposed method achieves significant improvements in robustness in out-of-domain scenarios for phoneme-based models, without any regression in the in-domain naturalness test.
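The alignment constraint described in the abstract (at each decoder step, attention either stays on the current input or advances by exactly one, never skipping) can be illustrated with an expected-alignment recurrence of the form alpha_i(j) = alpha_{i-1}(j) * p_{i,j} + alpha_{i-1}(j-1) * (1 - p_{i,j-1}). The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes the stay probability is p_{i,j} = sigmoid(e_{i,j}) for some attention energy e_{i,j}, and the names `sigmoid` and `stepwise_monotonic_step` are hypothetical. Boundary handling is also an assumption: in this sketch, any probability mass that would advance past the final input is simply dropped.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function, used here (by assumption) to turn an energy into a stay probability."""
    return 1.0 / (1.0 + math.exp(-x))


def stepwise_monotonic_step(prev_alpha: list[float], energies: list[float]) -> list[float]:
    """One decoder step of a soft (expected) stepwise monotonic alignment.

    prev_alpha: alignment over input positions from the previous decoder step.
    energies:   hypothetical attention energies e_{i,j} for the current step.
    p_{i,j} = sigmoid(e_{i,j}) is the probability of staying at input j;
    1 - p_{i,j} is the probability of advancing by exactly one input (no skipping).
    Mass advancing past the last input is dropped (an assumption of this sketch).
    """
    stay = [sigmoid(e) for e in energies]
    alpha = []
    for j in range(len(prev_alpha)):
        kept = prev_alpha[j] * stay[j]  # stayed on input j
        moved = prev_alpha[j - 1] * (1.0 - stay[j - 1]) if j > 0 else 0.0  # advanced from j-1
        alpha.append(kept + moved)
    return alpha


if __name__ == "__main__":
    alpha = [1.0, 0.0, 0.0]  # start attending to the first input
    for _ in range(2):
        alpha = stepwise_monotonic_step(alpha, [0.0, 0.0, 0.0])  # p = 0.5 everywhere
    print(alpha)  # [0.25, 0.5, 0.25]
```

Because each step can move attention forward by at most one input, a decoder that starts on the first input can have non-zero weight only on the first i + 1 inputs after i steps, which is the "no skipping" property in concrete form.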


Pages: 1293-1297

Number of pages: 5
