FORWARD ATTENTION IN SEQUENCE-TO-SEQUENCE ACOUSTIC MODELING FOR SPEECH SYNTHESIS

Cited by: 0
Authors
Zhang, Jing-Xuan [1]
Ling, Zhen-Hua [1]
Dai, Li-Rong [1]
Affiliations
[1] Univ Sci & Technol China, Natl Engn Lab Speech & Language Informat Proc, Hefei, Anhui, Peoples R China
Source
2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2018
Funding
National Key Research and Development Program;
Keywords
sequence-to-sequence model; encoder-decoder; attention; speech synthesis;
DOI
Not available
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
This paper proposes a forward attention method for sequence-to-sequence acoustic modeling in speech synthesis. The method is motivated by the monotonic nature of the alignment from phone sequences to acoustic sequences. At each decoder timestep, only alignment paths that satisfy the monotonic condition are considered, and the modified attention probabilities are computed recursively using a forward algorithm. A transition agent for forward attention is further proposed, which helps the attention mechanism decide whether to move forward or stay at each decoder timestep. Experimental results show that the proposed forward attention method converges faster and is more stable than the baseline attention method. In addition, forward attention with the transition agent can help improve the naturalness of synthetic speech and effectively control its speed.
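The recursion described in the abstract can be illustrated with a minimal sketch. The abstract does not state the formulas, so the exact form below is an assumption: each position inherits probability mass only from itself ("stay") or from its left neighbour ("move forward"), the result is multiplied by the ordinary attention probability, and the vector is renormalized. The function name `forward_attention_step` and the parameter `u` (standing in for the transition agent's move-forward probability) are hypothetical names chosen for illustration, not the authors' code.

```python
def forward_attention_step(prev_alpha, y, u=None):
    """One decoder step of a forward-attention-style recursion (illustrative).

    prev_alpha: alignment probabilities from the previous decoder step.
    y:          ordinary attention probabilities at this step (e.g. a softmax).
    u:          assumed "move forward" probability in [0, 1]; None weights
                staying and moving equally (no transition agent).
    """
    if len(prev_alpha) != len(y):
        raise ValueError("prev_alpha and y must have the same length")

    unnormalized = []
    for n in range(len(y)):
        stay = prev_alpha[n]                          # path stays on position n
        move = prev_alpha[n - 1] if n > 0 else 0.0    # path advances from n - 1
        if u is None:
            mass = stay + move
        else:
            mass = (1.0 - u) * stay + u * move
        unnormalized.append(mass * y[n])

    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("no monotonic path has nonzero probability")
    return [a / total for a in unnormalized]


# Assumed initialization: all probability mass on the first input position.
alpha = [1.0, 0.0, 0.0, 0.0]
uniform = [0.25, 0.25, 0.25, 0.25]
for _ in range(2):
    alpha = forward_attention_step(alpha, uniform)
print(alpha)
```

In this sketch the alignment can advance by at most one position per decoder step, so after two steps the last position still has zero probability even though `uniform` gives it weight. Under the same assumptions, passing `u=1.0` forces the alignment to advance and `u=0.0` forces it to stay, which is one way to picture how a transition probability could influence speaking rate.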
Pages: 4789-4793
Number of pages: 5