Semi-supervised Learning for Multi-speaker Text-to-speech Synthesis Using Discrete Speech Representation

Authors
Tu, Tao [1]
Chen, Yuan-Jui [1]
Liu, Alexander H. [1]
Lee, Hung-yi [1]
Affiliations
[1] Natl Taiwan Univ, Coll Elect Engn & Comp Sci, Taipei, Taiwan
Source
INTERSPEECH 2020 | 2020
Keywords
multi-speaker speech synthesis; semi-supervised learning; discrete speech representation
DOI
10.21437/Interspeech.2020-1824
Chinese Library Classification number
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification code
100104; 100213
Abstract
Recently, end-to-end multi-speaker text-to-speech (TTS) systems have achieved success when large amounts of high-quality speech and corresponding transcriptions are available. However, the laborious process of collecting paired data prevents many institutes from building high-performing multi-speaker TTS systems. In this work, we propose a semi-supervised learning approach for multi-speaker TTS. A multi-speaker TTS model can learn from untranscribed audio via the proposed encoder-decoder framework with discrete speech representation. The experimental results demonstrate that with only an hour of paired speech data, whether from multiple speakers or a single speaker, the proposed model can generate intelligible speech in different voices. We find that the model benefits from the proposed semi-supervised learning approach even when part of the unpaired speech data is noisy. In addition, our analysis reveals that the speaker characteristics of the paired data affect the effectiveness of semi-supervised TTS.
Pages: 3191-3195
Page count: 5