Cotatron: Transcription-Guided Speech Encoder for Any-to-Many Voice Conversion without Parallel Data

Cited by: 18
Authors
Park, Seung-won [1, 2]
Kim, Doo-young [1, 2]
Joe, Myun-chul [2]
Affiliations
[1] Seoul Natl Univ, Seoul, South Korea
[2] MINDs Lab Inc, Seoul, South Korea
Source
INTERSPEECH 2020 | 2020
Keywords
voice conversion; speech synthesis; speech representation; disentangled representation
DOI
10.21437/Interspeech.2020-1542
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification code
100104; 100213
Abstract
We propose Cotatron, a transcription-guided speech encoder for speaker-independent linguistic representation. Cotatron is based on the multispeaker TTS architecture and can be trained with conventional TTS datasets. We train a voice conversion system to reconstruct speech with Cotatron features, which is similar to the previous methods based on Phonetic Posteriorgram (PPG). By training and evaluating our system with 108 speakers from the VCTK dataset, we outperform the previous method in terms of both naturalness and speaker similarity. Our system can also convert speech from speakers that are unseen during training, and utilize ASR to automate the transcription with minimal reduction of the performance. Audio samples are available at https://mindslab-ai.github.io/cotatron, and the code with a pre-trained model will be made available soon.
Pages: 4696-4700
Page count: 5