Many-to-Many Voice Conversion based on Bottleneck Features with Variational Autoencoder for Non-parallel Training Data

Cited by: 0
Authors
Li, Yanping [1 ,4 ]
Lee, Kong Aik [2 ]
Yuan, Yougen [3 ]
Li, Haizhou [4 ]
Yang, Zhen [1 ]
Affiliations
[1] Nanjing Univ Posts & Telecommun, Coll Telecommun & Informat Engn, Nanjing, Jiangsu, Peoples R China
[2] NEC Corp Ltd, Data Sci Res Labs, Tokyo, Japan
[3] Northwestern Polytech Univ, Xian, Shaanxi, Peoples R China
[4] Natl Univ Singapore, Dept Elect & Comp Engn, Singapore, Singapore
Keywords
DOI
Not available
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Subject Classification Codes
0808; 0809
Abstract
This paper proposes a novel approach to many-to-many (M2M) voice conversion with non-parallel training data. In the proposed approach, we first obtain bottleneck features (BNFs) from a deep neural network (DNN) to serve as speaker representations. A variational autoencoder (VAE) then implements the mapping function (i.e., a reconstruction process) using both the latent semantic information and the speaker representations. Furthermore, we propose an adaptive scheme that intervenes in the training process of the DNN, enriching the target speaker's personality feature space when training data are limited. Our approach has three advantages: 1) neither parallel training data nor an explicit frame-alignment process is required; 2) it consolidates multiple pair-wise systems into a single M2M model (many source speakers to many target speakers); 3) it extends the M2M conversion task from a closed set to an open set when the training data of the target speaker are very limited. Objective and subjective evaluations show that the proposed approach outperforms the baseline system.
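As a rough illustration of the reconstruction step described in the abstract, the sketch below shows a frame-level VAE whose decoder is conditioned on a speaker representation, assumed here to be an utterance-averaged bottleneck-feature vector. The PyTorch framing, layer sizes, and dimensions (80-dim spectral frames, 128-dim speaker vector, 16-dim latent code) are illustrative assumptions, not the authors' configuration.

```python
# Minimal sketch of a speaker-conditioned VAE for spectral-frame reconstruction.
# Assumption: the speaker vector is derived from DNN bottleneck features
# (e.g., averaged over an utterance); all dimensions are illustrative.
import torch
import torch.nn as nn

class ConditionalVAE(nn.Module):
    def __init__(self, feat_dim=80, spk_dim=128, latent_dim=16):
        super().__init__()
        # Encoder infers a speaker-independent latent code from each frame.
        self.encoder = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(),
            nn.Linear(256, 2 * latent_dim),  # outputs [mu, log_var]
        )
        # Decoder reconstructs the frame from the latent code concatenated
        # with the speaker representation; swapping in a target speaker's
        # vector at inference time performs the conversion.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim + spk_dim, 256), nn.ReLU(),
            nn.Linear(256, feat_dim),
        )

    def forward(self, x, spk_emb):
        mu, log_var = self.encoder(x).chunk(2, dim=-1)
        # Reparameterization trick: sample z from N(mu, sigma^2).
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * log_var)
        recon = self.decoder(torch.cat([z, spk_emb], dim=-1))
        return recon, mu, log_var

def vae_loss(recon, x, mu, log_var, beta=1.0):
    # Reconstruction term plus KL divergence to the standard normal prior.
    rec = nn.functional.mse_loss(recon, x, reduction="mean")
    kl = -0.5 * torch.mean(1 + log_var - mu.pow(2) - log_var.exp())
    return rec + beta * kl
```

At conversion time, source frames would be encoded to the latent code and decoded with the target speaker's representation in place of the source speaker's, which is what allows a single model to cover many source-target speaker pairs.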
Pages: 829-833
Number of pages: 5