Voice Conversion Based on Unified Dictionary with Clustered Features Between Non-parallel Corpus

Cited by: 1
Authors
Jin, Hui [1 ]
Yu, Yi-Biao [1 ]
Affiliations
[1] Soochow Univ, Sch Elect & Informat Engn, Suzhou 215000, Peoples R China
Keywords
Voice conversion; Clustered features; Non-negative matrix factorization; Unified dictionary
DOI
10.1109/ICNISC.2018.00052
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Non-negative matrix factorization (NMF) has recently been widely applied to exemplar-based voice conversion (VC). Compared with conventional statistical Gaussian mixture model-based VC, it offers better noise robustness and naturalness of the converted voice. However, it requires parallel training data from the source and target speakers, so it cannot convert between arbitrary speakers, especially when the target speaker's corpus is inadequate. In this paper, we present a novel algorithm that clusters the spectral features in high dimensions to construct a unified dictionary and introduces a mapping matrix between the source and target sparse coefficients. Experimental results show an average cepstral distortion of 0.833, about 4.3% lower than that of the conventional NMF-based method. Subjective evaluations (ABX and MOS) are also discussed; they indicate that the speech quality of the proposed method is notably better than that of conventional NMF. The target speaker's spectra need not even be included in the training set.
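The pipeline outlined in the abstract (cluster spectral features into a dictionary, estimate sparse NMF coefficients against it, map source coefficients to target coefficients) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not say how the clustering or the mapping matrix is computed, so plain k-means, a Euclidean multiplicative update, and a least-squares fit over hypothetical paired coefficient columns are all assumptions, as are the function names and parameters.

```python
import numpy as np

def kmeans_dictionary(features, n_atoms, n_iter=20, seed=0):
    """Cluster non-negative spectral frames (dim x frames) with plain k-means.
    ASSUMPTION: centroids serve as dictionary atoms; the paper's clustering
    procedure is not specified in the abstract."""
    rng = np.random.default_rng(seed)
    frames = features.T                                   # frames x dim
    centroids = frames[rng.choice(len(frames), n_atoms, replace=False)].copy()
    for _ in range(n_iter):
        dist = ((frames[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for k in range(n_atoms):
            members = frames[labels == k]
            if len(members):                              # keep old centroid if cluster is empty
                centroids[k] = members.mean(axis=0)
    return centroids.T                                    # dim x n_atoms

def nmf_activations(V, W, n_iter=200, sparsity=0.0, eps=1e-9, seed=0):
    """Keep dictionary W fixed and estimate coefficients H >= 0 such that V ~ W H,
    using the Euclidean multiplicative update with an optional L1 penalty."""
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + sparsity + eps)
    return H

def fit_mapping(H_src, H_tgt):
    """ASSUMPTION: least-squares mapping M with M @ H_src ~ H_tgt, fitted on
    hypothetical paired coefficient columns."""
    return H_tgt @ np.linalg.pinv(H_src)

def convert(V_src, W, M):
    """Encode source spectra on the shared dictionary, map the coefficients,
    clip negatives, and reconstruct spectra with the same dictionary."""
    H_src = nmf_activations(V_src, W)
    return W @ np.maximum(M @ H_src, 0.0)
```

In this sketch a single dictionary `W` is shared by both speakers, so speaker identity is carried only by the coefficient mapping `M`; that is one possible reading of "unified dictionary", not a confirmed detail of the paper.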
Pages: 229-232
Page count: 4
Related papers
50 in total
  • [1] A novel method for voice conversion based on non-parallel corpus
    Sayadian A.
    Mozaffari F.
    International Journal of Speech Technology, 2017, 20 (3) : 587 - 592
  • [2] NON-PARALLEL TRAINING FOR VOICE CONVERSION BASED ON ADAPTATION METHOD
    Song, Peng
    Zheng, Wenming
    Zhao, Li
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 6905 - 6909
  • [3] Non-parallel dictionary learning for voice conversion using non-negative Tucker decomposition
    Takashima, Yuki
    Nakashika, Toru
    Takiguchi, Tetsuya
    Ariki, Yasuo
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2019, 2019 (01)
  • [4] Non-parallel dictionary learning for voice conversion using non-negative Tucker decomposition
    Yuki Takashima
    Toru Nakashika
    Tetsuya Takiguchi
    Yasuo Ariki
    EURASIP Journal on Audio, Speech, and Music Processing, 2019
  • [5] SINGING VOICE CONVERSION WITH NON-PARALLEL DATA
    Chen, Xin
    Chu, Wei
    Guo, Jinxi
    Xu, Ning
    2019 2ND IEEE CONFERENCE ON MULTIMEDIA INFORMATION PROCESSING AND RETRIEVAL (MIPR 2019), 2019, : 292 - 296
  • [6] Non-Parallel Voice Conversion for ASR Augmentation
    Wang, Gary
    Rosenberg, Andrew
    Ramabhadran, Bhuvana
    Biadsy, Fadi
    Huang, Yinghui
    Emond, Jesse
    Mengibar, Pedro Moreno
    INTERSPEECH 2022, 2022, : 3408 - 3412
  • [7] NON-PARALLEL TRAINING FOR VOICE CONVERSION BASED ON FT-GMM
    Chen, Ling-Hui
    Ling, Zhen-Hua
    Dai, Li-Rong
    2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011, : 5116 - 5119
  • [8] GAZEV: GAN-Based Zero-Shot Voice Conversion over Non-parallel Speech Corpus
    Zhang, Zining
    He, Bingsheng
    Zhang, Zhenjie
    INTERSPEECH 2020, 2020, : 791 - 795
  • [9] CVC: Contrastive Learning for Non-parallel Voice Conversion
    Li, Tingle
    Liu, Yichen
    Hu, Chenxu
    Zhao, Hang
    INTERSPEECH 2021, 2021, : 1324 - 1328
  • [10] NOVEL METRIC LEARNING FOR NON-PARALLEL VOICE CONVERSION
    Shah, Nirmesh J.
    Patil, Hemant A.
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 3722 - 3726