VOICE CONVERSION BASED ON MATRIX VARIATE GAUSSIAN MIXTURE MODEL

Citations: 0
Authors
Saito, Daisuke [1]
Doi, Hidenobu [1]
Minematsu, Nobuaki [1]
Hirose, Keikichi [1]
Affiliations
[1] Univ Tokyo, Tokyo, Japan
Source
2014 12TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING (ICSP) | 2014
Keywords
Voice conversion; Gaussian mixture model; matrix variate distribution; matrix variate normal; matrix variate Gaussian mixture model; speech recognition
DOI
Not available
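Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809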
Abstract
This paper describes a novel approach to constructing a mapping function between a given speaker pair using matrix variate probability density functions (PDFs). In voice conversion studies, two important functions should be realized: 1) precise modeling of both the source and target feature spaces, and 2) construction of a proper transform function between these spaces. Voice conversion based on Gaussian mixture models (GMMs) is widely used because of its flexibility and ease of handling. In GMM-based approaches, a joint vector space of the source and target is first constructed, and the joint PDF of the two vectors is modeled as a GMM in that joint vector space. The joint vector approach focuses mainly on precise modeling of the 'joint' feature space and does not always construct a proper transform between the two feature spaces. In contrast, the proposed method constructs the joint PDF as a GMM in a matrix variate space whose rows and columns respectively correspond to the two functions, so it has the potential to model precisely both the characteristics of the feature spaces and the relation between the source and target spaces. Experimental results show that the proposed method improves voice conversion performance.
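As a reading aid for the abstract, the two modeling choices it contrasts can be written out explicitly. The first block below is the standard joint-density GMM mapping. The second is the textbook matrix variate normal density and a mixture built from it. All notation here (K, w_k, U, V, and the 2 x D stacking of a source frame x and a target frame y as the rows of Z) is an assumption made for illustration; the abstract does not give the paper's own formulas or matrix layout.

```latex
% Conventional joint-density GMM (standard formulation; notation assumed).
\begin{align}
  z &= \begin{bmatrix} x \\ y \end{bmatrix}, \qquad
  p(z) = \sum_{k=1}^{K} w_k \, \mathcal{N}\!\left(z;\ \mu_k,\ \Sigma_k\right), \\
  \mu_k &= \begin{bmatrix} \mu_k^{x} \\ \mu_k^{y} \end{bmatrix}, \qquad
  \Sigma_k = \begin{bmatrix}
    \Sigma_k^{xx} & \Sigma_k^{xy} \\
    \Sigma_k^{yx} & \Sigma_k^{yy}
  \end{bmatrix}, \\
  \hat{y} &= \sum_{k=1}^{K} P(k \mid x)
    \left[ \mu_k^{y} + \Sigma_k^{yx} \left(\Sigma_k^{xx}\right)^{-1}
    \left(x - \mu_k^{x}\right) \right].
\end{align}

% Matrix variate normal for Z in R^{n x p}, with mean M (n x p),
% row covariance U (n x n) and column covariance V (p x p).
% Equivalent to vec(Z) ~ N(vec(M), V \otimes U) with column-stacking vec.
\begin{align}
  \mathcal{MN}(Z;\ M, U, V) &=
    \frac{\exp\!\left( -\tfrac{1}{2}\,
      \mathrm{tr}\!\left[ V^{-1} (Z - M)^{\top} U^{-1} (Z - M) \right] \right)}
         {(2\pi)^{np/2}\, |V|^{n/2}\, |U|^{p/2}}, \\
  p(Z) &= \sum_{k=1}^{K} w_k \, \mathcal{MN}\!\left(Z;\ M_k, U_k, V_k\right).
\end{align}
```

Under the assumed stacking (n = 2, p = D), U_k is a 2 x 2 matrix that could describe the source-target relation and V_k is a D x D matrix that could describe structure within the feature space. This is one plausible reading of the abstract's statement that rows and columns correspond to the two functions, not a confirmed description of the authors' model.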
Pages: 567-571
Number of pages: 5