Voice Conversion Using Structured Gaussian Mixture Model

Cited by: 0

Authors:

Zeng, Daojian [1]

Yu, Yibiao [1]

Affiliations:

[1] Soochow Univ, Sch Elect & Informat Engn, Suzhou, Peoples R China

Source:

2010 IEEE 10TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING PROCEEDINGS (ICSP2010), VOLS I-III | 2010

Keywords:

voice conversion; SGMM; AUS;

DOI:

Not available

Chinese Library Classification (CLC):

TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

Gaussian Mixture Model (GMM) is commonly used in voice conversion. However, traditional GMM-based voice conversion usually extracts a conversion function from a parallel corpus, which greatly limits the application of the technology. In an attempt to overcome this drawback, a structured Gaussian Mixture Model (SGMM) is applied to model the speaker's acoustic feature distribution. In particular, two speakers' isolated SGMMs are aligned based on Acoustic Universal Structure (AUS) theory. Then the conversion function is extracted from the two aligned SGMMs in a manner similar to the conventional method. The subjective listening tests indicate that the proposed method achieves equivalent speech quality and speaker individuality compared with the conventional method.

Pages: 541-544

Page count: 4

References (8 in total):

[1] Kain A, 1998, INT CONF ACOUST SPEE, P285, DOI 10.1109/ICASSP.1998.674423

[2] Kawahara, H; Masuda-Katsuse, I; de Cheveigné, A. Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds [J]. SPEECH COMMUNICATION, 1999, 27 (3-4): 187-207

[3] Minematsu N, 2005, INT CONF ACOUST SPEE, P889

[4] Mouchtaris A, 2004, 2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS, P1

[5] Reynolds, DA; Rose, RC. Robust text-independent speaker identification using Gaussian mixture speaker models [J]. IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 1995, 3 (01): 72-83

[6] Stylianou, Y; Cappe, O; Moulines, E. Continuous probabilistic transform for voice conversion [J]. IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 1998, 6 (02): 131-142

[7] Zen, Heiga; Tokuda, Keiichi; Black, Alan W. Statistical parametric speech synthesis [J]. SPEECH COMMUNICATION, 2009, 51 (11): 1039-1064

[8] Zhang M., 2009, P ICASSP, P4281
