Voice Conversion without Parallel Speech Corpus Based on Mixtures of Linear Transform

Cited by: 1

Authors:

Jian, Zhi-Hua [1]

Yang, Zhen [1]

Affiliations:

[1] Nanjing Univ Post & Telecommun, Inst Signal Proc & Transmiss, Nanjing, Peoples R China

Source:

2007 INTERNATIONAL CONFERENCE ON WIRELESS COMMUNICATIONS, NETWORKING AND MOBILE COMPUTING, VOLS 1-15 | 2007

Keywords:

Voice conversion; multimedia application; Ms-LT; EM algorithm;

DOI:

10.1109/WICOM.2007.701

Chinese Library Classification (CLC) number:

TP39 [Computer applications];

Discipline classification code:

081203 ; 0835 ;

Abstract:

This paper presents an algorithm for voice conversion based on mixtures of linear transform (Ms-LT) that avoids the need for parallel training data inherent in conventional approaches. Within a maximum-likelihood framework, the EM algorithm is used to compute the parameters of the conversion function. The chirp z-transform is used to enhance the spectral envelope, which is averaged as a result of the linear weighting. The proposed voice conversion system is evaluated using both objective and subjective measures. The experimental results demonstrate that the approach effectively transforms speaker identity and achieves results comparable to those of conventional methods that rely on a parallel corpus.
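The abstract does not give the form of the conversion function, so the following is only a minimal sketch of what a mixture of linear transforms commonly looks like in voice conversion: each source feature vector is mapped by a posterior-weighted sum of per-component affine transforms. Everything here is an assumption made for illustration and not taken from the paper: the diagonal-covariance Gaussian mixture, the function names `posteriors` and `convert`, and the parameter names `weights`, `means`, `variances`, `A` and `b`. The EM training of the parameters and the chirp z-transform enhancement step are not shown.

```python
import numpy as np


def posteriors(x, weights, means, variances):
    """Posterior p(i | x) under a diagonal-covariance Gaussian mixture (assumed model)."""
    # Log-density of x under each of the K components.
    log_pdf = -0.5 * np.sum(
        np.log(2.0 * np.pi * variances) + (x - means) ** 2 / variances, axis=1
    )
    log_joint = np.log(weights) + log_pdf
    log_joint -= log_joint.max()  # stabilise before exponentiating
    p = np.exp(log_joint)
    return p / p.sum()


def convert(x, weights, means, variances, A, b):
    """Posterior-weighted sum of the per-component transforms A[i] @ x + b[i]."""
    p = posteriors(x, weights, means, variances)
    transformed = np.einsum("kij,j->ki", A, x) + b  # shape (K, D)
    return p @ transformed


# Toy usage with hypothetical parameters: two components, two-dimensional features.
x = np.array([0.5, -1.0])
weights = np.array([0.4, 0.6])
means = np.array([[0.0, 0.0], [3.0, 3.0]])
variances = np.ones((2, 2))
A = np.stack([np.eye(2), 2.0 * np.eye(2)])
b = np.zeros((2, 2))
y = convert(x, weights, means, variances, A, b)
```

Because every frame is a weighted average of several linear outputs, the converted spectral envelope tends to be smoothed, which is the effect the abstract says the chirp z-transform step is meant to counteract.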

Citation

Pages: 2825-2828

Page count: 4

8 references in total

[1] Speaker Transformation Algorithm using Segmental Codebooks (STASC)
Arslan, LM
[J]. SPEECH COMMUNICATION, 1999, 28 (03) : 211 - 226
[2] Application of speech conversion to alaryngeal speech enhancement
Bi, N
Qi, YY
[J]. IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 1997, 5 (02) : 97 - 105
[3] Maximum-likelihood stochastic-transformation adaptation of hidden Markov models
Diakoloukas, VD
Digalakis, VV
[J]. IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 1999, 7 (02) : 177 - 187
[4] An approach to voice conversion using feature statistical mapping
Hasan, MM
Nasr, AM
Sultana, S
[J]. APPLIED ACOUSTICS, 2005, 66 (05) : 513 - 532
[5] Kain Alexander., 2001, HIGH RESOLUTION VOIC
[6] MOULINES E, 1995, SPEECH COMMUNICATION, V16
[7] TRANSFORMATION OF FORMANTS FOR VOICE CONVERSION USING ARTIFICIAL NEURAL NETWORKS
NARENDRANATH, M
MURTHY, HA
RAJENDRAN, S
YEGNANARAYANA, B
[J]. SPEECH COMMUNICATION, 1995, 16 (02) : 207 - 216
[8] Continuous probabilistic transform for voice conversion
Stylianou, Y
Cappe, O
Moulines, E
[J]. IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 1998, 6 (02) : 131 - 142
