Speech Analysis/Synthesis by Gaussian Mixture Approximation of the Speech Spectrum for Voice Conversion

Cited: 0
Authors
Amini, Jamal [1 ]
Shahrebabaki, Abdoreza Sabzi [1 ]
Shokouhi, Navid [1 ]
Sheikhzadeh, Hamid [1 ]
Raahemifar, Kaamran [2 ]
Eslami, Mehdi [1 ]
Affiliations
[1] Amirkabir Univ Technol, Dept Elect Engn, Tehran, Iran
[2] Ryerson Univ, Dept Elect & Comp Engn, Toronto, ON M5B 2K3, Canada
Source
2013 IEEE INTERNATIONAL SYMPOSIUM ON SIGNAL PROCESSING AND INFORMATION TECHNOLOGY (IEEE ISSPIT 2013), 2013
Keywords
Analysis/Synthesis; Feature Extraction; Voice Conversion; GMM; STRAIGHT; FREQUENCY
DOI
Not available
Chinese Library Classification
TP3 [Computing Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
Voice conversion typically employs spectral features to convert a source voice to a target voice. In this paper, we propose a simple method of fitting the STRAIGHT spectrum with Gaussian mixture (GM) models for speech analysis/synthesis and spectral modification. The mean values of the Gaussians are pre-determined based on Mel-frequency spacing. The standard deviations are adaptively adjusted using the constant-Q principle and the spectrum amplitudes. Finally, the weights of the Gaussians are determined by sampling the log-spectrum at Mel frequencies. The proposed analysis/synthesis method (MFLS-GM) is employed for speech analysis/synthesis and voice conversion. Subjective evaluations employing MOS and ABX demonstrate superior performance of voice conversion using the MFLS-GM compared to systems employing MFCC features. The computation cost of the proposed analysis/synthesis method is also much lower than that of MFCC-based methods.
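The fitting procedure the abstract describes (Mel-spaced Gaussian means, constant-Q standard deviations, weights obtained by sampling the log-spectrum at the Mel frequencies) can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' implementation: the paper additionally adapts the standard deviations to the spectrum amplitudes, which is omitted here, and the number of Gaussians, the Q factor, and the sigma floor are illustrative choices.

```python
import numpy as np

def hz_to_mel(f):
    # O'Shaughnessy Mel-scale formula
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def gm_fit_spectrum(log_spec, freqs, n_gauss=40, q=4.0):
    """Approximate a log-spectrum by a sum of Gaussians.

    Means: Mel-spaced centre frequencies over the analysis band.
    Stds: constant-Q, i.e. proportional to the centre frequency
          (amplitude-adaptive adjustment from the paper is omitted).
    Weights: the log-spectrum sampled at the Mel-spaced frequencies.
    """
    # Mel-spaced Gaussian means
    mel_grid = np.linspace(hz_to_mel(freqs[0]), hz_to_mel(freqs[-1]), n_gauss)
    mu = mel_to_hz(mel_grid)
    # constant-Q standard deviations, sigma_k = mu_k / Q;
    # the 50 Hz floor keeps low-frequency Gaussians wider than the bin spacing
    sigma = np.maximum(mu / q, 50.0)
    # weights = log-spectrum sampled at the Mel frequencies
    w = np.interp(mu, freqs, log_spec)
    return mu, sigma, w

def gm_synthesize(mu, sigma, w, freqs):
    """Reconstruct the log-spectrum as a weighted sum of Gaussians."""
    g = np.exp(-0.5 * ((freqs[:, None] - mu[None, :]) / sigma[None, :]) ** 2)
    # normalise per frequency bin so the mixture interpolates the weights
    g /= g.sum(axis=1, keepdims=True) + 1e-12
    return g @ w
```

In a voice-conversion setting the per-frame parameters (mu, sigma, w) would then serve as the spectral feature vector to be mapped from source to target before resynthesis.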
Pages: 428-433
Page count: 6
Related Papers
50 records
  • [1] Esophageal Speech Enhancement Based on Statistical Voice Conversion with Gaussian Mixture Models
    Doi, Hironori
    Nakamura, Keigo
    Toda, Tomoki
    Saruwatari, Hiroshi
    Shikano, Kiyohiro
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2010, E93D (09): : 2472 - 2482
  • [2] Age Approximation from Speech using Gaussian Mixture Models
    Mittal, Tanushri
    Barthwal, Anurag
    Koolagudi, Shashidhar G.
    2013 SECOND INTERNATIONAL CONFERENCE ON ADVANCED COMPUTING, NETWORKING AND SECURITY (ADCONS 2013), 2013, : 74 - 78
  • [3] A Comparison of Voice Conversion Methods for Transforming Voice Quality in Emotional Speech Synthesis
    Tuerk, Oytun
    Schroeder, Marc
    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 2282 - 2285
  • [4] HMM adaptation and voice conversion for the synthesis of child speech: a comparison
    Watts, Oliver
    Yamagishi, Junichi
    King, Simon
    Berkling, Kay
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 2595 - +
  • [5] Quantized F0 Context and Its Applications to Speech Synthesis, Speech Coding and Voice Conversion
    Nose, Takashi
    Kobayashi, Takao
    2014 TENTH INTERNATIONAL CONFERENCE ON INTELLIGENT INFORMATION HIDING AND MULTIMEDIA SIGNAL PROCESSING (IIH-MSP 2014), 2014, : 578 - 581
  • [6] Speech Enhancement for Automatic Speech Recognition Using Complex Gaussian Mixture Priors for Noise and Speech
    Astudillo, Ramon F.
    Hoffmann, Eugen
    Mandelartz, Philipp
    Orglmeister, Reinhold
    ADVANCES IN NONLINEAR SPEECH PROCESSING, 2010, 5933 : 60 - 67
  • [7] High-Individuality Voice Conversion Based on Concatenative Speech Synthesis
    Fujii, Kei
    Okawa, Jun
    Suigetsu, Kaori
    PROCEEDINGS OF WORLD ACADEMY OF SCIENCE, ENGINEERING AND TECHNOLOGY, VOL 26, PARTS 1 AND 2, DECEMBER 2007, 2007, 26 : 483 - 488
  • [8] Evaluation of Expressive Speech Synthesis With Voice Conversion and Copy Resynthesis Techniques
    Turk, Oytun
    Schroeder, Marc
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2010, 18 (05): : 965 - 973
  • [9] Iteratively Improving Speech Recognition and Voice Conversion
    Singh, Mayank Kumar
    Takahashi, Naoya
    Onoe, Naoyuki
    INTERSPEECH 2023, 2023, : 206 - 210
  • [10] Automatic Speech Disentanglement for Voice Conversion using Rank Module and Speech Augmentation
    Liu, Zhonghua
    Wang, Shijun
    Chen, Ning
    INTERSPEECH 2023, 2023, : 2298 - 2302